And I would come to hear that keynote!
I can argue about the rate of replication in economics, but everything else is spot on.
What is your argument (and in which direction)? Are we testing the 'too obvious' parts of the papers, or defining replication 'success' too easily? Or are we excluding an important part of what should be replicated?
As Simonsohn pointed out a while ago, a replication can come out non-significant while also not being significantly different from the original effect size. Generally, a failed replication doesn't mean that the claim is wrong. Specifically regarding the replications in experimental economics, see Yan Chen's paper in ExpEcon: the failed replications neglected important features of the experimental protocol. So the replication rate should actually be higher.
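To make that Simonsohn point concrete, here is a minimal sketch with made-up numbers (nothing below comes from his paper or from any actual replication): the replication estimate is 'not significant' on its own, yet also not significantly different from the original estimate.

```python
from math import sqrt
from statistics import NormalDist

norm = NormalDist()  # standard normal, for two-sided p-values

# Hypothetical numbers, purely for illustration
b_orig, se_orig = 0.40, 0.10   # original effect estimate and standard error
b_rep,  se_rep  = 0.15, 0.12   # replication effect estimate and standard error

z_rep  = b_rep / se_rep                                   # replication vs. zero
z_diff = (b_orig - b_rep) / sqrt(se_orig**2 + se_rep**2)  # replication vs. original

p_rep  = 2 * (1 - norm.cdf(abs(z_rep)))    # ~0.21: 'not significant' on its own
p_diff = 2 * (1 - norm.cdf(abs(z_diff)))   # ~0.11: also not significantly different from the original

print(f"replication vs. zero:     z = {z_rep:.2f}, p = {p_rep:.2f}")
print(f"replication vs. original: z = {z_diff:.2f}, p = {p_diff:.2f}")
```

So a headline 'failed to replicate' can coexist with 'consistent with the original estimate', which is part of why the headline replication rate understates things.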
I mean, 60%, though. Would it be OK if the "true" rate were 75%? Then only a quarter of what we publish would be misleading...
Ah, so you think we replicate *well*.
If a replication
- finds a result that is within a reasonable confidence bound of the originally estimated effect size,
- or finds a smaller but 'significant' effect size in the same direction,
then I don't think it is considered a 'failed replication'. Is it?
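Roughly what I have in mind, as a hedged sketch rather than any project's official definition (the function and the numbers are hypothetical):

```python
from statistics import NormalDist

def replication_outcome(b_orig, se_orig, b_rep, se_rep, alpha=0.05):
    """Classify a replication under the two criteria above (illustrative only)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, ~1.96 at alpha = 0.05
    # Criterion 1: the replication estimate lands inside the original's confidence interval.
    within_original_ci = (b_orig - z * se_orig) <= b_rep <= (b_orig + z * se_orig)
    # Criterion 2: a 'significant' effect in the same direction, even if smaller.
    significant_same_direction = abs(b_rep / se_rep) > z and b_rep * b_orig > 0
    if within_original_ci or significant_same_direction:
        return "not a 'failed replication' under these criteria"
    return "candidate 'failed replication'"

# Made-up example: a smaller but clearly positive replication of a positive original.
print(replication_outcome(b_orig=0.40, se_orig=0.10, b_rep=0.25, se_rep=0.10))
```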
But generally, when I have seen these replication projects, I have often thought "yeah that will narrowly replicate but the thing they say is behind the result ... or the generalisation they are trying to make... is not correct". So sometimes I'd rather see 'replications' with some variation in the context.
This is to both of you. Saying wrong things is part of science; we need many mistakes to get at the right thing. I've been saying for years that the problem is not with production but with consumption. One paper is not enough to make a claim, even if it's in a top-five journal. I start believing in an effect when I see a body of work on the topic. The famous results in experimental economics are super robust (for example, positive but declining contributions in public goods games). This is not the case with social psychology, especially when focusing on papers in Psych Science. Definitely not in marketing.
I fully agree with DR that we need to push the boundary conditions. Punishment in public goods games is a good case study. We've had 8 to 10 years with scores of papers showing it works, followed by half a decade of papers showing when and where it fails. Unlike straight replications, this is the kind of research that gets rewarded, so it's less worrying.
Bob said something similar a while back. The counterargument is that then you get a two-class science. There are the big, replicated results which you can trust, but also a whole class of smaller results which you can't trust. Are they actually wrong? Or is it just that they weren't important enough to gain traction and get replications (in the broader sense)? And how can an outsider know which is which?
I was thinking of Bob's ideas regarding exhibits.
There are clues. If you see a paper that's widely cited for a decade but you don't see replications, you know there must be a ton of failed replications out there. And I guess if it's not interesting enough for people to follow up on, it's not that important anyway.
We need both, I think. But we also need 'replicate the same experimental incentive structure (or general 'class' of stimulus) in a slightly different frame'. Unfortunately, that is not particularly rewarded publication-wise.
> Are we too slow?
> Hmm, again maybe. It’s certainly a big shock when you publish in a mainstream science journal and reviews come back within a week.
That is a separate issue. Again see bit.ly/unjournal -- we need more meaningful continuous feedback and rating, and less of the 'wait and waste' parts of the journal game.
But we need 'slower', more careful, larger-scale, incremental social science, imo. That has nothing to do with 'waiting 6 months for a rejection/acceptance from a journal'.
> 4-figure bonuses for a top five publication
That seems low; the career incentives are worth much more. But I'm glad I don't have to worry about this any more. Academics talk about money and salaries a lot more than I thought they would.
It occurs to me that one simple change would be a limit of, say, 8 or 12 years as a reviewer for the top journals (with no ability to jump between journals within the top tier), or perhaps instead a cutoff based on years since the PhD. Just to keep the field moving faster. Focusing only on the top journals would avoid any reduction in quality, since there would still be plenty of highly qualified reviewers who fit the criteria.
The incentives in economics at German universities (below Mannheim and Bonn) are designed to achieve the following goals: a) generate third-party funding, b) generate third-party funding, c) generate third-party funding, d) reach out to the public, i.e. tweet your opinions, and e) publications, not necessarily innovative but in your "field", so that you can claim to be "einschlägig" (established in the field). Since third-party funding is awarded mainly on promises to deliver publishable and "relevant" outcomes, promises that often cannot be kept given the very nature of innovative research, this system creates perverse incentives. The incentives seem to be pretty much the same for all disciplines. The problem is that in the third-party funding game, economists apply more rigorous standards (as they do for publication) than other social science disciplines, implying that there is less funding for rigorous research and more for less rigorous research in other social science disciplines, which find it far easier to claim deliverable output and get away with it. So there should be more imperialism at the funding level too, so that researchers from different disciplines who claim to answer similar questions with their respective methodologies are evaluated against the same standards. A sociologist should not get away with promises that an economist would not dare to make.
Sounds familiar. But I think in the long run our tough standards have helped the discipline gain credibility with policy-makers. (There might be some countervailing losses - thinking of what George Akerlof pointed out recently...)
> To sum up, there should be fewer social scientists, producing less.
I agree with the latter -- produce less, produce better, make it more readable, make it an organized, continuously improved project rather than a disorganized vomiting of papers snuck into journals.
But you know what I think about [THIS](bit.ly/unjournal)
If even *one* paper from a typical academic economist's CV actually definitively showed what it claims to, it would be a huge boost to humanity.
But not *fewer* social scientists. We need *more*: to be able to do the sort of carefully checked, replicable/replicated, well-justified, and well-explained work that we need.