A reproducibility effort has put high-profile journals under the spotlight by trying to replicate a slew of social-science results. In the work, published on 27 August in Nature Human Behaviour1, researchers attempted to replicate 21 social-science results reported in Science and Nature between 2010 and 2015, and succeeded for 62% of them. That’s about twice the rate achieved by an earlier effort2 that examined the psychology literature more generally, but the latest result still raises questions about two out of every five papers studied.

Reproducibility of published work has been tested before2,3, but this is the first such effort that focuses on the top journals. “Putting the magnifying lens on high-impact journals is very necessary,” says Sarahanne Field, who studies meta-research at the University of Groningen in the Netherlands. “We assume that the quality of work in such outlets is always top-notch,” she says — but if it isn’t reproducible, “we need to re-evaluate how we see high-impact journals and what they offer.”

The researchers — led by Brian Nosek of the University of Virginia in Charlottesville, who directed a previous effort to replicate 100 psychology studies — also created a ‘prediction market’ in which experts could bet on how reproducible a claim was likely to be. The market generated an overall replication rate very close to that observed in the study.

Main findings

To reproduce the studies, the researchers selected the key finding of each paper and sought to test it, using protocols checked (in all cases bar one) by the original authors. They also increased the sample sizes compared with those used in the original studies, on average by a factor of five, to improve confidence in the outcomes.

The team found a statistically significant effect in the same direction as the original study for 13 of the 21 papers. But the effect was often weaker than originally reported: by about 50%, on average.

The claims tested ranged from a link between analytical thinking and religious scepticism to how “writing about testing worries boosts exam performance in the classroom”.

Nosek says that high-profile findings like these often receive substantial media interest. The study on exam performance4, for example, “has a lot of potential implications for stress and coping strategies in high-stakes situations”, he says. But the effect described was one of the findings that the researchers could not replicate.

Nosek suggests that the larger effect sizes reported in the original papers might have been partly due to the smaller sample sizes used. In that situation, “studies that obtain a significant result are likely to be exaggerations of the actual effect size”, he says.
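Nosek’s point can be illustrated with a quick simulation (a hypothetical sketch, not the study’s own analysis): when many small studies of a modest true effect are run, the subset that happens to reach statistical significance will, on average, overestimate that effect. The true effect size, per-group sample size and significance threshold below are all illustrative assumptions.

```python
import random
import statistics

random.seed(0)

TRUE_EFFECT = 0.2   # assumed standardized mean difference (illustrative)
N_PER_GROUP = 20    # a "small" study, for illustration
N_STUDIES = 2000

def simulate_study(n):
    """Simulate a two-group study with unit-variance outcomes.

    Returns the estimated effect (difference in group means) and whether
    a simple z-test on that difference is significant at the 5% level.
    """
    treatment = [random.gauss(TRUE_EFFECT, 1) for _ in range(n)]
    control = [random.gauss(0, 1) for _ in range(n)]
    diff = statistics.mean(treatment) - statistics.mean(control)
    se = (2 / n) ** 0.5          # standard error of the difference (known variance)
    return diff, abs(diff / se) > 1.96

results = [simulate_study(N_PER_GROUP) for _ in range(N_STUDIES)]
significant = [diff for diff, sig in results if sig]

mean_all = statistics.mean(diff for diff, _ in results)
mean_sig = statistics.mean(significant)

print(f"true effect:                     {TRUE_EFFECT}")
print(f"mean estimate, all studies:      {mean_all:.2f}")
print(f"mean estimate, significant only: {mean_sig:.2f}")
```

Averaged over all simulated studies the estimates are unbiased, but conditioning on significance (as publication effectively does) inflates the apparent effect, which is consistent with the roughly halved effect sizes the replications found.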

The differences in effect size might also be the result of a publication bias, says cognitive scientist David Rand at the Massachusetts Institute of Technology in Cambridge: journals are more inclined to publish larger effects, and so will select for such studies.

Rand cautions that a failure to replicate should not be seen as an automatic invalidation of the original study — who, after all, is to say which of the two outcomes is correct?

Besides, he says, meta-analyses often find a range of effect sizes including many non-significant results, even where there is an overall significant effect. For some of the papers that could not be replicated, Nature Human Behaviour has published accompanying commentaries from the original authors suggesting possible reasons for the discrepancies.

Betting market

To create the prediction market, for each paper included in their study, Nosek’s team assembled panels of up to 80 researchers, mostly psychologists and economists. Each participant read the study and could trade “shares” in the reliability of the reported result. The market predictions correlated well with the actual results, and generated an expected overall replication rate nearly identical to the one observed.
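The study’s market ran on a real trading platform; as a rough illustration of how such a market turns trades into a probability, here is a minimal sketch using a logarithmic market scoring rule (LMSR). The choice of LMSR, the liquidity parameter and the trade sizes are assumptions for illustration, not details from the study.

```python
import math

class LMSRMarket:
    """Minimal LMSR market for a yes/no claim ("will this result replicate?").

    Illustrative only: one standard mechanism by which a market price can
    aggregate traders' beliefs into a probability estimate.
    """

    def __init__(self, liquidity=10.0):
        self.b = liquidity   # higher liquidity = prices move less per trade
        self.q_yes = 0.0     # outstanding "will replicate" shares
        self.q_no = 0.0      # outstanding "will not replicate" shares

    def cost(self):
        """LMSR cost function C(q) = b * ln(e^(q_yes/b) + e^(q_no/b))."""
        return self.b * math.log(
            math.exp(self.q_yes / self.b) + math.exp(self.q_no / self.b)
        )

    def price_yes(self):
        """Current YES price, interpretable as the market's P(replicates)."""
        e_yes = math.exp(self.q_yes / self.b)
        e_no = math.exp(self.q_no / self.b)
        return e_yes / (e_yes + e_no)

    def buy(self, outcome, shares):
        """Buy `shares` of 'yes' or 'no'; returns the trader's cost."""
        before = self.cost()
        if outcome == "yes":
            self.q_yes += shares
        else:
            self.q_no += shares
        return self.cost() - before

market = LMSRMarket()
market.buy("yes", 5)   # an optimistic trader bets on replication
market.buy("no", 2)    # a sceptic bets against it
print(f"market-implied replication probability: {market.price_yes():.2f}")
```

The market starts at 50% and each trade nudges the price toward the buyer’s view, so the final price summarizes the panel’s pooled judgement of how likely the claim is to replicate.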

This result from the prediction market, a concept tested before in the context of reproducibility, suggests that “scientists are good assayers of the empirical reality of the world”, says Nicholas Christakis of Yale University in New Haven, Connecticut, who co-authored one of the studies that the new work was able to replicate.

Nosek says that participants in these markets often based their assessments on the quality of the statistical evidence and the plausibility of the original result. “If the original result was surprising, participants report having a sense that it is less likely to be true,” he says. “Hence the aphorism that extraordinary claims require extraordinary evidence.”

Researchers can take several steps to improve rates of reproducibility, says Nosek. In particular, researchers could be more open about their procedures and data, and should clearly state their aims and hypotheses when pre-registering experiments — a practice that is rising rapidly in the social-behavioural sciences. “Researchers are taking reproducibility seriously and looking for ways to improve the credibility and transparency of their claims,” Nosek says. “It’s a very exciting time.”

Authors could also be required to conduct replications of their own key experiments, and include those replications with the initial submission, says Rand.

Richard Klein of the Université Grenoble Alpes in France says that journals, too, can take action. For example, they could enforce rules about data sharing, require minimum standards of statistical power, require greater transparency and provide incentives to foster that transparency.

Spokespersons for Nature and Science say that both journals are trying to encourage authors to explain their methods as fully as possible, to aid evaluation and replication of the work reported. (Nature’s news team is editorially independent of its journal team.)

Klein thinks that a different research culture might also be needed. “The emphasis on novel, surprising findings is great in theory, but in practice it creates publication incentives that don’t match the incremental, careful way science usually works,” he says. He thinks that bottom-up approaches to improve reproducibility, such as informal exchanges of best practice between labs, rather than top-down directives, might prove the most effective solution. “The vast majority of scientists want to do good science and to publish things they think are true,” he says.