In the last few years, psychologists have become increasingly aware of, and unsettled by, these problems. Some have created an informal movement to draw attention to the “reproducibility crisis” that threatens the credibility of their field. Others have argued that no such crisis exists, and accused critics of being second-stringers and bullies, and of favoring joyless grousing over important science. In the midst of this often acrimonious debate, Nosek has always been a level-headed figure who has gained the respect of both sides. As such, the results of the Reproducibility Project, published today in Science, have been hotly anticipated.

They make for grim reading. Although 97 percent of the 100 studies originally reported statistically significant results, just 36 percent of the replications did.

Does this mean that only a third of psychology results are “true”? Not quite. A result is typically said to be statistically significant if its p-value is less than 0.05. Briefly, this means that if there were no real effect at all, your odds of fluking your way to results at least as strong would be less than 1 in 20. This creates a sharp cut-off at an arbitrary (some would say meaningless) threshold, in which an experiment that skirts over the 0.05 benchmark is somehow magically more “successful” than one that just fails to meet it.
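
To make the 1-in-20 threshold concrete, here is a minimal simulation sketch. It is an illustration only, not anything the Reproducibility Project ran: the function names, the group size of 30, and the use of a simple z-test with a known standard deviation of 1 are all assumptions chosen for brevity. Each simulated study compares two groups drawn from the same distribution, so any "significant" result is a fluke.

```python
import random
from statistics import NormalDist, mean


def null_experiment_p_value(rng, n=30):
    """Simulate one two-group study in which the true effect is zero.

    Both groups come from the same normal distribution (mean 0, sd 1),
    so any difference between them is pure chance.
    """
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # z-test for a difference in means, assuming the sd is known to be 1
    standard_error = (2.0 / n) ** 0.5
    z = (mean(treated) - mean(control)) / standard_error
    # Two-sided p-value: chance of a difference at least this large
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_positive_rate(trials=10_000, alpha=0.05, seed=0):
    """Fraction of no-effect studies that still come out 'significant'."""
    rng = random.Random(seed)
    hits = sum(null_experiment_p_value(rng) < alpha for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # Expect a value close to 0.05, i.e. roughly 1 study in 20
    print(false_positive_rate())
```

Under these assumptions, roughly 5 percent of the no-effect studies should clear the 0.05 bar by luck alone, which is what the threshold is designed to allow.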

So Nosek’s team looked beyond statistical significance. They also considered the effect sizes of the studies. These measure the strength of a phenomenon; if your experiment shows that red lights make people angry, the effect size tells you how much angrier they get. And again, the results were worrisome. On average, the effect sizes of the replications were half those of the originals.
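
For readers who want to see what an effect size looks like in numbers, the sketch below computes Cohen's d, one common measure: the gap between two group averages divided by their pooled standard deviation. This is an assumption for illustration; it is not necessarily the measure Nosek's team used, and the anger ratings are invented.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_variance ** 0.5


if __name__ == "__main__":
    # Invented anger ratings, for illustration only
    no_light = [1, 3, 5]
    red_light_original = [2, 4, 6]           # hypothetical original study
    red_light_replication = [1.5, 3.5, 5.5]  # hypothetical replication

    print(cohens_d(red_light_original, no_light))     # 0.5
    print(cohens_d(red_light_replication, no_light))  # 0.25
```

In this made-up case the replication finds an effect in the same direction as the original, but only half as strong.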

“The success rate is lower than I would have thought,” says John Ioannidis from Stanford University, whose classic theoretical paper “Why Most Published Research Findings Are False” has been a lightning rod for the reproducibility movement. “I feel bad to see that some of my predictions have been validated. I wish they’d been proven wrong.”

Nosek, a self-described “congenital optimist,” is less upset. The results aren’t great, but he takes them as a sign that psychologists are leading the way in tackling these problems. “It has been a fantastic experience, all this common energy around a very specific goal,” he says. “The collaborators all contributed their time to the project knowing that they wouldn’t get any credit for being 253rd author.”

Jason Mitchell from Harvard University, who has written critically about the replication movement, agrees. “The work is heroic,” he says. “The sheer number of people involved and the care with which it was carried out is just astonishing. This is an example of science working as it should in being very self-critical and questioning everything, especially its own assumptions, methods, and findings.”