Last summer, the field of psychology had a moment—possibly one of the most influential events in science last year. On August 27, 2015, a group called the Open Science Collaboration published the results of its Reproducibility Project, a three-year effort to re-do 100 psychology studies. Replication is, of course, one of the fundamental tenets of good science. The group wanted to see how many of the original effects they could replicate. The result: It only worked about 40 percent of the time.

That did not go over well. But now the psychology establishment is fighting back. Dan Gilbert, a psychologist at Harvard University, and some colleagues have re-analyzed the paper about re-analyzing papers, and they say that it's wrong. And in fact, the public's conclusions about the paper—that psychology is in crisis—are even wronger. More wrong. “We’re arguing with virtually every journalist we know that wrote some version of ‘psychology’s in deep trouble,’” Gilbert says. The comment on the Reproducibility Project appears in Science today, an attempt to reinterpret the data and highlight what the researchers see as flaws. Their conclusion: Reproducibility in psychology is doing great.

“When we read the original article, we were shocked and chagrined,” Gilbert says. “What bad news for science!” Brian Nosek, a University of Virginia psychologist and a leader of the project, says that the group wanted to present an estimate of reproducibility, not to declare a replication crisis. But the media sure did. The study “confirmed the worst fears of scientists who have long worried that the field needed a strong correction,” wrote the New York Times.

That's what Gilbert's team is pushing back against. Nosek and colleagues followed with a response, as you'd expect. And their back-and-forth seems primed to pitch psychology into a second, more powerful reckoning.

First, a brief look at those papers. (If you have a friend with access to Science, read them yourself; they’re short and sweet as these things go, about three pages total). The original study had some serious problems, says the comment: It only looked at 100 studies, which limited its statistical power. Gilbert and colleagues also argue that the paper overestimated the rate of replication failure because the re-done studies weren’t faithful re-dos. In fact, sometimes they differed dramatically, like studying attitudes toward African Americans in Italians (in the replication) instead of Americans (in the original).

Finally, and maybe most significantly, the defenders of psychology suggest that bias could have colored the way that the Open Science Collaboration constructed its study. In particular, they point out that the studies whose original authors didn’t endorse the methodologies of the attempted replications did much more poorly—a replication rate of 15.4 percent—than those whose methodologies got the okay (59.7 percent).

Nosek and his colleagues respond to those criticisms in turn, chipping away at some of the statistical analysis in the comment. To the endorsement question, they countered that a scientist could decline to thumbs-up a replication for many reasons—not just, as implied, low confidence in the quality of the replication methodology. An original researcher could just as easily decline because they weren't confident in their *own* original results.

Are you not entertained? Wait—you're not? It's true that the back-and-forth doesn't really matter. What this is really about is how psychology sees itself—and how that vision could affect what scientists think of the Reproducibility Project, positive or negative. “There is a community of researchers who think that there is just no problem whatsoever and a community of researchers who believe that the field is seriously in crisis,” says Jonathan Schooler, a psychologist at UC Santa Barbara. “There is some antagonism between those two communities, and both sides each have a perspective that may color the way they’re seeing things.”

Nosek feels it, too. "You think it's slightly antagonistic?" he says.

At its heart, both sides were driven to write these papers because they frickin’ love psychology. “What I want to observe is high reproducibility,” says Nosek. “That is better for us, the findings, and the field.” But that love is also what drove him to found the Center for Open Science—he saw things going wrong in his field and wanted to help fix them. Noble, but it may have driven the design and interpretation of the 100 replications in a way that would underestimate replication rates.

Gilbert and his coauthors, on the other hand, love psychology the other way. They’re reacting not to a paper in *Science* per se, but to a public that seems willing to condemn their profession. “Everybody takes this article to say that thousands, millions of people in this field of science are doing bad work,” says Gilbert. In 2014, he called replicators “shameless bullies” in an attempt to protect a researcher whose work was attacked after a replication attempt didn’t confirm her results.

Emotions are running high. Two groups of very smart people are looking at the exact same data and coming to wildly different conclusions. Science hates that. This is how beleaguered Gilbert feels: When I asked if he thought his defensiveness might have colored his interpretation of this data, he hung up on me.

One of the most potent complaints Gilbert's group levies against the Reproducibility folks is over their project’s inability to faithfully replicate studies. “Most people assume that when you say the word replication, you’re talking about a study that differed in only minor, uncontrollably minor details,” Gilbert says. That wasn’t the case in many of the Project’s replications, which depended on a small budget and volunteered time. Some studies were so difficult or expensive to replicate that they just … didn’t get replicated at all, including one of Gilbert's.

That has reduced this fight to one over statistics, which obscures the bigger question: Why were those studies so difficult to do over in the first place? Psychology experiments have to deal with humans—stupid, finicky humans who might act differently depending on the time of day, whether or not they’ve eaten, if they had a cigarette that day or enough sleep the night before. “These are not extraneous factors,” says Lisa Feldman Barrett, a psychologist at Northeastern University. “These are important factors that impact the measurement of the outcome variables.”

When a redone study fails, it might be because it doesn’t match all those tiny differences. Barrett calls all those unnoticed, unmodeled variables “underspecifications,” and they’re a major barrier to replication. But if that's true—if you believe that's a palpable hit on the replicability effort—then you have to extend that problem to the generalizability of all results in psychology. “I don’t care that people in your lab, when the temperature is exactly 69 degrees, and it is high noon on the fifth Wednesday of the month, that such and such will happen,” says Joachim Vandekerckhove, a cognitive scientist at UC Irvine. “That is not interesting.” What is interesting is distilling the essence of human behavior down to first principles.

“That’s what’s at issue here,” says Barrett. “How do we develop a generalizable science?” To do that, the field might have to change its understanding of what it means to find a meaningful effect. That could involve coping with the challenges of assembling larger sample sizes, and controlling for many more factors than it might think necessary. The field may have to think differently about how it thinks about itself.

Is psychology in the middle of a replication crisis? “No,” says Barrett. “But it’s in a crisis of philosophy of science.”