The Stigma of Failure Slows Scientific Progress

Negative results in science are just as essential as positive ones

By Simon Oxenham

Psychology is in a state of flux. Ever since the prominent Dutch psychologist Diederik Stapel admitted in 2011 to fabricating and manipulating the data in his work, the field’s research methods have been in the spotlight. The debate has focused on whether replication is a way to confirm accuracy. In theory, if a study was done properly, a research group should be able to reliably repeat it and get the same results. This view has long been a cornerstone of the scientific method. But is it realistic? If the answer is no, what does that mean for the field of psychology?

Last August, Science published a three-year effort by 270 researchers who attempted to replicate 100 recent psychology studies. They managed to replicate only around 40 percent of them. That result was called into question in March when Dan Gilbert, a Harvard University psychologist, and colleagues published a comment in Science that argued the replication efforts were flawed. The debate spawned a messy row in which Gilbert called the replicators “shameless little bullies.” It is fair to say things got personal.

The scientists who conducted the replications have hit back, arguing that their critics have mischaracterized the replications. While Gilbert’s side alleges key differences between the original experiments and the failed replications, the replicators argue that the differences are both incidental and irrelevant. They also note that the authors of the original experiments approved the changes before work on the replications began.

If you’re starting to get a little lost, don’t worry. It’s not just you. The editor of The Psychologist, Jon Sutton, summed up the argument well when he tweeted: “I guess it’s possible the paper that says the paper that says psychology is a bit shit is a bit shit is a bit shit.”

The replication crisis in psychology is not confined to relatively abstract claims; some of the most widely repeated psychological claims of recent years have now been shown to be unreplicable. And the replication crisis is by no means limited to psychology. A recent study has suggested $28 billion is spent every year on preclinical research that can’t be replicated.

For years, oxytocin was hailed as the “love hormone” after a string of experiments showed what appeared to be astounding effects produced by just a whiff. Claremont Graduate University economics professor Paul Zak and colleagues found that oxytocin made people more trusting in a money-sharing game, and psychologists Moïra Mikolajczak, Anthony Lane, and colleagues found it made people more trusting with their secrets. Zak’s TED talk on the “moral molecule” has been watched more than 1.4 million times. These were just a few of the positive oxytocin findings to appear in recent years. Last year, however, a statistical analysis of oxytocin studies concluded that most of those findings had been false positives.

Normally, such a claim based on statistical digging by third-party researchers might offend the original researchers, but not in this case. Anthony Lane’s research group has, in fact, just published a paper stating that they no longer have faith in their original results. Their faith “slowly faded away over the years and the studies have turned us from ‘believers’ into ‘skeptics’” — culminating in 2015 when the group failed to replicate their own original experiment.

In a rare and bold step, the researchers are now publishing their file drawer of negative findings, which found oxytocin had no effect on behavior. Out of 25 experiments they ran, 24 produced null findings. However, they said their negative results “were rejected time and time again” by journals only interested in sexy positive findings. Six positive results that the researchers produced now appear to be invalid after the researchers applied statistical corrections for simultaneously testing multiple hypotheses.
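The kind of multiple-testing correction the researchers applied can be illustrated with a minimal sketch. The simplest such correction, Bonferroni, multiplies each p-value by the number of tests run, which is how a result that looked significant on its own can stop being significant once you account for how many hypotheses were tested at once. The p-values below are invented for illustration only:

```python
# Minimal sketch of a Bonferroni correction for multiple hypothesis tests.
# The p-values below are invented for illustration; they are not from any study.
p_values = [0.005, 0.04, 0.03, 0.20, 0.049]
n_tests = len(p_values)
alpha = 0.05

# Bonferroni: scale each p-value by the number of tests (capped at 1.0).
# This controls the chance of even one false positive across all tests.
corrected = [min(p * n_tests, 1.0) for p in p_values]
significant = [p < alpha for p in corrected]

for raw, corr, sig in zip(p_values, corrected, significant):
    print(f"raw p={raw:.3f}  corrected p={corr:.3f}  significant={sig}")
```

Note how three of the five raw p-values fall below 0.05, but only one survives the correction; that is the mechanism by which the group’s six apparent positives evaporated.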

But when oxytocin researcher Moïra Mikolajczak tried to discuss her failure to replicate her original findings with several high-profile figures in the field, they clammed up. None would even share with her the number of unpublished negative results sitting in their own labs, according to an anecdote retold on Gideon Nave’s blog. Nave then discusses his own frustrations with Mikolajczak’s revelation:

“The conversation with Moïra made me angry. Apart from socializing and drinking, the point of going to conferences is having researchers talk to each other about their work. True, in some cases scientists might have fears of getting scooped by others. But this is not the case for null results. Sharing your failures with other researchers is especially important: it can save them the time, money, and frustration of chasing non-existing effects.”

Publication bias is a double-edged sword: Not only are researchers rarely rewarded for publishing negative results that knock down old conclusions, but when researchers muster the courage to come forward with negative findings, journals are unwilling to publish them. This dynamic creates an unholy combination of perverse incentives that encourage researchers to cherry pick results that support their hypotheses. Worse still, even when negative results are published, they often lie ignored. One study found that original scientific findings were cited 17 times more than their rebuttals.

One of Amy Cuddy’s “power poses” which she said might increase a person’s chance of success

Another recent failed replication challenges “power posing” — the idea that positioning your body in a powerful stance boosts testosterone and changes behavior, which was shared far and wide in the second-most-popular TED talk of all time with 25 million views. The idea was recently debunked by a larger and better controlled replication that failed to produce the effect.

The fact that many recent psychology experiments can’t be replicated might not come as a surprise, but even die-hard skeptics were shocked when news broke recently that a mass replication effort drew a blank on an effect called “ego depletion” that was supported by an enormous body of literature. The intuitive theory, first conceived by psychologist Roy Baumeister, states that we have a limited supply of willpower that can be used up. The idea has become highly influential and has generated an entire body of literature consisting of hundreds of experiments supporting it. Twenty-three different labs all attempted to replicate one of those supporting experiments. After testing 2,141 participants, the research groups turned up nothing.

Michael Inzlicht, one of the authors of the replication, followed up the paper with a blog post outlining how the replication process is painful but not personal: “Science is brutal. It doesn’t care about you or me or anybody. Science is a killer, laying waste to our pet theories and dispatching our grandest ideas. Sometimes I hate science — that fucker doesn’t respect all my hard work!” Inzlicht’s post followed a formal response to the replication from Roy Baumeister, the author of the first ego-depletion study, a paper cited more than 3,000 times. Baumeister’s published response — in which he calls for continued “business as usual” — was unusually candid for a published article in an academic journal.

Baumeister’s remarks have shocked many in the research community, which has broadly rallied behind the movement for replications and larger sample sizes. Baumeister laments a time when science was easier: “When I was in graduate school in the 1970s, n=10 was the norm, and people who went to n=20 were suspected of relying on flimsy effects.” In English that means Baumeister’s experiments in the 70s (which were fairly typical of social psychology experiments at the time) involved only 10 people! Yes, you read that right — 10!
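Why do replicators care so much about sample size? A quick Monte Carlo sketch shows the problem: assuming a true but medium-sized effect (Cohen’s d = 0.5 — an illustrative assumption, not a figure from any cited study), a two-sample comparison with n = 10 per group detects it only a small fraction of the time. Most such experiments would come up empty even when the effect is real:

```python
import math
import random
import statistics

# Monte Carlo sketch: how often does a two-sample t-test reach p < .05
# with n = 10 per group when a true medium effect (d = 0.5) exists?
# All numbers here are illustrative assumptions.

def t_test_p(a, b):
    """Two-sample pooled-variance t-test; returns an approximate
    two-sided p-value using a normal approximation for simplicity."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / (
        math.sqrt(pooled_var) * math.sqrt(1 / na + 1 / nb))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

random.seed(0)
n, d, trials = 10, 0.5, 2000
hits = 0
for _ in range(trials):
    control = [random.gauss(0, 1) for _ in range(n)]  # no effect
    treated = [random.gauss(d, 1) for _ in range(n)]  # true effect of size d
    if t_test_p(control, treated) < 0.05:
        hits += 1

print(f"estimated power at n={n} per group, d={d}: about {hits / trials:.0%}")
```

Run this and the estimated power comes out at roughly one in five — meaning an n=10 study that “found” a big effect was far more likely to be capitalizing on noise than detecting something real, which is exactly why the field now pushes for larger samples.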

Baumeister fears that “the new fetish for large samples will constrain the discovery process.” According to Baumeister, the movement toward large, rigorous experiments is killing off more interesting experiments, leading to research that is “boring the intellectual community.” Baumeister longs for the days when Freudian psychoanalytic theory held “influence on thinkers in other fields,” sidestepping the fact that Freudian psychoanalytic theory is now taken about as seriously as tarot card reading and dowsing.

Baumeister goes on to make the argument that the replicators are “angry young men” without the “intuitive flair” to design experiments that produce massive effects out of tiny sample sizes, an attitude dubbed by pseudonymous neuroscientist and blogger Neuroskeptic as “Harry Potter Science.”

It is ironic that the debate has come to center around ego depletion because, clearly, a large part of this debate concerns researchers’ own deflated egos. This is 2016, and if flagship journals don’t accept negative results, that doesn’t mean those results have to go in the bin — it isn’t exactly hard to put a manuscript on the internet.

So why, then, is it still so rare for researchers to publish negative results? There are no doubt systemic problems in academic publishing, but this is only half the story. And I don’t have to play devil’s advocate to give you an accurate depiction of what might be going on in some researchers’ heads.

Two years ago, psychology professor Jason Mitchell stunned the academic community by writing a (since deleted) essay on his Harvard website, arguing that “unsuccessful experiments have no meaningful scientific value.” To Mitchell, “the likeliest explanation for any failed replication will always be that the replicator bungled something along the way.” Neuroskeptic summed up the flaw in this logic well, writing at Discover: “Imagine the paradox this would create if two scientists were to hold different hypotheses, and thus different criteria for ‘positive’ and ‘negative’!” This is precisely the point where the entire argument comes crashing down.

The notion that only positive findings are useful is an “Alice in Wonderland” view of reality (my positive finding is your negative finding and vice versa). The fact that a Harvard professor could get this so wrong demonstrates just how blind we can be to the value of negative results. Verifying findings is at the heart of good science. As Karl Popper famously observed, seeing millions of white swans doesn’t prove all swans are white, but just one black swan is all it takes to show us that all swans are not white. Sometimes one black swan can tell you far more than a dozen white swans ever could.

While all good scientists respect the principle of falsifiability, that doesn’t make it any less painful when a result a scientist has staked their career on draws a negative. For example, when Yale psychologist John Bargh’s classic priming experiment could not be replicated, he fired off a fierce attack on the researchers responsible for the replication, titled “nothing in their heads.” He later deleted his outburst amid equally fierce criticism.

We are fast moving toward a world of better research practices, such as pre-registration of clinical trials and uploading of research materials to online repositories, but none of this will matter if scientists can’t overcome their egos. While ego isn’t something that will ever go away, the irrational idea that good scientists only ever find results that support their original hypotheses is something we can change. A good first step might be to stop calling these discoveries “failures.” Negative results are good. Negative results are healthy. Negative results are the very cornerstone of the scientific process. It’s time we began to treat them as such.

Simon Oxenham is a science journalist based in the U.K. He is known for exposing and debunking misinformation and keeping a watchful eye on controversial findings within the fields of psychology and neuroscience. Simon has written and blogged for The Psychologist, Nature, Scientific American, The Guardian, and Big Think, among others, and has a weekly column at New Scientist.