To be clear: I am in love with social psychology. I am writing here because I am still in love with social psychology. Yet, I am dismayed that so many of us are dismissing or justifying all those small (and not so small) signs that things are just not right, that things are not what they seem. “Carry on, folks, nothing to see here,” is what some of us seem to be saying.

Our problems are not small and they will not be remedied by small fixes. Our problems are systemic and they are at the core of how we conduct our science. My eyes were first opened to this possibility when I read Simmons, Nelson, and Simonsohn’s paper during what seems like a different, more innocent time. This paper details how small, seemingly innocuous, and previously encouraged data-analysis decisions could allow for anything to be presented as statistically significant. That is, flexibility in data collection and analysis could make even impossible effects seem possible and significant.
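
To make that point concrete, here is a rough simulation sketch. It is not the analysis from the Simmons, Nelson, and Simonsohn paper; every number and name in it (20 people per group, two outcome measures, 10 extra participants, a z-test that assumes a known standard deviation of 1, the function names) is my own illustrative assumption. It compares a researcher who runs one planned test with a researcher who tests either of two measures and, if nothing works, collects a few more people and tests again. There is no true effect in any simulated study.

```python
import random
from statistics import NormalDist, mean


def p_value(a, b):
    """Two-sided z-test for a difference in means (assumes a known SD of 1)."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def draw(rng, n):
    """Draw n scores from a standard normal: the null hypothesis is true."""
    return [rng.gauss(0, 1) for _ in range(n)]


def one_study(rng, flexible, n=20, extra=10):
    """Simulate one two-group study; return True if it ends up 'significant'."""
    # Two outcome measures per group (independent here, for simplicity).
    a = [draw(rng, n), draw(rng, n)]
    b = [draw(rng, n), draw(rng, n)]
    if not flexible:
        # Strict researcher: one planned test on the first measure only.
        return p_value(a[0], b[0]) < 0.05
    # Flexible researcher: report whichever measure reaches p < .05.
    if any(p_value(a[k], b[k]) < 0.05 for k in range(2)):
        return True
    # Still nothing? Collect a few more participants and test again.
    for k in range(2):
        a[k] += draw(rng, extra)
        b[k] += draw(rng, extra)
    return any(p_value(a[k], b[k]) < 0.05 for k in range(2))


def false_positive_rate(flexible, n_studies=5000, seed=1):
    """Share of null studies that end up 'significant'."""
    rng = random.Random(seed)
    hits = sum(one_study(rng, flexible) for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    print("strict:  ", false_positive_rate(flexible=False))
    print("flexible:", false_positive_rate(flexible=True))
```

Under these assumptions the strict rate should land near the nominal 5%, while the flexible rate should come out noticeably higher, even though nothing real is there to find.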

What is worse, Andrew Gelman made clear that a researcher need not actively p-hack their data to reach erroneous conclusions. It turns out such biases in data analyses might not be conscious, that researchers might not even be aware of how their data-contingent decisions are warping the conclusions they reach. This is flat-out scary: Even honest researchers with the highest of integrity might be reaching erroneous conclusions at an alarming rate.

Third is the problem of publication bias. As a field, we tend only to publish significant results. This could be because as authors we choose to focus on these; or, more likely, because reviewers, editors, and journals force us to focus on these and to ignore nulls. This creates the infamous file drawer that altogether warps the research landscape. Because it is unclear how large the file drawer is for any research literature, it is hard to determine how large or small any effect is, if it exists at all.
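
A toy simulation shows how the file drawer distorts what we see. This is only a sketch under assumptions I am making up for illustration: a small true effect of 0.2 standard deviations, 20 people per group, a z-test with a known standard deviation of 1, and a literature in which only results with p < .05 get published. The function name `simulate_literature` is hypothetical.

```python
import random
from statistics import NormalDist, mean


def simulate_literature(true_d=0.2, n=20, n_studies=5000, seed=1):
    """Run many small studies of one true effect; 'publish' only p < .05.

    Returns (mean effect across all studies,
             mean effect across published studies,
             share of studies published).
    """
    rng = random.Random(seed)
    se = (2 / n) ** 0.5  # standard error of the mean difference when SD = 1
    all_effects, published = [], []
    for _ in range(n_studies):
        treatment = [rng.gauss(true_d, 1) for _ in range(n)]
        control = [rng.gauss(0, 1) for _ in range(n)]
        d = mean(treatment) - mean(control)
        p = 2 * (1 - NormalDist().cdf(abs(d / se)))
        all_effects.append(d)
        if p < 0.05:
            # Only significant results make it out of the file drawer.
            published.append(d)
    return mean(all_effects), mean(published), len(published) / n_studies


if __name__ == "__main__":
    all_mean, pub_mean, share = simulate_literature()
    print("mean effect, all studies:      ", round(all_mean, 2))
    print("mean effect, published studies:", round(pub_mean, 2))
    print("share of studies published:    ", round(share, 2))
```

In this setup the average across all studies should sit close to the true effect, while the average across the published studies alone should be well above it. A reader who sees only the published record has no way of knowing how far off that picture is.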

I think these three ideas (that data flexibility can lead to a raft of false positives, that this process might occur without researchers themselves being aware, and that the file drawer is of unknown size) explain why so many of our cherished results can’t replicate. These three ideas suggest we might have been fooling ourselves into thinking we were chasing things that are real and robust, when we were pursuing neither.

As someone who has been doing research for nearly twenty years, I now can’t help but wonder if the topics I chose to study are in fact real and robust. Have I been chasing puffs of smoke for all these years?

I have spent nearly a decade working on the concept of ego depletion, including work that is critical of the model used to explain the phenomenon. I have been rewarded for this work, and I am convinced that it is the main reason I get any invitations to speak at colloquia and brown-bags these days. The problem is that ego depletion might not even be a thing. By now, many people are aware that a massive replication attempt of the basic ego depletion effect involving over 2,000 participants found nothing, nada, zip. Only three of the 24 participating labs found a significant effect, but even then, one of these found a significant result in the wrong direction!

There is a lot more to this registered replication than the main headline, and there is still so much evidence indicating fatigue is a real phenomenon. I promise to get to these thoughts in a later post, once the paper is finally published. But for now, we are left with a sobering question: If a large-sample, pre-registered study found absolutely nothing, how has the ego depletion effect been replicated and extended hundreds and hundreds of times? More sobering still: What other phenomena, which we now consider obviously real and true, will be revealed to be just as fragile?

As I said, I’m in a dark place. I feel like the ground is moving from underneath me and I no longer know what is real and what is not.

I edited an entire book on stereotype threat, and I have signed my name to an amicus brief to the Supreme Court of the United States citing stereotype threat, yet now I am not as certain as I once was about the robustness of the effect. I feel like a traitor for having just written that; like I’ve disrespected my parents, a no-no according to Commandment number 5. But a meta-analysis published just last year suggests that stereotype threat, at least for some populations and under some conditions, might not be so robust after all. P-curving some of the original papers is also not comforting. Now, stereotype threat is a politically charged topic and there is a lot of evidence supporting it. That said, I think a lot more painstaking work needs to be done on basic replications, and until then, I would be lying if I said that doubts have not crept in. Rumor has it that an RRR of stereotype threat is in the works.

To be fair, this is not social psychology’s problem alone. Many other allied areas in psychology might be similarly fraught, and I look forward to these other areas scrutinizing their own work. Areas like developmental, clinical, industrial/organizational, consumer behavior, organizational behavior, and so on need an RPP or a Many Labs of their own. Other areas of science face similar problems too.

During my dark moments, I feel like social psychology needs a redo, a fresh start. Where to begin, though? What am I mostly certain about and where can my skepticism end? I feel like there are legitimate things we have learned, but how do we separate wheat from chaff? Do we need to go back and meticulously replicate everything in the past? Or do we use those bias tests Joe Hilgard is so sick and tired of to point us in the right direction? What should I stop teaching to my undergraduates? I don’t have answers to any of these questions.

This blog post is not going to end on a sunny note. Our problems are real and they run deep. Okay, I do have some hope: I legitimately think our problems are solvable. I think the calls for more statistical power, greater transparency surrounding null results, and more confirmatory studies can save us. What is not helping is the lack of acknowledgement of the severity of our problems. What is not helping is a reluctance to dig into our past and ask what needs revisiting.

The time is nigh to reckon with our past. Our future just might depend on it.

--------------------------------

*In case you haven’t heard, Alison started a wonderful Facebook discussion group that I have the privilege of co-moderating. If you’re tired of bickering and incivility, but still want a place to discuss ideas, PsychMAP just might be for you.