The Stanford Prison Experiment, one of the most famous and compelling psychological studies of all time, told us a tantalizingly simple story about human nature.

The study took paid participants and assigned them to be “inmates” or “guards” in a mock prison at Stanford University. Soon after the experiment began, the “guards” began mistreating the “prisoners,” implying evil is brought out by circumstance. The authors, in their conclusions, suggested innocent people, thrown into a situation where they have power over others, will begin to abuse that power. And people who are put into a situation where they are powerless will be driven to submission, even madness.

The Stanford Prison Experiment has been included in many, many introductory psychology textbooks and is often cited uncritically. It’s the subject of movies, documentaries, books, television shows, and congressional testimony.

But its findings were wrong. Very wrong. And not just due to its questionable ethics or lack of concrete data — but because of deceit.

A new exposé published by Medium based on previously unpublished recordings of Philip Zimbardo, the Stanford psychologist who ran the study, and interviews with his participants, offers convincing evidence that the guards in the experiment were coached to be cruel. It also shows that the experiment’s most memorable moment — of a prisoner descending into a screaming fit, proclaiming, “I’m burning up inside!” — was the result of the prisoner acting. “I took it as a kind of an improv exercise,” one of the guards told reporter Ben Blum. “I believed that I was doing what the researchers wanted me to do.”

The findings have long been subject to scrutiny — many think of them as more of a dramatic demonstration, a sort-of academic reality show, than a serious bit of science. But these new revelations incited an immediate response. “We must stop celebrating this work,” personality psychologist Simine Vazire tweeted, in response to the article. “It’s anti-scientific. Get it out of textbooks.” Many other psychologists have expressed similar sentiments.

(Update: Since this article was published, the journal American Psychologist has published a thorough debunking of the Stanford Prison Experiment that goes beyond what Blum found in his piece. There’s even more evidence that the “guards” knew the results that Zimbardo wanted to produce, and were trained to meet his goals. It also provides evidence that the conclusions of the experiment were predetermined.)

Many of the classic show-stopping experiments in psychology have lately turned out to be wrong, fraudulent, or outdated. And in recent years, social scientists have begun to reckon with the truth that their old work needs a redo, a reckoning known as the “replication crisis.” But there’s been a lag in the popular consciousness and in how psychology is taught by teachers and textbooks. It’s time to catch up.

Many classic findings in psychology have been reevaluated recently


The Zimbardo prison experiment is not the only classic study that has been recently scrutinized, reevaluated, or outright exposed as a fraud. Recently, science journalist Gina Perry found that the infamous “Robbers Cave” experiment in the 1950s, in which young boys at summer camp were essentially manipulated into joining warring factions, was a do-over of a failed earlier version of the experiment, which the scientists never mentioned in an academic paper. That’s a glaring omission. It’s wrong to throw out data that refutes your hypothesis and only publicize data that supports it.

Perry has also revealed inconsistencies in another major early work in psychology: the Milgram electroshock test, in which participants were told by an authority figure to deliver seemingly lethal doses of electricity to an unseen hapless soul. Her investigations show some evidence of researchers going off the study script and possibly coercing participants to deliver the desired results. (Somewhat ironically, the new revelations about the prison experiment also show the power an authority figure — in this case Zimbardo himself and his “warden” — has in manipulating others to be cruel.)

Other studies have been reevaluated for more honest, methodological snafus. Recently, I wrote about the “marshmallow test,” a series of studies from the early ’90s that suggested the ability to delay gratification at a young age is correlated with success later in life. New research finds that if the original marshmallow test authors had a larger sample size, and greater research controls, their results would not have been the showstoppers they were in the ’90s. I can list so many more textbook psychology findings that have either not replicated, or are currently in the midst of a serious reevaluation.

Like:

Social priming: People who read “old”-sounding words (like “nursing home”) were more likely to walk slowly — showing how our brains can be subtly “primed” with thoughts and actions.

The facial feedback hypothesis: Merely activating muscles around the mouth caused people to become happier — demonstrating how our bodies tell our brains what emotions to feel.

Stereotype threat: Minorities and maligned social groups don’t perform as well on tests due to anxieties about becoming a stereotype themselves.

Ego depletion: The idea that willpower is a finite mental resource.

Alas, the past few years have brought about a reckoning for these ideas and social psychology as a whole.

Many psychological theories have been debunked or diminished in rigorous replication attempts. Psychologists are now realizing it's more likely that false positives will make it through to publication than inconclusive results. And they’ve realized that experimental methods commonly used just a few years ago aren’t rigorous enough. For instance, it used to be commonplace for scientists to publish experiments that sampled about 50 undergraduate students. Today, scientists realize this is a recipe for false positives, and strive for sample sizes in the hundreds and ideally from a more representative subject pool.
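To see why small samples are a problem, here is a rough back-of-the-envelope sketch in Python. It is only an illustration, not a calculation from any of the studies mentioned: the effect size and the share of tested hypotheses that are actually true are assumed values, and the function names are made up for this example. It compares a study of 50 people (25 per group) with one of 500 (250 per group).

```python
from statistics import NormalDist

ALPHA = 0.05        # conventional significance threshold
EFFECT_SIZE = 0.3   # ASSUMED true effect (Cohen's d); illustrative only
PRIOR_TRUE = 0.2    # ASSUMED share of tested hypotheses that are true; illustrative only


def power(n_per_group, d=EFFECT_SIZE, alpha=ALPHA):
    """Approximate power of a two-sided, two-group z-test with unit variance."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected z-score when the true standardized difference is d
    shift = d * (n_per_group / 2) ** 0.5
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def false_positive_share(n_per_group, prior=PRIOR_TRUE, alpha=ALPHA):
    """Share of significant results that come from hypotheses with no real effect."""
    true_hits = power(n_per_group) * prior   # real effects that reach significance
    false_hits = alpha * (1 - prior)         # null effects that reach significance by chance
    return false_hits / (true_hits + false_hits)


for n in (25, 250):
    print(f"{n} per group: power={power(n):.2f}, "
          f"false share of significant results={false_positive_share(n):.2f}")
```

Under these assumptions, the chance of a fluke in any single test of a nonexistent effect stays at 5 percent regardless of sample size. What changes is how often real effects are detected. With few participants, real effects are usually missed, so flukes make up a much larger share of the "significant" findings that end up in journals.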

Nevertheless, in so many of these cases, scientists have moved on and corrected errors, and are still doing well-intentioned work to understand the heart of humanity. For instance, work on one of psychology’s oldest fixations — dehumanization, the ability to see another as less than human — continues with methodological rigor, helping us understand the modern-day maltreatment of Muslims and immigrants in America.

In some cases, time has shown that flawed original experiments are worth reexamining. The original Milgram experiment was flawed. But at least its study design, which brings in participants to administer shocks (not actually carried out) to punish others for failing at a memory test, is basically repeatable today with some ethical tweaks.

And it seems like Milgram’s conclusions may hold up: In a recent study, many people found demands from an authority figure to be a compelling reason to shock another. However, it’s possible, due to something known as the file-drawer effect, that failed replications of the Milgram experiment have not been published. Replication attempts at the Stanford prison study, on the other hand, have been a mess.

In science, too often, the first demonstration of an idea becomes the lasting one — in both pop culture and academia. But this isn’t how science is supposed to work at all!

Science is a frustrating, iterative process. When we communicate it, we need to get beyond the idea that a single, stunning study ought to stand the test of time. Scientists know this as well, but their institutions have often discouraged them from replicating old work in favor of pursuing new, exciting, attention-grabbing studies. (Journalists are part of the problem too, imbuing small, insignificant studies with more importance and meaning than they’re due.)

Thankfully, there are researchers working very hard, and very earnestly, to make psychology a more replicable, robust science. There’s even a whole Society for the Improvement of Psychological Science devoted to these issues.

Follow-up results tend to be less dramatic than original findings, but they are more useful in helping discover the truth. And it’s not that the Stanford Prison Experiment has no place in a classroom. It’s interesting as history. Psychologists like Zimbardo and Milgram were highly influenced by World War II. Their experiments were, in part, an attempt to figure out why ordinary people would fall for Nazism. That’s an important question, one that set the agenda for a huge amount of research in psychological science, and is still echoed in papers today.

Textbooks need to catch up

Psychology has changed tremendously over the past few years. Many studies used to teach the next generation of psychologists have been intensely scrutinized, and found to be in error. But troublingly, the textbooks have not been updated accordingly.

That’s the conclusion of a 2016 study in Current Psychology. “By and large,” the study explains (emphasis mine):

introductory textbooks have difficulty accurately portraying controversial topics with care or, in some cases, simply avoid covering them at all. ... readers of introductory textbooks may be unintentionally misinformed on these topics.

The study authors — from Texas A&M and Stetson universities — gathered a stack of 24 popular introductory psych textbooks and began looking for coverage of 12 contested ideas or myths in psychology.

The ideas — like stereotype threat, the Mozart effect, and whether there’s a “narcissism epidemic” among millennials — have not necessarily been disproven. Nevertheless, there are credible and noteworthy studies that cast doubt on them. The list of ideas also included some urban legends — like the one about the brain only using 10 percent of its potential at any given time, and a debunked story about how bystanders refused to help a woman named Kitty Genovese while she was being murdered.

The researchers then rated the texts on how they handled these contested ideas. The results found a troubling amount of “biased” coverage on many of the topic areas.

But why wouldn’t these textbooks include more doubt? Replication, after all, is a cornerstone of any science.

One idea is that textbooks, in the pursuit of covering a wide range of topics, aren’t meant to be authoritative on these individual controversies. But something else might be going on. The study authors suggest these textbook authors are trying to “oversell” psychology as a discipline, to get more undergraduates to study it full time. (I have to admit that it might have worked on me back when I was an undeclared undergraduate.)

There are some caveats to mention with the study: One is that the 12 topics the authors chose to scrutinize are completely arbitrary. “And many other potential issues were left out of our analysis,” they note. Also, the textbooks included were printed in the spring of 2012; it’s possible they have been updated since then.

Recently, I asked on Twitter how intro psychology professors deal with inconsistencies in their textbooks. Their answers were simple. Some say they decided to get rid of textbooks (which saves students money) and focus on teaching individual articles. Others have another solution that’s just as simple: “You point out the wrong, outdated, and less-than-replicable sections,” Daniël Lakens, a professor at Eindhoven University of Technology in the Netherlands, said. He offered a useful example of one of the slides he uses in class.

Anecdotally, Illinois State University professor Joe Hilgard said he thinks his students appreciate “the ‘cutting-edge’ feeling from knowing something that the textbook didn’t.” (Also, who really, earnestly reads the textbook in an introductory college course?)

And it seems this type of teaching is catching on. A recent (not perfectly representative) survey of 262 psychology professors found that more than half said replication issues had affected their teaching. On the other hand, 40 percent said the issues had not. So whether students are exposed to the recent reckoning is all up to the teachers they have.

If it’s true that textbooks and teachers are still neglecting to cover replication issues, then I’d argue they are actually underselling the science. To teach the “replication crisis” is to teach students that science strives to be self-correcting. It would instill in them the value that science ought to be reproducible.

Understanding human behavior is a hard problem. Finding out the answers shouldn’t be easy. If anything, that should give students more motivation to become the generation of scientists who get it right.

“Textbooks may be missing an opportunity for myth busting,” the Current Psychology study’s authors write. That’s, ideally, what young scientists ought to learn: how to bust myths and find the truth.

Further reading: Psychology’s “replication crisis”