Lee Jussim February 23, 2016

Social psychology is in crisis because no one knows what to believe anymore. The journals are now filled with failed replication after failed replication. Published studies once believed to demonstrate all sorts of amazing world-changing pervasive effects have not been replicated by other researchers. And the issues go well beyond failed replications. Or, put differently, some of the most famous and most influential effects in social psychology have been called into question not only by failed replication after failed replication, but by revelations of questionable methodological, statistical, and interpretive practices.

What does this have to do with Heterodox Academy? Isn’t this just methodological arcana, the equivalent of “inside baseball” for social psychologists? Not at all. Heterodox is about political diversity, but it is not only about political diversity. It is also about intellectual diversity. Intellectual diversity is crucial for solving difficult problems because it deflates the intellectual arrogance of those who think they know the answers, despite a lack of evidence. As Abraham Loeb put it, in an article in Nature Physics on astronomy (not politics or the social sciences), “Uniformity of opinion is sterile; the co-existence of multiple ideas cultivates competition and progress.”

Difficult-to-replicate studies similarly deflate arrogance, or at least they should, and, therefore, should ultimately lead to a stronger and more sound science. However, deflating arrogance is one thing; declaring the field all or mostly bunk is quite another. The first step, then, is to figure out how bad it actually is in social psychology.

But it does get worse before it gets better.

Part I: The (Ir?)replicability of Social Psychology

Some of the strongest evidence for the claim that “most social psych is false” comes from a single paper (Open Science Collaboration, 2015, published in Science) that examined research published in 2008 in several fields of psychology, including social psychology.

That paper was a multi-lab collaboration that attempted to replicate 52 studies published in two top social psych journals (Journal of Personality and Social Psychology and Psychological Science). What “counts” as a “successful replication” is itself not settled science. What counts as “evidence that the effect is real” is not settled science. So they used multiple measures. Depending on the criteria, they found that between 25 and 43% of studies replicated.

So far, this sounds like “Most social psych findings are false” is on pretty safe grounds. And it might be. But I do not think that general conclusion is justified by this large scale replication study.

Part II: OSC 2015 is a Great Study, But Let’s Not Overinterpret It

Here is the key thing that OSC did NOT do that renders the inference “most social psych findings are false” unjustified: They did not identify a population of social psych studies (say, since 1950 or 1970 or even 1990), randomly select ones, and then attempt to replicate them.

Instead, they first restricted replication attempts to 2008. Then they created subsamples of studies (e.g., the first 20 papers published in Psychological Science). They then allowed their replication teams to select the papers from which to attempt a replication. In general, by design, the last studies in multi-study reports were selected for replication attempts. Beyond that, however, from the report published in Science, it is impossible to know how the replication teams selected which paper to replicate. It is possible that, disproportionately, the teams selected papers reporting studies they thought were unlikely to replicate (there is no way to know short of surveying the over 100 co-authors of those replications, which I have not done). At minimum, this cannot be ruled out.

Regardless, absent bona fide random sampling of studies over a long time period, no general conclusion about the replicability of social psych can be reached on the basis of this paper. Hell, one cannot even reach clear conclusions about the replicability of social psych published in 2008 from this paper.

Of course, these limitations do not mean social psych is on safe grounds. They do not mean the study is definitively known to have provided results unrepresentative of social psychology. The study certainly does show that lots of stuff is getting published that is difficult to replicate.
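The sampling concern in this section can be made concrete with a small simulation. The sketch below is purely illustrative: the size of the literature, the 60% "true" replication rate, and the assumption that teams are three times as likely to pick a doubtful study are all invented for the example and are not estimates of anything in OSC 2015. The function and parameter names are hypothetical as well.

```python
import random


def share_replicable(studies):
    """Fraction of studies in the list that would replicate."""
    return sum(studies) / len(studies)


def compare_selection(n_studies=1000, true_rate=0.6, n_picked=100,
                      doubt_weight=3.0, seed=7):
    """Compare a random sample of studies with a doubt-driven selection."""
    rng = random.Random(seed)
    # Hypothetical literature: True means the finding would replicate.
    population = [rng.random() < true_rate for _ in range(n_studies)]

    # (a) Simple random sample of studies.
    random_pick = rng.sample(population, n_picked)

    # (b) Assumed behavior: at each draw, a non-replicable study is
    # doubt_weight times as likely to be chosen as a replicable one.
    # Exponential "clocks" give weighted sampling without replacement.
    keyed = [
        (rng.expovariate(1.0 if replicable else doubt_weight), replicable)
        for replicable in population
    ]
    keyed.sort(key=lambda pair: pair[0])
    biased_pick = [replicable for _, replicable in keyed[:n_picked]]

    return {
        "population": share_replicable(population),
        "random_sample": share_replicable(random_pick),
        "doubt_driven_sample": share_replicable(biased_pick),
    }


if __name__ == "__main__":
    for name, value in compare_selection().items():
        print(f"{name}: {value:.2f}")
```

Under these made-up settings, the random sample should land near the population rate, while the doubt-driven sample should come out well below it. The point is only that, without knowing how studies were chosen, an observed replication rate cannot be read as the rate for the whole literature.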

Part III: Replication in Social Psychology is Hard Even When the Effect is Known to be True

Jon Krosnick is a social psychologist/political scientist at Stanford who is also internationally recognized as one of the premier survey researchers in the social sciences. He once headed the American National Election Study, a nationally representative survey of political views that has been going on for decades, routinely appears in the NYTimes, and has received numerous awards for his work.

A few years ago, he collected survey data on almost 10,000 people. A series of well-known survey effects were identified as statistically significant in this large sample (e.g., order effects, acquiescence, etc.). Subsamples of about 500-1000 people were then examined to determine how frequently the subsamples would show statistically significant evidence of the same effects.

Despite the fact that the phenomena under study were usually significant in the large sample, the subsamples found significant evidence of the effect only about half the time (analyses are still in progress and the exact number of replications for each phenomenon is subject to change). Even if the 50% “replication” number is only ballpark pending final analyses, this speaks to the difficulties of replication, even with large samples, and even without any questionable research practices whatsoever.

That is, in some ways, good news. It means that, e.g., when smaller sample studies only replicate 30% or 40% of the time, it is not necessarily evidence of rampant problematic practices. It may simply be a testament to the large effects of sampling variability and minor changes in context (e.g., being conducted in a different state or country) or procedure. And there is more good news. At least with their large samples, Krosnick’s team’s preliminary results suggest that, whether they found significant evidence of the effect or not, about 80% of the studies were not significantly different from one another. Again, whether the final tally is 71% or 93% or 80%, that is a relatively high level of replication.

Why is this important? It shows how the vagaries of sampling variability can make detecting even a true effect quite difficult. It also means that, perhaps, we need to reconsider our understanding of how frequently a finding needs to replicate for it to be credible, and how we can ever distinguish a credible finding from an incredible one. Lots of scientists are working on just this issue and have developed whole new statistical tools for distinguishing what is credible from what is not (p-curves, replication indices, statistical tests for identifying and controlling for publication biases, etc.). Most of those methods are, however, sufficiently new that it will probably be a while before we know which work best.
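The subsampling logic described in this section can be sketched in a few lines of code. This is a minimal simulation under stated assumptions, not a reanalysis of Krosnick's data: the two-condition design, the effect size of 0.143 standard deviations (chosen so that a subsample of 750 has roughly 50% power), and every function name are hypothetical.

```python
import math
import random

Z_CRIT = 1.96  # two-sided 5% critical value (normal approximation)


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approx_power(effect_sd_units, n_total):
    """Approximate power of a two-group comparison, n_total/2 per group."""
    ncp = effect_sd_units * math.sqrt(n_total / 4.0)
    return (1.0 - normal_cdf(Z_CRIT - ncp)) + normal_cdf(-Z_CRIT - ncp)


def effect_and_se(rows):
    """Mean difference between the two conditions and its standard error."""
    a = [y for g, y in rows if g == 0]
    b = [y for g, y in rows if g == 1]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((y - mean_a) ** 2 for y in a) / (len(a) - 1)
    var_b = sum((y - mean_b) ** 2 for y in b) / (len(b) - 1)
    return mean_b - mean_a, math.sqrt(var_a / len(a) + var_b / len(b))


def run_demo(true_effect=0.143, n_full=10_000, n_sub=750, reps=1_000, seed=1):
    rng = random.Random(seed)
    # Hypothetical full sample: alternate respondents between two question
    # versions; version 1 shifts responses by true_effect (noise SD = 1).
    full = [(i % 2, rng.gauss(true_effect * (i % 2), 1.0)) for i in range(n_full)]
    full_diff, full_se = effect_and_se(full)

    # Draw many subsamples and test the effect in each one.
    estimates = [effect_and_se(rng.sample(full, n_sub)) for _ in range(reps)]
    detected = sum(abs(d / se) > Z_CRIT for d, se in estimates) / reps

    # Pair up subsamples and ask whether their estimates differ significantly.
    pairs = list(zip(estimates[0::2], estimates[1::2]))
    consistent = sum(
        abs(d1 - d2) / math.sqrt(s1 ** 2 + s2 ** 2) <= Z_CRIT
        for (d1, s1), (d2, s2) in pairs
    ) / len(pairs)

    return {
        "full_sample_z": full_diff / full_se,
        "expected_detection_rate": approx_power(full_diff, n_sub),
        "detection_rate": detected,
        "pairwise_consistent_rate": consistent,
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.3f}")
```

With these assumed settings, the effect is clearly significant in the full sample, yet only a fraction of the subsamples reach significance, and most pairs of subsamples do not differ significantly from each other. That is the pattern described above: "significant only some of the time" and "largely consistent with one another" can both hold when nothing but sampling variability is at work.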

Part IV: The Replicability of Social Psychology

1. Some areas of social psychology are a mess, especially those involving “social priming” (see references for links to articles discussing the various priming crises and failures to replicate). I am not saying all are false, but, with some rare exceptions, I do not know which social priming effects are credible and which are not. Cognitive priming is not a mess. (If you do not know the difference, zip me a side email). There has long been excellent and easily replicable work on cognitive priming in cognitive psychology. After exposure to the word “black,” people more quickly recognize subsequent presentations of the word “black” (compared, e.g., to other words, such as “green” or “blasphemy”).

2. In my lab, over 30 years, I have replicated each of the following phenomena:

Stereotypes bias how people judge an individual when people lack much information (other than stereotype category membership) about that individual

People massively judge individuals based on their personal characteristics and hardly at all on stereotypes, if people have relevant information about that individual’s personal characteristics — e.g., their personality, accomplishments, behaviors, etc.

Moderate to high levels of accuracy in many demographic stereotypes

Pervasive inaccuracy in national stereotypes when evaluated against big five personality self-report criteria

Teacher expectations produce self-fulfilling prophecies in the classroom — but these effects tend to be weak, fragile, and fleeting (few other researchers would describe them this way, but when you look at the actual findings, this is pretty much what almost everyone has actually found).

Teacher expectations mostly predict student achievement because those expectations are accurate, not self-fulfilling.

Nonetheless, teacher expectations also bias their own evaluations of students to a modest degree.

Mortality salience increases anti-Semitism.

Self-consistency dominates cognitive reactions to performance feedback; self-enhancement dominates affective reactions to performance feedback

The fundamental attribution error

Self-serving biases

Politically motivated confirmation biases

I did not discover these phenomena. So my replications constitute independent evidence that the phenomena are real. However, none of these were direct replications (I did not attempt to follow original methodological procedures exactly). In modern parlance, all were conceptual replications (I tested the original hypothesis in a different way). Indeed, this distinction between direct and conceptual replication was itself not on my mind when I conducted those studies. Twenty-five years ago (or 15 or even 5) no one was talking about direct versus conceptual replications, and I just took for granted that other research had found a phenomenon, and went about seeing if I could, too, usually in the service of some other research effort (e.g., Rosenthal & Jacobson, 1968 demonstrated experimentally-induced self-fulfilling prophecies; I wanted to see if expectations teachers developed on their own, without being misled by researchers, were also self-fulfilling – they were). Now, most of these are not the “hot flashy topics” of the last 20 years. No priming, no implicit prejudice, no power posing, no stereotype threat. Many, though not all, of these findings are accompanied by quite large effect sizes (which was one of the predictors of replication success in the OSC, 2015 paper).

That is just in my lab. Counting only stuff I know of from other folks, off the top of my head, that has been replicated in more than one independent lab:

Jon Haidt’s moral foundations

Similarity-attraction

Rightwing prejudice against leftwing groups and leftwing prejudice against rightwing groups

People exaggerate the views of their political opponents

Prejudice (disliking/liking a group) usually predicts all sorts of biases more strongly than do stereotypes (beliefs about groups).

Above chance accuracy in person perception based on thin slices of behavior.

Kahneman & Tversky-like heuristics

Ingroup biases

Self-serving self-evaluations of competence, morality, and health.

In person perception, people seek diagnostic information more than confirmatory information in just about every study that has ever given people the chance to seek diagnostic information.

As long as one is talking about technical results, rather than widespread overinterpretations of such results:

racial Implicit Association Test scores greater than zero widely replicate;

conservatives routinely score higher on common measures of rigidity and dogmatism than do liberals

race/ethnicity and class differences in academic achievement abound.

I am sure there are many more that I have not listed. Many findings are easy to replicate.

On the other hand, this is no random sample of topics either. It would not be justified to conclude from my personal experience or this off-the-top-of-my-head list that, in fact, social psych is just fine, thank you very much. And the problems go way beyond replication, but that is a missive for another day.

How will we figure out what, from the vast storehouse of nearly a century of social psychological research, is actually valid and believable? How can we distinguish dramatic, world-changing results that are just hype, terrific story-telling, researchers torturing the data till they confess, wishful thinking and, ultimately, snake oil, from dramatic world-changing results that we can really hang our hats on and go out and change the world with? No one really knows yet, and anyone who claims they do, without having subjected their claims to skeptical tests such as p-curves, replication indices, and pre-registered replication attempts, is just selling you repackaged snake oil.

To me, there is a single, crucial ingredient for figuring this out: Diversity of viewpoints and deep skepticism of one another’s claims. When answers are not settled science – and much of social psychology is currently unsettled – diversity and skepticism are essential tools for ferreting out truth from hype, signal from noise, real world-changing results from snake oil.

Groupthink and deference to prestigious scientific “authorities” and to repeated “scientific” stories resting on empirical feet of unclear firmness are the enemy. Heterodox is part of the solution by virtue of strongly supporting viewpoint diversity, which increases our chance of eventually figuring out what is actually true and what is not. Big doses of humility and uncertainty, at least with respect to our claims about social psychology, seem to be in order. In that spirit, we are probably best off eschewing extreme conclusions, including “most social psychology findings are false,” unless we know they have extremely strong foundations of scientific support.

Who knew that Mark Twain was a scientist? “It ain’t what you don’t know that gets you in trouble. It’s what you know for sure that just ain’t so.”

References

Jones, E. E., & Harris, V. A. (1967). The attribution of attitudes. Journal of Experimental Social Psychology, 3, 1-24.

Krosnick, J. A. (2015). Replication. Talk presented at the 2015 meeting of the Society for Personality and Social Psychology.

Loeb, A. (2014). Benefits of diversity. Nature Physics, 10, 616-617.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349, aac4716. doi: 10.1126/science.aac4716

Rosenthal, R., & Jacobson, L. (1968). Pygmalion in the classroom: Teacher expectations and student intellectual development. New York: Holt, Rinehart, and Winston.

Easy-to-Access Online Resources on Problematic Priming and Other Difficult-to-Replicate Studies

Recent priming failures

Valid and invalid priming effects

An early failed priming replication

Unicorns of Social Psychology

Social Psychological Unicorns: Do Failed Replications Dispel Scientific Myths?

Is Power Posing Just Hype?