I came across this 2012 post by John Bargh, who does not seem to be happy about the failures of direct replications of his much-cited elderly-words-and-slow-walking study.

What strikes me about Bargh’s comments is how they illustrate the moving-target approach to much of science.

Here’s the quick story. In 1996, Bargh, Chen, and Burrows published a paper with the striking finding that students walked more slowly when they were primed with elderly-related words such as bingo and Florida. The result was statistically significant at the 5% level.

The Bargh et al. paper has been influential and has been cited hundreds of times. But recent attempted replications of the effect have failed, which leads many outsiders (including me) to suspect that the original finding was a classic garden-of-forking-paths power=.06 story of an opportunistic data analysis.

But here’s what Bargh wrote:

There are already at least two successful replications of that particular study by other, independent labs, published in a mainstream social psychology journal. . . . Both appeared in the Journal of Personality and Social Psychology, the top and most rigorously reviewed journal in the field. [JPSP also published Bem’s notorious ESP paper — ed.] Both articles found the effect but with moderation by a second factor: Hull et al. 2002 showed the effect mainly for individuals high in self consciousness, and Cesario et al. 2006 showed the effect mainly for individuals who like (versus dislike) the elderly. Hull, J., Slone, L., Metayer, K., & Matthews, A. (2002). The nonconsciousness of self-consciousness. Journal of Personality and Social Psychology, 83, 406-424. Cesario, J., Plaks, J., & Higgins, E. T. (2006). Automatic social behavior as motivated preparation to interact. Journal of Personality and Social Psychology, 90, 893-910. Moreover, at least two television science programs have successfully replicated the elderly-walking-slow effect as well, (South) Korean national television, and Great Britain’s BBC1. The BBC field study is available on YouTube.

OK, I think we can just pass by the replication-by-TV-show argument in polite silence.

More interesting is the case of the so-called replications by Hull et al. and Cesario et al., which follow the now-familiar pattern of whack-a-mole or chase-the-grain-of-rice-around-the-plate.

A study is performed, a statistically significant correlation is found, and the results are published. Then in an attempted replication, the effect no longer appears—but there is a statistically significant interaction. Then in another attempted replication, another interaction.

From Bargh’s point of view, this must look like science at its best: each new study brings new insight. A mere replication would be boring—maybe useful in quieting the skeptics, but that’s about it. But a new interaction (a “moderator”): that’s exciting, new stuff. Two new studies, two new interactions.

From my perspective, though, this is all consistent with noise mining, with statistical significance arising from zero (or, more precisely, highly variable) effects plus chance variation. It’s the garden of forking paths: with so many potential interactions, there are so many ways to win, to get “p less than .05.”

Does this mean I think interactions should be set aside? No, not at all. I’ve been on record for years as saying that interactions are important. Much of my own most successful applied work has involved interactions.

But . . . how seriously does Bargh himself take interactions? He mentions three papers: his original article with no interactions, the second paper with an interaction with self-consciousness, and the third paper with an interaction with attitudes toward the elderly.

That’s all fine, but in that case, why not look at all of these interactions in all of the studies? What looks suspicious to me is that the interactions are only looked at when they are statistically significant. But, again, seeing occasional statistically significant interactions is exactly what we would expect, from chance alone, if nothing were going on.
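To put a number on the chance-alone point: if an effect is truly null, each interaction test’s p-value is roughly uniform, so a researcher with a handful of plausible moderators to examine has a good chance of finding at least one “significant” interaction by luck. Here’s a minimal simulation sketch of that arithmetic (my illustration, not from Bargh’s papers; the choice of 10 candidate moderators is an assumption for the example):

```python
import random

random.seed(1)

def any_significant(k, alpha=0.05):
    """Simulate one null study: k interaction tests, each with a
    Uniform(0,1) p-value; return True if any comes in below alpha."""
    return any(random.random() < alpha for _ in range(k))

k = 10          # hypothetical number of moderators a researcher might try
n_sims = 100_000
hit_rate = sum(any_significant(k) for _ in range(n_sims)) / n_sims

print(f"Simulated P(at least one p < .05 out of {k} null tests): {hit_rate:.3f}")
print(f"Analytic value 1 - 0.95**{k}: {1 - 0.95**k:.3f}")
```

With 10 candidate moderators, the chance of at least one spurious “p less than .05” interaction is about 40 percent—before even counting the forking paths within each analysis.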

Bargh concludes:

Research has now moved on from the demonstration and replication of priming effects on social judgment and behavior to research on the mechanisms underlying the effects and the moderators, constraints, and limitations of those effects.

Ummmm, no. Bargh’s research may have moved on, and that’s fine; it’s good to move on and study new things. But for many of the rest of us, no, these effects have not been demonstrated, and the failed replications make the whole thing look like the sort of mess that Paul Meehl wrote about, decades ago.

And, no, I don’t think replications on YouTube count for much.