An interesting article in Slate by Daniel Engber reviews the story of Daryl Bem and his psi “Feeling the Future” research. If you are interested in this sort of thing the entire article is worth a read, but I want to highlight and expand upon the important bits.

For review, in 2011 Bem published a series of 10 experiments in the Journal of Personality and Social Psychology (JPSP). I wrote about the research at the time, and wasn’t impressed. I wrote:

“In the final analysis, this new data from Bem is not convincing at all. It shows very small effect sizes, within the range of noise, and has not been replicated. Further, the statistical analysis used was biased in favor of finding significance, even for questionable data.”

Bem had taken standard social psychology experimental protocols, mostly dealing with priming, and did an interesting thing – he reversed the order of the experiment so that the priming came after the subjects were tested. For example, he would give subjects a memory test and then let some of them study the material. He claimed that the studying had an effect backward in time to allow subjects to perform slightly better.

For experienced skeptics, this was not much of a surprise. When dealing with claims that have a vanishingly small prior probability, you need extraordinary evidence to be taken seriously, and this wasn’t it. We were already very familiar with these kinds of results – if you squint just right there is a teeny tiny effect size. But we already knew that experiments are easy to fudge, even unwittingly, and it would therefore take a lot more to rewrite all the physics textbooks. (What is more likely, that the fundamental nature of reality is not what we thought, or that Bem was a little sloppy in his research?) The key (as acknowledged by Bem himself) would be in replication.

The Reaction to Bem

In the six years since Bem published his research there has been an increasing awareness of the potential problems in conducting rigorous scientific research, and not just in psychology research but in all of medicine and other areas as well. I have been carefully documenting both here and at SBM all the research that shows these problems – the problems with publication bias, p-hacking, and the failure to replicate.

This is, in fact, the central thesis of science-based medicine. It is too easy to manufacture false positive results, and there is too much incentive to do so and to publish such results. We need to tweak our incentives and filters, and take a more thorough look at the entire literature before we can arrive at reliable scientific conclusions.

This dedication to rigorous science is more important than ever in medicine, because we are facing a dedicated and well-funded incursion from proponents of so-called “alternative medicine,” who are tirelessly trying to make the standards of science less rigorous.

Engber, however, makes a claim I have not heard before, and I wonder how true it is – that Bem’s publication, allegedly showing psi phenomena, was a wake-up call to the world of psychology and actually led to this increased awareness of the general problems with research. I had never made that connection before.

My sense is that there is some truth to this, but it is not the whole story. I do agree with Engber’s central claim, made in his headline: “Daryl Bem Proved ESP Is Real. Which Means Science Is Broken.” That is a more dramatic way of saying what I stated above – it is more likely that Bem’s science is broken than it is that the laws of the universe are radically different than what we think.

It is probably an oversimplification, however, to credit Bem’s publication with inspiring the work of Ioannidis, for example, who studies the published literature for patterns of bias. My perspective is a little different since my colleagues and I have been beating this drum since long before Bem.

Still, dramatic examples are useful. I have made this type of argument many times before, essentially reversing the direction of argumentation about evidence. Typically we use scientific evidence to determine if a claim is true. We can also, however, use as a premise that a claim which is close to impossible is not true, and then use that premise to ask questions about the scientific research.

I have done this also with homeopathy. Since we know that homeopathy cannot possibly work, we can look at the totality of homeopathy research and ask – what does the medical research look like when we study a hypothesis that is clearly not true? It looks exactly like what we expect it to look like, given the work of Ioannidis, Simons and others who have documented patterns of bias and p-hacking in the literature.
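The pattern being described here is easy to demonstrate with a toy simulation. The sketch below (my own illustration, not from the article) simulates many studies of a hypothesis that is definitely false – the two groups are drawn from the same distribution – and then applies publication bias by counting only the “significant” results. Even with nothing to find, a steady trickle of positive studies accumulates:

```python
import random
import math

random.seed(1)

def run_study(n=100):
    """Simulate one study of a truly null effect: two groups drawn
    from the identical distribution, compared at p < 0.05 (two-tailed)."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    diff = sum(a) / n - sum(b) / n
    se = math.sqrt(2 / n)            # standard error of the difference in means
    return abs(diff / se) > 1.96     # nominally "significant"

# Publication bias: only the positive results tend to get published.
published = sum(run_study() for _ in range(1000))
print(f"{published} of 1000 studies of a false hypothesis came out 'significant'")
```

With a 5% false-positive threshold, roughly 50 of 1000 null studies come out positive – which, filtered through publication bias, is enough to build an entire literature around an effect that does not exist.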

Replications

Engber’s article also contains some nuggets that need to be highlighted. To his credit, Bem has been open and supportive of efforts to replicate his research. In fact, he teamed up with other psi researchers to do the kind of replication that, if positive, would provide the kind of evidence skeptics claim never exists for psi. They registered protocols for replications involving both believer and skeptical researchers. They thought that perhaps skeptical researchers get negative results not because they are biased but because they are psychic – they actually psychically inhibit any psychic powers of the subjects.

Engber reports:

To distinguish this replication from earlier attempts, Bem, Schlitz, and Delorme took extra steps to rule out any possibility of bias. They planned to run the same battery of tests at a dozen different laboratories, and to publish the design of the experiment and its planned analysis ahead of time, so there could be no quibbling over the “garden of forking paths.” They presented their results last summer, at the most recent annual meeting of the Parapsychological Association. According to their pre-registered analysis, there was no evidence at all for ESP, nor was there any correlation between the attitudes of the experimenters—whether they were believers or skeptics when it came to psi—and the outcomes of the study. In summary, their large-scale, multisite, pre-registered replication ended in a failure.

And there you have it. This is the exact pattern that I and other skeptics and advocates of SBM keep talking about. Initial positive results are not definitive because there are just too many ways to bias the outcome (to p-hack). We start to get interested when there are rigorous replications, especially exact replications.

What Bem and the others did is exactly what you have to do in order to avoid p-hacking – you have to make all of the decisions about the research protocol beforehand, prior to collecting any data. If you make any decisions or adjustments after you start collecting data, that alters the probabilities. It essentially gives you more throws of the dice, and renders the p-value meaningless (that’s p-hacking).

So, when all of the protocols were registered and locked in place, the large multi-site attempt to replicate Bem’s results, including believers and skeptics, was completely negative. If an effect goes away when you control for p-hacking, then the effect is not real. That is how science works.

Alas, Bem and his fellow psi researchers did not give up:

In their conference abstract, though, Bem and his co-authors found a way to wring some droplets of confirmation from the data. After adding in a set of new statistical tests, ex post facto, they concluded that the evidence for ESP was indeed “highly significant.”

<Facepalm> So close. They did the rigorous protocol, they put their nickel down like good scientists, but then they flinched from the results. That is the moment that separates real scientists from pseudoscientists. That is the ultimate test – can you accept results which definitively disprove your pet hypothesis? When they controlled for p-hacking, no psi effect. So they went back and added in some p-hacking. Ugh!

Some More Revealing Tidbits

As revealing as that is, some quotes from Bem tell more of the story.

“I’m all for rigor,” he continued, “but I prefer other people do it. I see its importance—it’s fun for some people—but I don’t have the patience for it.” It’s been hard for him, he said, to move into a field where the data count for so much. “If you looked at all my past experiments, they were always rhetorical devices. I gathered data to show how my point would be made. I used data as a point of persuasion, and I never really worried about, ‘Will this replicate or will this not?’ ”

That is almost a “Trumpian” admission. He is actually admitting that his research was a “rhetorical device” intended to persuade and not to explore. He was trying to show “that” psi is real, not find out “if” it is real. That, I would argue, is the core failure of the pseudoscientist.

He doesn’t have the patience for rigor – OK, fair enough if you are going to admit it. But the real problem is, when other people do the rigor, as in the replication experiments above, he doesn’t accept the results.

Psi is Dead, Science is Recovering

In a rational world, this entire affair would spell the death of psi research. The hypothesis was never plausible. The strong suspicion that razor-thin positive outcomes were the result of some sloppy research has been largely confirmed.

The fact remains that no psi research protocol has withstood the test of time and held up to rigorous replication. There is no psi effect that meets my criteria for being compelling: a statistically significant effect, with a meaningful signal-to-noise ratio, produced by highly rigorous experimental protocols, that holds up to replication. You can get some of these features with psi, but never all of them at once.

The most parsimonious explanation for this fact is that psi is not real. A century of psi research is enough. This is a dead end. Psi is pushing up the daisies. It is an ex-hypothesis.

But, as the reaction of Bem and his colleagues demonstrates, the world is not rational.

Meanwhile, back at mainstream science, the Bem affair was a bit of a wake-up call, along with many other alarm bells. I would not say that science is broken. It is clear that there are major weaknesses in the institutions of science, but we largely know what they are and how to fix them. We need to keep attention on this issue and push the institutions of science to greater rigor and reliability.

It may be true that the very people who are trying to lower the standards in science will be the ones who provide the necessary impetus to increase standards. Science-based medicine was a direct reaction to alternative medicine. Perhaps the “replication crisis” in psychology research was largely a reaction to Bem.

What matters now is what we do going forward.