The story starts in September, when psychology professor Fred Oswald wrote me:

I [Oswald] wanted to point out this paper in Science (Ramirez & Beilock, 2010) examining how students’ emotional writing improves their test performance in high-pressure situations. Although replication is viewed as the hallmark of research, this paper replicates implausibly large d-values and correlations across studies, leading me to be more suspicious of the findings (not less, as is generally the case).

He also pointed me to this paper:

Frattaroli, J. (2006). Experimental disclosure and its moderators: A meta-analysis. Psychological Bulletin, 132(6), 823–865.

Disclosing information, thoughts, and feelings about personal and meaningful topics (experimental disclosure) is purported to have various health and psychological consequences (e.g., J. W. Pennebaker, 1993). Although the results of 2 small meta-analyses (P. G. Frisina, J. C. Borod, & S. J. Lepore, 2004; J. M. Smyth, 1998) suggest that experimental disclosure has a positive and significant effect, both used a fixed effects approach, limiting generalizability. Also, a plethora of studies on experimental disclosure have been completed that were not included in the previous analyses. One hundred forty-six randomized studies of experimental disclosure were collected and included in the present meta-analysis. Results of random effects analyses indicate that experimental disclosure is effective, with a positive and significant average r-effect size of .075. In addition, a number of moderators were identified.

At the time, Oswald sent a message to the authors of the study, Sian Beilock and Gerardo Ramirez:

I read your Science article yesterday and was wondering whether you were willing to share your data for the sole purpose of reanalysis, per APA guidelines. Thank you for considering this request – your research findings that triangulate across multiple studies are quite compelling and I wanted to examine further out of personal curiosity about this phenomenon.

Beilock replied that they would be happy to share the data once they had time to put it together in a package that would be easy to send, and asked what Oswald wanted to do with the data. Oswald replied:

I’m a bit wowed by such large effects found in these studies. I was just wanting to take a look at the score distributions for the d-values and correlations for each study — literally a reanalysis of the data, nothing new. Also, if you had brief advice on getting these anxiety/pressure manipulations to work, that would help one of our graduate students . . . who is implementing this type of manipulation in her dissertation on complex task performance . . . There is no real hurry on my end, but I do appreciate both your integrity and future efforts in relaying the data to me when you can.

It’s now early December. I was cleaning out my inbox, found this message from September, and emailed Oswald to ask if it was ok with him for me to blog this story. Oswald replied with a yes and with some new information:

Just last Thursday (some 2 months later), I [Oswald] sent Sian Beilock a follow-up to ask for an ETA on the data. This time, Sian replied the next day, telling me that she checked with her IRB that morning, and they encouraged her not to share the raw data [emphasis added]; she then asked what analyses I would like. So…maybe the analyses will come through.

I can’t say this surprises me (the whole IRB thing seems to be set up to discourage openness with data), and let me emphasize that I’m not blaming Beilock or Ramirez. For one thing, IRBs really are difficult, and they usually do have some arguments on their side. After all, they’re studying people here, not chickens. Also, cleaning data takes work. Just the other day, I responded to a request for data and code from an old study of mine by saying no, it would be too much trouble to put it all in one place.

The real issue here, then, is not a criticism of these researchers, but rather:

1. Suspiciously large estimated effect sizes. These could be coming from the statistical significance filter (estimates that happen to cross the significance threshold are, on average, exaggerated) or from some unstated conditions in the experiment; see the simulation sketch after this list.

2. Evil IRBs.

3. The difficulty of getting data to replicate a published analysis. We’ve been hearing a lot about this lately from Jelte Wicherts and others.
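To make point 1 concrete, here’s a minimal simulation sketch of the statistical significance filter. All the numbers here (a true effect of d = 0.2, 25 subjects per group) are made up for illustration and have nothing to do with the actual studies; the point is just that when the true effect is small and the study is noisy, the estimates that happen to reach p < .05 are, on average, much larger than the truth:

```python
import numpy as np

rng = np.random.default_rng(0)

true_d = 0.2       # hypothetical true effect size (made up for illustration)
n_per_group = 25   # hypothetical sample size per group
n_sims = 10_000

published = []     # estimated d's that survive the significance filter
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_d, 1.0, n_per_group)
    # standardized effect estimate (Cohen's d with pooled SD)
    pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
    d_hat = (treatment.mean() - control.mean()) / pooled_sd
    # two-sample t statistic for equal group sizes: t = d_hat * sqrt(n/2)
    t_stat = d_hat * np.sqrt(n_per_group / 2)
    if abs(t_stat) > 1.96:  # crude "p < .05" cutoff (normal approximation)
        published.append(d_hat)

print(f"true d: {true_d}")
print(f"mean estimated d among 'significant' results: {np.mean(published):.2f}")
```

With these made-up settings, only estimates above roughly 0.55 can reach significance, so the estimates that survive average around 0.7, several times the true effect. That’s one mechanism that can produce a string of implausibly large published d-values across studies.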

What’s really frustrating about this is that Oswald is (presumably) doing a selfless act here. It’s unlikely that you’ll get fame, fortune, or grant funding by replicating or even shooting down a published finding. It’s gotta be discouraging to have to jump through all sorts of hoops just to check some data. The study was published, after all.

P.S. One more time: I’m talking about general problems here, not trying to pick on Beilock and Ramirez. I myself have published a paper based on data we’re not allowed to release.