Someone pointed me to this article, “The more you play, the more aggressive you become: A long-term experimental study of cumulative violent video game effects on hostile expectations and aggressive behavior,” by Youssef Hasan, Laurent Bègue, Michael Scharkow, and Brad Bushman. My correspondent was suspicious of the error bars in Figure 1. I actually think the error bars in Figure 1 are fine—I’ll get to that later, as it’s not the main issue to be discussed here.

Long-term = 3 days??

The biggest problem I see with this paper is in the title: “A long-term experimental study.” What was “long term,” you might wonder? 5 years? 10 years? 20 years? Were violent video games even a “thing” 20 years ago?

Nope. By “long-term” here, the authors mean . . . 3 days.

In addition, the treatment is re-applied each day. So we’re talking about immediate, short-term effects.

I’ve heard of short-term thinking, but this is ridiculous! Especially given that the lag between the experimental manipulation and the outcome measure is, what, 5 minutes? The time lag isn’t stated in the published paper, so we just have to guess.

3 days, 5 minutes, whatever. Either way it’s not in any way “long term.” Unless you’re an amoeba.

Oddly enough, a correction notice has already been issued for this paper, but the correction says nothing about the problem with the title; it’s all about experimental protocols and correlation measures.

According to Google Scholar, the paper’s been cited 100 times! It has a good title (also, following Zwaan’s Rule #12, it features a celebrity quote), and it’s published in a peer-reviewed journal. I guess that’s enough.

What happened in peer review?

Hey, wait! The paper was peer reviewed! How did the reviewers not catch the problem?

Two reasons:

1. You can keep submitting a paper to journal after journal until it gets accepted. Maybe this article was submitted initially to the Journal of Experimental Social Psychology and got published right away; maybe it was sent a few other places first, in which case reviewers at earlier journals might’ve caught these problems.

2. The problem with peer review is the peers, who often seem to have the same blind spots as the authors.

I’d love to know who the peer reviewers were who thought that 3 days is a long-term study.

Here’s my favorite sentence of the paper. It comes near the end:

The present experiment is not without limitations.

Ya think?

More tomorrow on the systemic problems that let this happen.

The error bars in Figure 1

Finally, let me return to the fun little technical point that got us all started—assessing the error bars in Figure 1.

Here’s the graph, with point estimates +/- 1 standard error:

Here’s the question: Are these error bars too narrow? Should we be suspicious?

And here’s the answer:

The responses seem to be on a 0-7 scale; if they’re uniformly distributed you’d see a standard deviation of approximately 7*sqrt(1/12) = 2.0. The paper says N = 70; that’s 35 in each group, so you’d see a standard error of 2.0/sqrt(35) = 0.34, which is, hmmm, a bit bigger than we see in the figure. It’s hard to tell exactly. But, for example, if you look at Day 1 on the top graph, those two entire error bars fit comfortably between 3 and 4. It looks like they come to approximately 0.6 combined, so that each bar spans about 0.3 and each individual s.e. is about 0.15.

So the error bars are about half as wide as you’d expect to see if responses were uniformly distributed between 0 and 7. But they’re probably not uniformly distributed! The outcome being studied is some sort of average of coded responses, so it’s completely plausible that the standard error is on the order of half what you’d get from a uniform distribution.
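If you want to check the arithmetic, here’s a quick back-of-envelope sketch. The 0-7 scale and N = 70 come from the paper; the 0.15 standard error is just my eyeball estimate from the figure:

```python
import math

# Expected spread if responses were uniform on a 0-7 scale
scale_width = 7
sd_uniform = scale_width * math.sqrt(1 / 12)   # about 2.0

# N = 70 split into two groups of 35 (per the paper)
n_per_group = 35
se_expected = sd_uniform / math.sqrt(n_per_group)  # about 0.34

# The error bars in the figure look like roughly +/- 0.15,
# which implies a per-response standard deviation of:
se_observed = 0.15
sd_implied = se_observed * math.sqrt(n_per_group)  # about 0.89

print(f"expected s.e.: {se_expected:.2f}, implied s.d.: {sd_implied:.2f}")
```

The implied standard deviation of about 0.9 is roughly half the uniform-distribution value of 2.0, which is what the argument above relies on.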

Thus, the error bars look ok. The surprising aspect of the graph is that the differences between groups are so large. But I guess that’s what happens when you do this particular intervention and measure these outcomes immediately after (or so it seems; the paper doesn’t say exactly when the measurements were taken).