Ben Goldacre, The Guardian, Saturday 13 August 2011

While the authorities are distracted by mass disorder, we can do some statistics. You’ll have seen plenty of news stories telling you that one part of the brain is bigger, or smaller, in people with a particular mental health problem, or even a specific job. These are generally based on real, published scientific research. But how reliable are the studies?

One way of critiquing a piece of research is to read the academic paper itself, in detail, looking for flaws. But that might not be enough if some sources of bias exist outside the paper, in the wider system of science.

By now you’ll be familiar with publication bias: the phenomenon where studies with boring negative results are less likely to get written up, and less likely to get published. Normally you can estimate this using a tool like, say, a funnel plot. The principle behind these is simple: big expensive landmark studies are harder to brush under the carpet, but small studies can disappear more easily. So you split your studies into “big ones”, and “small ones”: if the small studies, averaged out together, give a more positive result than the big studies, then maybe some small negative studies have gone missing in action.
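The big-versus-small comparison described above can be sketched in a few lines of Python. This is a crude stand-in for a funnel plot, not a formal test, and the `studies` list, its numbers, and the function name `compare_small_and_big` are all invented for illustration.

```python
from statistics import mean, median

def compare_small_and_big(studies):
    """Split studies at the median sample size and average each half.

    `studies` is a list of (sample_size, effect_estimate) pairs.
    Returns (mean effect of small studies, mean effect of big studies).
    """
    cutoff = median(n for n, _ in studies)
    small = [effect for n, effect in studies if n < cutoff]
    big = [effect for n, effect in studies if n >= cutoff]
    return mean(small), mean(big)

# Hypothetical data, invented purely to show the mechanics.
studies = [(12, 0.9), (15, 0.7), (20, 0.8), (150, 0.3), (400, 0.2), (900, 0.25)]

small_mean, big_mean = compare_small_and_big(studies)
print(f"small studies: {small_mean:.2f}, big studies: {big_mean:.2f}")
```

A gap in favour of the small studies is the warning sign. Note that if every study is the same size, the "small" group is empty and `mean` raises an error, so the comparison only makes sense when sizes genuinely vary.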

Sadly this doesn’t work for brain scan studies, because there’s not enough variation in size. So Professor John Ioannidis, a godlike figure in the field of “research about research”, took a different approach. He collected a large representative sample of these anatomical studies, counted up how many positive results they got, and how positive those results were, and then compared this to how many similarly positive results you could plausibly have expected to detect, simply from the sizes of the studies.

This can be derived from something called the “power calculation”. Everyone knows that bigger is better when collecting data for a piece of research: the more you have, the greater your ability to detect a modest effect. What people often miss is that the size of sample needed also changes with the size of the effect you’re trying to detect: detecting a true 0.2% difference in the size of the hippocampus between two groups, say, would need more subjects than a study aiming to detect a huge 25% difference.
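To make the point about effect size concrete, here is a minimal sketch of a power calculation, using the textbook normal approximation for comparing the means of two groups. The function name `subjects_per_group` is made up, and the 10% between-person spread in hippocampus size is an assumption chosen only for illustration, not a figure from the research.

```python
from math import ceil
from statistics import NormalDist

def subjects_per_group(effect_in_sd_units, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-group comparison of means.

    Uses the standard normal-approximation formula
        n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2
    where d is the difference between the groups divided by the
    between-subject standard deviation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_in_sd_units) ** 2)

# ASSUMPTION: hippocampus size varies between people with a standard
# deviation of 10% of the mean. This figure is illustrative only.
sd_as_fraction_of_mean = 0.10

for difference in (0.002, 0.25):  # the 0.2% and 25% differences
    d = difference / sd_as_fraction_of_mean
    print(f"{difference:.1%} difference: about {subjects_per_group(d)} per group")
```

Under that assumed spread, the tiny difference needs tens of thousands of people in each group, while the huge one needs only a handful.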

By working backwards and sideways from these kinds of calculations, Ioannidis was able to determine, from the sizes of effects measured, and from the numbers of people scanned, how many positive findings could plausibly have been expected, and compare that to how many were actually reported. The answer was stark: even being generous, there were twice as many positive findings as you could realistically have expected from the amount of data reported on.
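The logic of comparing expected with observed positives can be sketched roughly as follows. This is a simplified illustration, not the procedure from the paper itself: it assumes every study is a simple two-group comparison, that all studies chase the same true effect, and that they are independent. The study sizes, the assumed effect, and the observed count are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def power_two_groups(n_per_group, d, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

def prob_at_least(observed, powers):
    """Chance of `observed` or more positives, if each study independently
    comes up positive with probability equal to its own power."""
    dist = [1.0]  # dist[k] = probability of exactly k positives so far
    for p in powers:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1 - p)      # this study is negative
            nxt[k + 1] += q * p        # this study is positive
        dist = nxt
    return sum(dist[observed:])

# Hypothetical inputs, invented for illustration.
sizes = [15, 20, 25, 30, 40]   # subjects per group in each study
assumed_effect = 0.5           # assumed true effect, in standard deviations
observed_positives = 4         # positives actually reported

powers = [power_two_groups(n, assumed_effect) for n in sizes]
expected_positives = sum(powers)
print(f"expected positives: {expected_positives:.1f} of {len(sizes)}")
print(f"observed positives: {observed_positives}")
print(f"chance of that many or more: {prob_at_least(observed_positives, powers):.3f}")
```

The expected count is just the sum of each study's power. When the observed count sits well above it, and the chance of seeing that many is small, something other than the data is probably inflating the tally.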

What could explain this? Inadequate blinding is an issue: a fair amount of judgement goes into measuring the size of a brain area on a scan, so wishful nudges can creep in. And boring old publication bias is another: maybe whole negative papers aren’t getting published.

But a final, more interesting explanation is also possible. In these kinds of studies, many brain areas may be measured, to see if they’re bigger or smaller, and then perhaps only the positive findings get reported, within each study.

There is one final line of evidence to support this. In studies of depression, for example, 31 studies report data on the hippocampus, 6 on the putamen, and 7 on the prefrontal cortex. Maybe, perhaps, more investigators really did focus solely on the hippocampus. But given how easy it is to measure the size of another area – once you’ve recruited and scanned your participants – it’s also possible that people are measuring these other areas, finding no change, and not bothering to report that negative result in their paper, alongside the positive ones they’ve found.
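A bit of arithmetic shows why measuring many areas and reporting only the hits matters. The sketch below assumes there is no true difference anywhere, that each area is tested at the usual 5% threshold, and that the tests are independent. That last assumption is a simplification, since sizes of neighbouring brain areas are likely to be correlated, and the function name is made up.

```python
def chance_of_any_positive(areas_measured, alpha=0.05):
    """Chance that at least one area comes up 'significant' by luck alone,
    assuming independent tests and no true difference in any area."""
    return 1 - (1 - alpha) ** areas_measured

for k in (1, 3, 10, 20):
    print(f"{k:2d} areas measured: {chance_of_any_positive(k):.0%} chance of a positive")
```

With ten areas measured, the chance of at least one spurious positive is about 40%, so a study that reports only its hits can look far more convincing than its data deserve.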

There’s only one way to prevent this: researchers would have to publicly pre-register what areas they plan to measure, before they begin, and report all findings. In the absence of that process, the entire field might be distorted, by a form of exaggeration that is – we trust – honest and unconscious, but more interestingly, collective and disseminated.