For years, the drug company GlaxoSmithKline illegally marketed paroxetine, sold under the brand name Paxil, as an antidepressant for children and teenagers. It did so by citing what's known as Study 329 — research that was funded by the drug company and published in 2001, claiming to show that Paxil is "well tolerated and effective" for kids.

That marketing effort worked. In 2002 alone, doctors wrote 2 million Paxil prescriptions for children and adolescents.

Years later, after researchers reanalyzed the raw data behind Study 329, it became clear that the study's original conclusions were wildly wrong. Not only is Paxil ineffective, working no better than placebo, but it can actually have serious side effects, including self-injury and suicide.

So how did the researchers behind the trial manage to dupe doctors and the public for so long? In part, the study was a notorious example of what's called "outcome switching" in medical research.

Before researchers start a clinical trial, they're supposed to pre-specify which health outcomes they're most interested in. For an antidepressant, these might include patients' self-reported mood and the drug's effects on sleep, sexual desire, and even suicidal thoughts.

The idea is to keep researchers from publishing only the positive or more favorable outcomes that turn up during a study while ignoring or hiding important results that don't turn out as they had hoped.

But that doesn't always happen. "In Study 329," explains Ben Goldacre, a crusading British physician and author, "none of the pre-specified analyses yielded a positive result for GSK’s drug, but a few of the additional outcomes that were measured did, and those were reported in the academic paper on the trial, while the pre-specified outcomes were dropped."

These days, it's easy to see whether researchers are engaged in outcome switching, because there are now public clinical trial registries where they're supposed to report their pre-specified outcomes before a trial begins. In theory, when journals consider a study manuscript, they should check whether the authors actually reported on those pre-specified outcomes. Even so, says Goldacre, this isn't always happening.

So with his new endeavor, the Compare Project, Goldacre and a team of medical students are trying to address the problem. They compare each newly published clinical trial in the top medical journals with the trial's registry entry.

When they detect outcome switching, they write a letter to the academic journal pointing out the discrepancy, and then they track how journals respond. I spoke to Goldacre to learn more.

"When we get the wrong answer, in medicine, that’s not a matter of academic sophistry — it causes avoidable suffering"

Julia Belluz: Why does outcome switching matter?

Ben Goldacre: This is an interesting example of a nerdy problem whose importance requires a few pages of background knowledge, and that’s probably why it’s been left unfixed for so long. But in short: Switching your outcomes breaks the assumptions in your statistical tests. It allows the "noise" or "random error" in your data to exaggerate your results (or even yield an outright false positive, showing a treatment to be superior when in reality it’s not).

We do trials specifically to detect very modest differences between one treatment and another. You don’t need to do a randomized trial on whether a parachute will save your life when you jump out of an airplane, because the difference in survival is so dramatic. But you do need a trial to spot the tiny difference between one medical intervention and another.

When we get the wrong answer, in medicine, that’s not a matter of academic sophistry — it causes avoidable suffering, bereavement, and death. So it’s worth being as close to perfect as we can possibly be.
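
To make Goldacre's statistical point concrete, here is a minimal Python sketch built on simplifying assumptions of our own, not on data from Study 329 or the Compare Project: a drug with no real effect, several independent outcomes, and each outcome tested at the conventional 5 percent significance threshold. If authors are free to report whichever outcome "wins," the chance of at least one false positive is 1 - (1 - 0.05)^k for k outcomes. That works out to about 23 percent with five outcomes and about 40 percent with ten. The simulation checks the formula by drawing p-values, which are uniformly distributed when there is truly no effect.

```python
import random

ALPHA = 0.05  # assumed conventional significance threshold


def analytic_false_positive_rate(num_outcomes: int, alpha: float = ALPHA) -> float:
    """Chance that at least one of num_outcomes independent null outcomes looks 'significant'."""
    return 1 - (1 - alpha) ** num_outcomes


def simulated_false_positive_rate(num_outcomes: int, trials: int = 100_000,
                                  alpha: float = ALPHA, seed: int = 0) -> float:
    """Simulate many trials of a drug that does nothing.

    Under the null hypothesis a correctly computed p-value is uniform on [0, 1],
    so each outcome's p-value is drawn uniformly. A trial counts as a "success"
    if the authors can point to any outcome with p < alpha.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(num_outcomes)):
            hits += 1
    return hits / trials


for k in (1, 5, 10):
    print(f"{k:>2} outcomes: formula {analytic_false_positive_rate(k):.3f}, "
          f"simulation {simulated_false_positive_rate(k):.3f}")
```

Real trial outcomes are usually correlated with one another, so the exact figures would differ, but the direction is the same: the more outcomes you can choose from after the fact, the more likely noise alone produces a "positive" result.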

JB: Don't people check to make sure the outcomes that researchers registered are the ones they reported in journals?

BG: We're finding that when you publish your trial in an academic journal, nobody checks that — even though all the journals claim that they do check. But the reality is when you go and check you find something different.

JB: I took a look at the dashboard on your site (shown below), and you've found many more "switched outcomes" than perfect trials. That seems pretty bad.

BG: We've been checking every single trial without exception published in top journals since we launched (in November).

We’re not just [tallying] the prevalence of bad reporting in clinical trials. We’re doing something that's a little more risky for us, depending on how confident you feel the academic community is going to have a serious public conversation about this stuff.

We’re holding individual journals and trials to account. I think it’s amazing that we have five medical students working on this, giving up their time every week to grade all these trials. They’re a real inspiration, because they are fearless and they are meticulous, and I hope fearless and meticulous medical students can help us fix the problem.

JB: Why do you think researchers do this?

BG: To be clear, I don’t think every trialist and journal that we’ve caught switching its outcomes is doing so deliberately, to rig the results. Often it’s clumsiness, or a failure to take the issue sufficiently seriously. But this sloppy reporting gives cover to people who are deliberately cherry-picking their results and rigging their findings. That’s why it’s so important to hold the line and strive to report trials correctly, or at least explain why you’ve switched from your original plans.

Medicine progresses in tiny forward shifts that all add up to a spectacular drop in your chances of dying in middle age. That’s the nature of modern medicine, what we might call "radical incrementalism," and we do well-designed trials specifically to detect those modest benefits. That’s why good methods, and good reporting, really matter. If we get sloppy, when we’re trying to detect small differences, then flaws in the way we design and report our experiments can slip in and pollute our data, drowning out any true statistical signal.