Following charges of flawed statistics, major medical journal sets the record straight

One year after a damning review suggested that many published clinical trials contain statistical errors, The New England Journal of Medicine (NEJM) today is correcting five of the papers fingered and retracting and republishing a sixth, about whether a Mediterranean diet helps prevent heart disease. (Spoiler alert: It still does, according to the new version of the paper.) Despite errors missed until now, in many ways the journal system worked as intended, with NEJM launching an inquiry within days of the accusations.

The journal’s unusual move was prompted by a controversial analysis published in June 2017. Writing in Anaesthesia, where he is also an editor, anesthesiologist John Carlisle of Torbay Hospital in Torquay, U.K., took a statistical deep dive into 5087 randomized, controlled trials. With the help of a computer program, Carlisle looked for a specific type of anomaly: nonrandom assignment of volunteers to different treatments, when the trial had claimed the assignments were random. This can skew a trial’s results. For example, if many more elderly people are assigned to a control group while younger ones get an experimental treatment, the new drug may look like it has fewer side effects because the people getting it are healthier.

Across eight journals, Carlisle analyzed how certain features of the volunteers, such as their height, weight, and age, were spread across the treatments tested. If the spread did not match what random assignment would be expected to produce (if the groups were too perfectly matched, or too far apart), he suspected the assignments were not truly random, whether because of scientific misconduct or honest error. Roughly 2% of the papers he ran through his program fell into this questionable category.
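To make the idea concrete, here is a minimal sketch of a baseline-balance screen. It is not Carlisle's actual program: the function names, the normal approximation, the use of Stouffer's method to combine p-values, and the 0.01 and 0.99 cutoffs are all assumptions chosen for illustration. It compares two groups' reported baseline means, then asks whether the resulting p-values, taken together, look too extreme in either direction.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def baseline_p_value(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sided p-value for the difference between two groups' baseline means.

    Uses a normal approximation, which is reasonable only for large groups.
    """
    se_diff = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_a - mean_b) / se_diff
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def screen_trial(p_values, low=0.01, high=0.99):
    """Combine baseline p-values with Stouffer's method and label the trial.

    Assumes the baseline variables are independent of one another.
    The cutoffs `low` and `high` are arbitrary illustrative choices.
    """
    eps = 1e-12  # keep p strictly inside (0, 1) so inv_cdf is defined
    z_scores = [STD_NORMAL.inv_cdf(1 - min(max(p, eps), 1 - eps)) for p in p_values]
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    combined_p = 1 - STD_NORMAL.cdf(combined_z)
    if combined_p < low:
        return "groups too different"
    if combined_p > high:
        return "groups too similar"
    return "consistent with random assignment"
```

Under random assignment, baseline p-values should scatter roughly evenly between 0 and 1. A trial whose p-values all crowd near 1 would be labeled "groups too similar" by this sketch, and one whose p-values all crowd near 0 would be labeled "groups too different." Either label is a prompt for questions, not proof of a problem.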

But “although my analysis throws up questions, it doesn’t necessarily throw up answers,” Carlisle says. For that, the journals needed to step in. Within days of Carlisle’s report, NEJM had homed in on 11 of its papers with the loudest alarm bells. Six turned out to contain mistakes. In five cases, the issue was a mixup of statistical terms: for example, writing “standard deviation,” which measures variability across data, in place of “standard error,” which measures the uncertainty in an estimate such as a mean and shrinks as sample size grows.
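The gap between the two terms is easy to see with numbers. The sketch below uses the textbook relationship for the standard error of a mean (standard deviation divided by the square root of the sample size); the function name and the figures in the note that follows are made up for illustration and do not come from any of the papers.

```python
from math import sqrt


def standard_error_of_mean(sd, n):
    """Standard error of a sample mean, given the sample standard deviation."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / sqrt(n)
```

For a hypothetical group of 100 volunteers whose ages have a standard deviation of 10 years, the standard error of the mean age is 1 year. A paper that printed "1" under the label "standard deviation" would make the group look far more uniform than it is, which is the kind of mismatch a screen of baseline data can trip over.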

But in the sixth, a large clinical trial in Spain published in 2013, which had reported that a Mediterranean diet could prevent heart disease in people at risk, deeper problems surfaced. “It turned out when we contacted the investigators, they had already been working on it, they had seen the same thing we had and been concerned,” says NEJM Editor-in-Chief Jeffrey Drazen in Waltham, Massachusetts.

Nearly 7500 people all over Spain had enrolled in the trial as long as 15 years ago, so tracking down what might have gone wrong was no easy feat. A months-long inquiry by the Spanish researchers and NEJM staff uncovered that up to 1588 people in the trial hadn’t been properly randomized: Some were assigned to the same diet as someone else in their household (a common feature of diet studies, but not reported in the original paper). Others, who lived in a rural area, were assigned to different diets based on the clinic closest to them—for example, one group had to pick up a liter of olive oil each week. “The investigator realized he couldn’t get people to travel as far as they needed so he made his study ‘cluster randomized,’” by clinic rather than by individual, Drazen says.
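The difference between randomizing individuals and randomizing clusters can be sketched in a few lines. This is an illustration only, not the trial's procedure: the arm labels, function names, and participant records are hypothetical. In the first function every person gets an independent draw; in the second, the draw happens once per household and every member inherits it.

```python
import random

ARMS = ["diet A", "diet B", "control"]  # hypothetical arm labels


def randomize_individuals(participants, rng):
    """Assign each person to an arm independently of everyone else."""
    return {person: rng.choice(ARMS) for person, _household in participants}


def randomize_households(participants, rng):
    """Assign each household to an arm; all members share that arm."""
    households = sorted({household for _person, household in participants})
    arm_of = {household: rng.choice(ARMS) for household in households}
    return {person: arm_of[household] for person, household in participants}
```

Both designs are legitimate, but they call for different analyses. People in the same household or clinic tend to resemble one another, so treating cluster assignments as if they were individual ones can overstate how much independent evidence a trial holds.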

The authors reanalyzed their data without those 1588 participants and found that despite the missteps, the conclusion held: Nuts, olive oil, and fatty fish remained a net positive on heart health, though the conclusions came with somewhat less statistical oomph than in the original paper.

And what of the other seven journals targeted by Carlisle last year? Science contacted all of them to ask whether they, too, had investigated. The editor-in-chief of one, the Canadian Journal of Anesthesia, said an inquiry is in progress, but running slowly because of limited resources. At another journal, Anesthesiology, editors had looked over the papers and found no reason to retract any of them. The other journals either didn’t respond or have chosen not to investigate. Some noted that Carlisle’s paper had come under criticism when it was published, in part because its methods assumed certain variables, like height and weight, are unrelated. (Carlisle agrees this was a limitation, and says not every paper he identified necessarily contains errors.)

Carlisle pinpointed eight papers at his own journal, Anaesthesia, worth probing. “We wrote to the authors and got two responses … the others have not responded,” he says. In one of those two, a correction was published this week, though it didn’t impact the paper’s conclusions; in the second, the authors said they no longer had the patient data the journal was requesting. For the six whose answer was radio silence, Carlisle isn’t sure how hard to push. “How far do you drill into it?” he wonders, especially when the time and money to do so are scarce.

Although most of the errors so far are minor ones, Carlisle wonders whether they’re a harbinger of statistical problems in parts of papers he didn’t examine, such as the all-important results section. Drazen was unsettled enough by what his own journal found to give his manuscript editors a statistics course, and implement extra scrutiny of statistics in accepted papers.