Common sense tells you that if you run enough trials, by chance, you will occasionally get an unexpected outcome. When scientists deem a result “statistically significant,” they’re just saying that given their default expectations (e.g. around 50/50 for a coin toss), the outcomes obtained are unlikely to have occurred by random chance. A fair coin is unlikely to land on heads nine out of 10 tosses, so such an outcome suggests the coin is probably not fair. Unlikely is not the same as impossible, and if you look long and hard you will inevitably stumble upon random events that seem novel but are just the outcome of chance.
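To put a number on that coin example, here is a minimal sketch that computes how often a fair coin would land on heads at least nine times in 10 tosses. The function name `prob_at_least` is made up for illustration, and the calculation assumes independent tosses with a 50/50 chance each.

```python
from math import comb


def prob_at_least(heads, tosses, p=0.5):
    """Probability of getting `heads` or more heads in `tosses` independent tosses.

    Sums the binomial probabilities for every count from `heads` up to `tosses`.
    `p` is the chance of heads on a single toss (0.5 for a fair coin).
    """
    return sum(
        comb(tosses, k) * p**k * (1 - p) ** (tosses - k)
        for k in range(heads, tosses + 1)
    )


# Chance that a fair coin gives nine or ten heads in 10 tosses.
nine_or_more = prob_at_least(9, 10)
print(f"{nine_or_more:.4f}")
```

The result is 11 in 1,024, or roughly 1 percent: unlikely enough to make you suspect the coin, but far from impossible.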

I bring this up because earlier this week the New York Times trumpeted: “Obesity Rate for Young Children Plummets 43% in a Decade.” A surprising discovery, and a pretty big deal, right? The article spread like wildfire on Twitter and Facebook. For once, some heartening news about the health of this nation! My immediate reaction, however, was that there must be something we don’t know about obesity to get such a massive change in such a short period of time. Then I started reading.

The warning signs are right there in the Times piece, where by the third paragraph the reporter, Sabrina Tavernise, reveals that “About 8 percent of 2- to 5-year-olds were obese in 2012, down from 14 percent in 2004.” A drop of six percentage points from a starting point of 14 percent is what produces the 43 percent relative decline. The Times’ headline blared the relative figure because the absolute drop is just not that impressive.
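The arithmetic behind the two figures is easy to check. This short sketch uses the 14 percent and 8 percent rates quoted in the Times piece; the function name `absolute_and_relative_drop` is just an illustrative label.

```python
def absolute_and_relative_drop(before, after):
    """Return (absolute drop in percentage points, relative drop as a fraction)."""
    absolute = before - after          # difference in percentage points
    relative = absolute / before       # that difference as a share of the starting rate
    return absolute, relative


# Obesity rates for 2- to 5-year-olds as quoted by the Times, in percent.
absolute, relative = absolute_and_relative_drop(14.0, 8.0)
print(f"absolute drop: {absolute:.0f} percentage points")
print(f"relative drop: {relative:.0%}")
```

Same data, two very different-sounding numbers: six points or 43 percent, depending on which one the headline writer picks.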

My curiosity was piqued enough to look at the original report from which the Times (and the Washington Post, USA Today, and CNN, to name a few) drew the findings. It appears in the Journal of the American Medical Association and comes from a group of researchers with Centers for Disease Control and Prevention affiliations—both legitimate institutions. The report’s closing two sentences are telling: “Overall, there have been no significant changes in obesity prevalence in youth or adults between 2003-2004 and 2011-2012. Obesity prevalence remains high and thus it is important to continue surveillance.” Would you have anticipated such a downbeat conclusion from the newspaper headlines? I doubt it. When evaluating the total sample across age groups, rather than just 2- to 5-year-olds, there hasn’t been any change at all. From the perspective of the researchers themselves, the continuing obesity problem seems to be the most important finding.

The study itself illuminates why we should be skeptical of the headlines about the study. Here is how its authors lay out exactly why one should be cautious about even the most optimistic findings, the 43 percent drop in obesity in the 2-to-5 age bracket:

In the current analysis, trend tests were conducted on different age groups. When multiple statistical tests are undertaken, by chance some tests will be statistically significant (eg, 5% of the time using α of .05). In some cases, adjustments are made to account for these multiple comparisons, and a P value lower than .05 is used to determine statistical significance. In the current analysis, adjustments were not made for multiple comparisons, but the P value is presented.

More plainly, the researchers are acknowledging that, yes, if you run enough comparisons across various age cohorts, you’re bound to turn up an exciting statistically significant result eventually. If you flip enough coins 10 times each, at some point one of them will land on heads all 10 times. This isn’t fate; it’s probability, and it’s inevitable in the long run. To separate the noise from the genuinely significant results, the authors should have held themselves to a higher standard, such as a P value threshold lower than .05. Instead they repeatedly state that they made no such adjustment, and they admit that the significant decline in obesity in the age group in question should be treated with caution. In isolation, a decline this large would turn up by random chance only about 1 time in 33 if nothing had really changed (P value: 0.03). But remember that they ran trend tests on several different age groups, so looking at the whole study, the chance of getting at least one such result by luck alone is much higher than 1 out of 33.
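To see how quickly the odds of a fluke climb, here is a rough sketch of the standard calculation. It assumes the tests are independent of one another, which is a simplification, and the test counts in the loop are hypothetical, since the number of comparisons in the study is not given here. The function names are illustrative, and the Bonferroni correction shown is one common adjustment, not necessarily the one the authors would have chosen.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance that at least one of `num_tests` independent tests comes up
    'significant' at level `alpha` when there is no real effect anywhere."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_threshold(alpha, num_tests):
    """Stricter per-test cutoff that keeps the overall false-alarm rate near `alpha`."""
    return alpha / num_tests


# Hypothetical numbers of comparisons, each judged at the usual .05 level.
for num_tests in (1, 5, 10, 20):
    chance = familywise_error_rate(0.05, num_tests)
    cutoff = bonferroni_threshold(0.05, num_tests)
    print(f"{num_tests:>2} tests: {chance:.0%} chance of a fluke, "
          f"corrected cutoff {cutoff:.4f}")
```

Under these assumptions, ten comparisons already carry about a 40 percent chance of at least one false alarm, and a P value of 0.03 would fail a Bonferroni-corrected cutoff as soon as two tests are involved.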



So, two primary takeaways. The first concerns how the sausage is made in modern science: Why was this even published in the first place, with all its caveats? Because a great deal of research manages to get published, caveats and all. Where there’s a will, there’s a way. So when you read a headline that appears too good to be true, remember: Just because it appears in a reputable journal does not mean that a study has “proved” anything.

The second, and far bigger, issue is that studies like these, and the headlines that result, drive the discussion about public health and policy in this country. The media seizes on sexy results, amplifies them without due skepticism, and the public is misled. This can affect billions of dollars allocated to campaigns meant to capitalize on the supposed implications of scientific studies. It’s hardly an academic footnote in this case. Commentators are already attempting to adduce the reasons for the decline in obesity in this age group, pointing to the dietary changes in preschool menus, awareness campaigns, and exercise programs that specifically target tots.

Let’s not congratulate these policies just yet, because the most likely upshot is that this finding won’t be verified over time. In other words, it is probably a statistical fluke. I will be thrilled if studies with more methodological rigor prove me wrong.