Fox News was excited: "Unplanned children develop more slowly, study finds." The Telegraph was equally shrill in its headline ("IVF children have bigger vocabulary than unplanned children"). And the British Medical Journal press release drove it all: "Children born after an unwanted pregnancy are slower to develop."

The last two, at least, made a good effort to explain that this effect disappeared when the researchers accounted for social and demographic factors. But was there ever any point in reporting the raw finding, from before this correction was made?

I will now demonstrate, with a nerdy table illustration, how you correct for things such as social and demographic factors. You'll have to pay attention, because this is a tricky concept; but at the end, when the mystery is gone, you will see why reporting the unadjusted figures as the finding, especially in a headline, is silly and wrong.

Correcting for an extra factor is best understood by doing something called "stratification". Imagine you do a study, and you find that people who drink are three times more likely to get lung cancer than people who don't. The results are in Table 1 (the top table, right). Your odds of getting lung cancer as a drinker are 0.16 (that's 366÷2300). Your odds as a non-drinker are 0.05. So your odds of getting lung cancer are roughly three times higher as a drinker (0.16÷0.05 is roughly 3, and that figure is called the "odds ratio").
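The arithmetic above can be sketched in a few lines of Python. The drinkers' counts (366 with cancer, 2,300 without) are quoted in the text; the non-drinkers' counts here (98 and 1,856) are illustrative, chosen only to match the stated odds of roughly 0.05:

```python
# Odds and odds ratio from a 2x2 table.
# Drinkers' counts are from the text; non-drinkers' counts are
# illustrative, picked to give the quoted odds of about 0.05.
drinkers = {"cancer": 366, "no_cancer": 2300}
non_drinkers = {"cancer": 98, "no_cancer": 1856}

def odds(group):
    """Odds of cancer = cases divided by non-cases."""
    return group["cancer"] / group["no_cancer"]

odds_ratio = odds(drinkers) / odds(non_drinkers)

print(round(odds(drinkers), 2))      # 0.16
print(round(odds(non_drinkers), 2))  # 0.05
print(round(odds_ratio, 1))          # 3.0
```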

But then some clever person comes along and says: wait, maybe this whole finding is confounded by the fact that drinkers also smoke cigarettes? That could be an alternative explanation for the apparent relationship between drinking and lung cancer. So you want to factor smoking out.

The way to do this is to chop your data in half, and analyse non-smokers and smokers separately. So you take only the people who smoke, and compare drinkers against non-drinkers; then you take only the people who don't smoke, and compare drinkers against non-drinkers in that group separately. You can see the results of this in the second and third tables.

So now your findings are a bit weird. Suddenly, once you've split the data up by whether people smoke, drinkers and non-drinkers have exactly the same odds of getting lung cancer. The apparent effect of drinking has been eradicated, which means the observed risk from drinking was entirely due to smoking: smokers had higher odds of lung cancer – 0.3 rather than 0.03, ten times higher – and drinkers were more likely to also be smokers. Looking at the figures in these tables, 203 people smoked out of 1,954 non-drinkers, whereas 1,430 smoked out of 2,666 drinkers.
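Stratification itself is just as easy to sketch. The counts below are illustrative reconstructions, not the article's actual tables: they are chosen so that, as in the text, smokers have odds of 0.3 and non-smokers 0.03, regardless of whether they drink. Within each stratum the drinkers' and non-drinkers' odds are identical, so the odds ratio for drinking is 1 – no effect:

```python
# Stratification: split the data by smoking, then compare drinkers
# against non-drinkers within each stratum separately.
# Counts are illustrative, chosen so that smokers have odds of 0.3
# and non-smokers 0.03, whether or not they drink.
strata = {
    "smokers": {
        "drinkers":     {"cancer": 330, "no_cancer": 1100},  # odds 0.3
        "non_drinkers": {"cancer": 60,  "no_cancer": 200},   # odds 0.3
    },
    "non_smokers": {
        "drinkers":     {"cancer": 36, "no_cancer": 1200},   # odds 0.03
        "non_drinkers": {"cancer": 51, "no_cancer": 1700},   # odds 0.03
    },
}

def odds(group):
    return group["cancer"] / group["no_cancer"]

for name, stratum in strata.items():
    or_within = odds(stratum["drinkers"]) / odds(stratum["non_drinkers"])
    print(name, round(or_within, 2))  # both strata print 1.0
```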

Finally, I explained all this with a theoretical example, where the odds of cancer apparently trebled before correction for smoking. Why didn't I just use the data from the unplanned pregnancies paper? Because in the real world of research, you're often correcting for lots of things at once. In the case of this paper, they corrected for parents' socioeconomic position and qualifications, sex of child, age, language spoken at home, and a huge list of other factors.

When you're correcting for so many things at once, you can't use old-fashioned stratification, as I did in this simple example, because you'd be dividing your data among so many small tables that some would have no people in them at all. That's why adjusted figures are calculated with cleverer methods, such as logistic regression and likelihood-based modelling. But it all comes down to the same thing. In our example above, alcohol wasn't really associated with lung cancer. And in this BMJ paper, unplanned pregnancy wasn't really associated with slower development. Pretending otherwise is just silly.
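To show what an "adjusted figure" looks like in miniature: a classical way of combining stratum tables into one corrected odds ratio is the Mantel–Haenszel estimator. (To be clear, the BMJ paper used regression-style adjustment, not this; it's a simpler stand-in for the same idea, and the counts below are illustrative, constructed so each stratum has an odds ratio of 1.) When every stratum shows no effect, the adjusted odds ratio comes out as exactly 1:

```python
# Mantel-Haenszel adjusted odds ratio: a classical method for pooling
# stratum-specific 2x2 tables into one "corrected" figure. The BMJ
# paper used regression-based adjustment; this is a simpler stand-in.
# Each stratum is (a, b, c, d): exposed cases, exposed non-cases,
# unexposed cases, unexposed non-cases. Counts are illustrative,
# built so that drinking has no effect within either stratum.
strata = [
    (330, 1100, 60, 200),   # smokers
    (36, 1200, 51, 1700),   # non-smokers
]

def mantel_haenszel_or(strata):
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

print(round(mantel_haenszel_or(strata), 2))  # 1.0: the "effect" vanishes
```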

Please send your bad science to ben@badscience.net