Ben Goldacre

The Guardian

Saturday July 5 2008

Anyone would think the cold war was still on, with all this top secret scientific data that journalists constantly seem to be writing about. In last week’s column, as you will remember, we saw the Sunday Express front page claiming that a scientist and government adviser called Dr Coghill had performed scientific research, and found that the Bridgend suicide cases all lived closer to a mobile phone mast than average: this was an issue of great public health significance, but when I contacted the researcher, he wasn’t a doctor, he wasn’t really a government adviser, he couldn’t tell me what he meant by “average”, and he had, in a twist of almost incomprehensible ridiculousness, “lost” the data.

This week we have the same thing, from the insurance company Esure, and their agents Mischief PR. They’ve done a very good job of getting publicity for some survey figures. “Fortnightly bin collections spark rat plague” was the headline in the Express this time. According to the Daily Mail, “the number of pests plaguing homeowners has gone up by more than a fifth in a mere three years.” What caused it? “The rise in unwanted visitors coincides rather neatly with the introduction of fortnightly rubbish collections in half the country.” They all quote reams of detailed data. “Household reports of wasps have risen by 39 per cent, squirrels by 23 per cent, mice by 17 per cent and rats by 12 per cent,” and so on. Similar figures were reported in the Telegraph, on GMTV, and in the Daily Mirror.

I contacted Esure and Mischief to ask about the figures. It’s fairly standard practice to make your data publicly available on this kind of survey, as far as I know. Esure refused to give me the numbers. Have they lost them, perhaps, like Dr Coghill? Apparently not. They do not send out raw data (“this is company policy” is an eerily familiar phrase from insurers). They are, however, happy to answer individual questions.

This presents us with an interesting challenge: can you interrogate a statistical dataset through a letterbox, in a chat with a PR person? It might take a while.

Starting with the easy stuff: you will already have noticed that all the figures quoted are what statisticians would call “relative risk increases”: there is a “39% increase”, but 39% more than what? A very rare thing? A very common thing? The figures for “absolute risk increases” would be nice, please, Esure, and I’d be happy to calculate them myself, from your top secret data.
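To see why the baseline matters, here is a minimal sketch in Python. The baseline rates are invented purely for illustration, since Esure's real figures are not public; only the 39% comes from the press coverage. It shows the same relative increase producing very different absolute increases.

```python
def absolute_increase(baseline_rate, relative_increase):
    """Absolute change in a rate implied by a given relative increase."""
    return baseline_rate * relative_increase

# Hypothetical baselines (assumptions, not Esure's data): the share of
# households reporting wasps three years ago.
rare = absolute_increase(0.001, 0.39)   # 1 household in 1,000
common = absolute_increase(0.30, 0.39)  # 30 households in 100

print(f"rare baseline:   +{rare * 100:.3f} percentage points")
print(f"common baseline: +{common * 100:.1f} percentage points")
```

With the rare baseline, a "39% increase" is an extra 0.039 percentage points; with the common one, it is an extra 11.7. The headline figure alone cannot tell you which world you are in.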

Then there are the basics of what information was gathered: Esure are claiming a change over time, but there’s no indication of what was measured in the past, when, and how it is being compared with current data. Or did they rely on recall, which is human and flawed, and prone to substantial biases (known in the trade as recall bias)? “Ooh yes.” “Really?” “Mmmm now you come to mention it since they changed the bins I do think I’ve definitely seen more rats…” There are the basic data analysis issues – like “did they only ask people whose rubbish collection patterns have changed about vermin patterns changing, or did they ask everyone?”

But then there are the fascinating statistical issues. Did they just cherry-pick the biggest figures? Did they do a “correction for multiple comparisons”? After all, if you measure a huge number of different things, some of them are bound to change, or be different, or appear to be statistically significant, simply by chance: because if you toss a coin enough times, you’ll perfectly easily get five heads in a row, simply by chance. In fact, speaking of statistical significance, what tests did Esure and Mischief do to make sure that their results weren’t simply due to the play of chance? A chi-squared test, perhaps, and if so, on how many subjects? Did their data fulfil the assumptions of the chi-squared test? Was there other numerical information? What was the variance in the data? Where are the p-values? And so on.
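The coin-tossing point can be made concrete with a short Python sketch. It rests on assumptions that are mine, not Esure's: a fair coin, independent tests, and the conventional 0.05 significance threshold. The Bonferroni correction shown is one common way of correcting for multiple comparisons, not necessarily what anyone used here, and the figure of 20 pests is hypothetical.

```python
def chance_of_run(num_tosses, run_length=5):
    """Probability of at least one run of `run_length` heads in a
    sequence of `num_tosses` fair coin tosses."""
    # state[k] = probability that no full run has occurred yet and the
    # sequence currently ends in exactly k consecutive heads.
    state = [1.0] + [0.0] * (run_length - 1)
    for _ in range(num_tosses):
        new = [0.0] * run_length
        new[0] = 0.5 * sum(state)        # tails resets the streak
        for k in range(run_length - 1):
            new[k + 1] = 0.5 * state[k]  # heads extends the streak
        # Heads from the longest tracked streak completes a run, so that
        # probability mass leaves `state` and is counted as a success.
        state = new
    return 1 - sum(state)

def chance_of_false_alarm(num_comparisons, alpha=0.05):
    """Probability that at least one of several independent tests looks
    'significant' by chance alone, when nothing has really changed."""
    return 1 - (1 - alpha) ** num_comparisons

def bonferroni_threshold(num_comparisons, alpha=0.05):
    """Per-test threshold that keeps the overall false alarm rate at or
    below alpha (the Bonferroni correction)."""
    return alpha / num_comparisons

print(f"five heads in 5 tosses:   {chance_of_run(5):.3f}")
print(f"five heads in 100 tosses: {chance_of_run(100):.3f}")
# Hypothetical survey asking about 20 different pests.
print(f"false alarm over 20 tests: {chance_of_false_alarm(20):.3f}")
print(f"corrected threshold:       {bonferroni_threshold(20):.4f}")
```

Five heads in exactly five tosses is a 1 in 32 event, but over a long enough sequence a run of five becomes the likely outcome. The same logic applies to a survey that asks about many pests and reports only the ones that moved.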

I’m very happy to analyse a dataset by playing twenty questions through a letterbox with a PR person, but it might well require yes/no answers to several hundred thousand questions until we have the actual numbers. I don’t know how many, because I can’t even know what they’ve collected.

This research has received blanket media coverage, it’s clearly the subject of great public concern, it speaks to us of vitally important issues of public health, and once again the data is hidden from the public, preventing anyone from analysing its contents and significance. Nobody on the Mirror, the Mail, the Telegraph or the Express seems bothered by this. Clearly it’s you and I who are wrong.