‘The cure for “bad statistics” isn’t “no statistics” — it’s using statistical tools properly’

The chances of winning the UK’s National Lottery are absurdly low — almost 14 million to one against. When you next read that somebody has won the jackpot, should you conclude that he tampered with the draw? Surely not. Yet this line of obviously fallacious reasoning has led to so many shaky convictions that it has acquired a forensic nickname: “the prosecutor’s fallacy”.
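The “almost 14 million to one” figure is simple arithmetic. The sketch below assumes the classic 6-from-49 Lotto format, which matches that figure. The ticket count is a hypothetical round number chosen only for illustration, and the sketch assumes each ticket’s numbers are picked independently at random (real players are not that random). It shows why someone usually wins even though any given ticket almost certainly loses.

```python
from math import comb

# Assumed classic UK Lotto format: choose 6 numbers from 1 to 49
combinations = comb(49, 6)
p_jackpot = 1 / combinations
print(f"Odds: 1 in {combinations:,}")  # prints "Odds: 1 in 13,983,816"

# HYPOTHETICAL number of tickets in a single draw (illustration only),
# assuming every ticket's numbers are chosen independently at random
tickets = 30_000_000
expected_winners = tickets * p_jackpot
p_someone_wins = 1 - (1 - p_jackpot) ** tickets

print(f"Expected jackpot winners: {expected_winners:.2f}")
print(f"Chance that at least one ticket wins: {p_someone_wins:.2f}")
```

A win that is wildly improbable for one named player can be close to routine for the crowd of players as a whole.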

Consider the awful case of Sally Clark. After her two sons each died in infancy, she was accused of their murder. The jury was told by an expert witness that the chance of both children in the same family dying of natural causes was 73 million to one against. That number may have weighed heavily on the jury when it convicted Clark in 1999.

As the Royal Statistical Society pointed out after the conviction, a tragic coincidence may well be far more likely than that. The figure of 73 million to one assumes that cot deaths are independent events. Since siblings share genes, and bedrooms too, it is quite possible that both children were at risk of death for the same (unknown) reason.
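The 73-million-to-one figure is widely reported to have come from squaring an estimated 1 in 8,543 chance of a single cot death in a family like Clark’s. The sketch below shows what that squaring assumes. The conditional “1 in 100” chance of a second death is purely hypothetical, chosen to show how much the answer moves once the two deaths are allowed to share a cause. It is not an estimate from the case.

```python
# Single-cot-death estimate reported at trial; used here as an input only
p_first = 1 / 8543

# Independence assumption: simply multiply the two probabilities
p_both_independent = p_first * p_first
print(f"Assuming independence: 1 in {1 / p_both_independent:,.0f}")  # about 73 million

# HYPOTHETICAL dependence: suppose a shared genetic or environmental factor
# means that, given one cot death, a second has a 1 in 100 chance
p_second_given_first = 1 / 100
p_both_dependent = p_first * p_second_given_first
print(f"Allowing for dependence: 1 in {1 / p_both_dependent:,.0f}")
```

Under this illustrative assumption the “coincidence” becomes roughly 85 times more likely than the independent calculation suggests.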

A second issue is that probabilities may be sliced up in all sorts of ways. Clark’s sons were said to be at lower risk of cot death because she was a middle-class non-smoker; this factor went into the 73-million-to-one calculation. But they were at higher risk because they were male, and this factor was omitted. Which factors should be included and which should be left out?

The most fundamental error would be to conclude that if the chance of two cot deaths in one household is 73 million to one against, then the probability of Clark’s innocence was also 73 million to one against. The same reasoning could jail every National Lottery winner for fraud.
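A short sketch makes the fallacy concrete. Even taking the disputed 73-million-to-one figure at face value, it has to be weighed against the alternative explanation, double murder, which is also extremely rare. The double-murder probability below is a hypothetical placeholder chosen only to show the mechanics, and the function assumes these are the only two explanations.

```python
def prob_innocent(p_natural: float, p_murder: float) -> float:
    """Probability that two infant deaths in a family were natural,
    assuming natural causes and double murder are the only explanations
    and both probabilities are per family, before any deaths are known."""
    return p_natural / (p_natural + p_murder)

# Disputed trial figure, taken at face value for the innocent explanation
p_two_natural = 1 / 73_000_000
# HYPOTHETICAL: suppose a double infant murder is half as likely (illustration only)
p_double_murder = 1 / 146_000_000

fallacy = p_two_natural  # the error: reading this as the chance she is innocent
weighed = prob_innocent(p_two_natural, p_double_murder)

print(f"Fallacious 'probability of innocence': {fallacy:.1e}")
print(f"After weighing both explanations: {weighed:.2f}")
```

What matters is the ratio of the two rare probabilities, not the smallness of either one on its own.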

Lottery wins are rare but they happen, because lots of people play the lottery. Lots of people have babies too, which means that unusual, awful things will sometimes happen to those babies. The court’s job is to weigh up the competing explanations, rather than musing in isolation that one explanation is unlikely. Clark served three years for murder before eventually being acquitted on appeal; she drank herself to death at the age of 42.

Given this dreadful case, one might hope that the legal system would school itself on solid statistical reasoning. Not all judges seem to agree: in 2010, the UK Court of Appeal ruled against the use of Bayes’ Theorem as a tool for weighing up and combining different pieces of evidence.

As an example of Bayes’ Theorem, consider a local man who is stopped at random and found to be wearing a distinctive hat beloved of the neighbourhood gang of drug dealers. Ninety-eight per cent of the gang wear the hat but only 5 per cent of the local population do. Only one in 1,000 locals is in the gang. Given only this information, how likely is the man to be a member of the gang? The answer is about 2 per cent. If you randomly stop 1,000 people, you would (on average) stop one gang member, who is almost certainly wearing the hat, and 50 hat-wearing innocents. So only about one hat-wearer in 51 is actually in the gang.
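The hat example can be checked with a few lines of code. This is a minimal sketch; the function and parameter names are illustrative. It treats the 5 per cent as the rate among non-members, a close approximation because the gang is only 0.1 per cent of locals. The loop at the end asks the kind of searching question raised about the inputs: how much does the answer depend on the hat figure?

```python
def posterior(prior: float, p_hat_if_gang: float, p_hat_if_not: float) -> float:
    """Bayes' Theorem: P(gang | hat) from the prior P(gang) and the
    probability of wearing the hat for members and non-members."""
    hat_and_gang = prior * p_hat_if_gang
    hat_and_not = (1 - prior) * p_hat_if_not
    return hat_and_gang / (hat_and_gang + hat_and_not)

# Figures from the example
p = posterior(prior=1 / 1000, p_hat_if_gang=0.98, p_hat_if_not=0.05)
print(f"P(gang | hat) = {p:.1%}")  # prints "P(gang | hat) = 1.9%"

# Sensitivity: vary the assumed share of non-members who wear the hat
for hat_share in (0.01, 0.05, 0.20):
    result = posterior(1 / 1000, 0.98, hat_share)
    print(f"{hat_share:.0%} of non-members wear the hat -> {result:.1%}")
```

The rarer the hat is among innocents, the more it tells you, which is why the quality of that input deserves scrutiny.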

We should ask some searching questions about the numbers in my example. Who says that 5 per cent of the local population wear the special hat? What does it really mean to say that the man was stopped “at random”, and do we believe that? The Court of Appeal may have felt it was spurious to put numbers on inherently imprecise judgments; numbers can be deceptive, after all. But the cure for “bad statistics” isn’t “no statistics” — it’s using statistical tools properly.

Professor Colin Aitken, the Royal Statistical Society’s lead man on statistics and the law, comments that Bayes’ Theorem “is just a statement of logic. It’s irrefutable.” It makes as much sense to forbid it as it does to forbid arithmetic.
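Aitken’s point shows up in the derivation: Bayes’ Theorem follows in one step from the definition of conditional probability. Writing G for “is a gang member” and H for “wears the hat” (labels introduced here for illustration), and treating the 5 per cent as P(H | not G), the odds form applied to the hat example reads:

```latex
P(G \mid H)\,P(H) = P(G \cap H) = P(H \mid G)\,P(G)
\quad\Longrightarrow\quad
P(G \mid H) = \frac{P(H \mid G)\,P(G)}{P(H)}

\frac{P(G \mid H)}{P(\neg G \mid H)}
  = \frac{P(H \mid G)}{P(H \mid \neg G)} \times \frac{P(G)}{P(\neg G)}
  = \frac{0.98}{0.05} \times \frac{1}{999} \approx 0.0196
```

Posterior odds of about 0.0196 correspond to a probability of roughly 1.9 per cent, the “about 2 per cent” given earlier.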

. . .

These statistical missteps aren’t a uniquely British problem. Lucia de Berk, a paediatric nurse, was thought to be the most prolific serial killer in the history of the Netherlands after a cluster of deaths occurred while she was on duty. The court was told that the chance this was a coincidence was 342 million to one against. That’s wrong: statistically, there seems to be nothing conclusive at all about this cluster. (The death toll at the unit in question was actually higher before de Berk started working there.)
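One general reason such tiny coincidence figures can mislead, illustrated here rather than reconstructed from the actual de Berk calculation, is the lottery point again: a nurse comes under suspicion *because* of a cluster, and across many nurses some cluster is likely to appear by chance. Every number below is hypothetical, and the sketch assumes deaths fall on shifts purely at random and that nurses are independent.

```python
from math import comb

# HYPOTHETICAL inputs, for illustration only (not figures from the case)
DEATHS_PER_YEAR = 30    # deaths on one ward in a year
SHARE_OF_SHIFTS = 0.2   # fraction of shifts any one nurse works
CLUSTER_SIZE = 12       # deaths on one nurse's shifts that would look suspicious
NURSES = 2000           # nurses across many wards, any of whom might be noticed

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance a given nurse is on
    duty for at least k of n deaths when deaths land on shifts at random."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Chance that one nurse, named in advance, is on duty for such a cluster
p_named = prob_at_least(CLUSTER_SIZE, DEATHS_PER_YEAR, SHARE_OF_SHIFTS)

# Chance that at least one nurse somewhere is (treating nurses as independent)
p_anyone = 1 - (1 - p_named) ** NURSES

print(f"A nurse named in advance: {p_named:.6f}")
print(f"At least one of {NURSES} nurses: {p_anyone:.3f}")
```

Quoting the first number when the nurse was in fact picked out by the second process overstates how surprising the cluster is.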

De Berk was eventually cleared on appeal after six years behind bars; Richard Gill, a British statistician based in the Netherlands, took a prominent role in the campaign for her release. Professor Gill has now turned his attention to the case of Ben Geen, a British nurse currently serving a 30-year sentence for murdering patients in Banbury, Oxfordshire. In his view, Geen’s case is a “carbon copy” of the de Berk one.

Of course, it is the controversial cases that grab everyone’s attention, so it is difficult to know whether statistical blunders in the courtroom are commonplace or rare, and whether they are decisive or merely part of the cut and thrust of legal argument. But I have some confidence in the following statement: a little bit of statistical education for the legal profession would go a long way.

Written for and first published at ft.com.