Ben Goldacre

Saturday April 7, 2007

The Guardian

It is possible to be very unlucky indeed. A nurse called Lucia de Berk has been in prison for 5 years in Holland, convicted of 7 counts of murder and 3 of attempted murder. An unusually large number of people died when she was on shift, and that, essentially, along with some very weak circumstantial evidence, is the substance of the case against her. She has never confessed, but her trial has generated a small collection of theoretical papers in the statistics literature (below), and a major government enquiry will report on her sentence in the next few weeks.

The judgement was largely based on a figure of “one in 342 million against”. Now, even if we found errors in this figure – and we will – the figure itself would still be largely irrelevant. Unlikely things do happen: somebody wins the lottery every week; children are struck by lightning; I have an extremely fit girlfriend. It is only significant that something very specific and unlikely happens if you have specifically predicted it beforehand.

Here is an analogy. Imagine I am standing near a large wooden barn with an enormous machine gun. I place a blindfold over my eyes and – laughing maniacally – I fire off many thousands and thousands of bullets into the side of the barn. I then drop the gun, walk over to the wall, examine it closely for some time, all over, pacing up and down: I find one spot where there are three bullet holes close to each other, and then I draw a target around them, announcing proudly that I am an excellent marksman. You would, I think, disagree with both my methods and conclusions for that deduction. But this is exactly what has happened in Lucia’s case: the prosecutors have found 7 deaths, on one nurse’s shifts, in one hospital, in one city, in one country, in the world, and then drawn a target around them. A very similar thing happened with the Sally Clark cot death case.
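The barn can be rebuilt in a few lines of Python, as a rough sketch rather than a model of any real hospital. Every number here (1,000 nurses, 200 shifts each, a 3% chance of a death on any shift) is invented for illustration, and nobody in the simulation is a killer. The point is only that if you look for the unluckiest nurse after the fact, you will always find one.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# All three figures are hypothetical, chosen only for illustration.
NURSES = 1000    # nurses in the simulated population
SHIFTS = 200     # shifts worked by each nurse
P_DEATH = 0.03   # chance of a death on any one shift, same for everyone


def deaths_on_shifts():
    """Count deaths across one nurse's shifts; deaths are pure chance."""
    return sum(random.random() < P_DEATH for _ in range(SHIFTS))


counts = [deaths_on_shifts() for _ in range(NURSES)]
expected = SHIFTS * P_DEATH  # average deaths per nurse under pure chance
worst = max(counts)          # the nurse we "discover" by searching afterwards

print(f"expected deaths per nurse: {expected:.0f}")
print(f"most deaths seen by any one nurse: {worst}")
```

The nurse at the top of that list has seen far more deaths than average, and she was chosen precisely because she had. Calculating how unlikely her record is, as if she had been named in advance, is drawing the target around the bullet holes.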

Before you go to your data, with your statistical tool, you have to have a specific hypothesis to test. If your hypothesis comes from analysing the data, then there is no sense in analysing the same data again to confirm it. This is a rather complex, philosophical, mathematical form of circularity: but there were also very concrete forms of circular reasoning in the case. To collect more data, the investigators went back to the wards to find more suspicious deaths. But all the people who have been asked to remember ‘suspicious incidents’ know that they are being asked because Lucia may be a serial killer. There is a high risk that “incident was suspicious” became synonymous with “Lucia was present”. Some sudden deaths when Lucia was not present are not listed in the calculations: because they are in no way suspicious, because Lucia was not present.

“We were asked to make a list of incidents that happened during or shortly after Lucia’s shifts,” said one hospital employee. In this manner more patterns were unearthed, and so it became even more likely that investigators found more suspicious deaths on Lucia’s shifts. This is the stuff of nightmares.

Meanwhile, a huge amount of corollary statistical information was almost completely ignored. In the three years before Lucia worked on the ward in question, there were 7 deaths. In the three years that Lucia did work on that ward, there were 6 deaths. It seems odd that the death rate should go down on a ward at the precise moment that a serial killer – on a killing spree – arrives on the scene. In fact, if Lucia killed them all, then there must have been no natural deaths on that ward at all, in the 3 years that she worked there.

On the other hand, as they revealed at her trial, Lucia did like tarot. And she does sound a bit weird in her private diary. So she might have done it after all.

But the strangest crime of all is that the prosecution’s statistician made a simple mathematical error to produce the figure of one in 342 million. He combined individual statistical tests by multiplying p-values. This bit’s for the hardcore science nerds, and will be edited out by the paper, but I intend to write it anyway. You do not just multiply p-values together, you weave them with a clever tool, like maybe “Fisher’s method for combination of independent p-values”.
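For the nerds who stayed, the difference can be sketched in Python. This is an illustration only, not the calculation used in court: the function names and the twenty p-values of 0.5 are made up for the example, and Fisher's method assumes the tests are independent. Fisher's statistic is minus twice the sum of the log p-values, compared against a chi-square distribution with 2k degrees of freedom for k tests. With an even number of degrees of freedom the tail probability has a closed form, so nothing beyond the standard library is needed.

```python
import math


def naive_product(p_values):
    """The mistake: treat the raw product of p-values as a p-value."""
    return math.prod(p_values)


def fisher_combined(p_values):
    """Fisher's method for k independent p-values.

    The statistic -2 * sum(ln p) is chi-square with 2k degrees of
    freedom under the null. For even degrees of freedom the upper
    tail is exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # x/2 in the formula
    tail_sum = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail_sum


# Hypothetical input: twenty entirely unremarkable results.
p_values = [0.5] * 20

naive = naive_product(p_values)
fisher = fisher_combined(p_values)

print(f"multiplied together: {naive:.2e}")
print(f"Fisher's method:     {fisher:.2f}")
```

Multiplied together, twenty unremarkable results come out below one in a million. Combined with Fisher's method they stay unremarkable, with a combined p-value well above 0.5. As a sanity check, a single p-value passed through `fisher_combined` comes back unchanged.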

If you multiply p-values together, then harmless incidents rapidly become dangerously unlikely. Let’s say you worked in 20 hospitals, each with a harmless incident pattern: say p=0.5. If you multiply those harmless p-values, you end up with a final p-value of 0.5 to the power of 20, which is p < 0.000001, which is extremely, very, highly statistically significant. With this mathematical error, if you change hospital a lot, you automatically become a suspect. Have you worked in 20 hospitals? For god’s sake don’t tell the Dutch police if you have.

References:

Here’s a presentation to the UCL “Evidence” Group by Dutch statistician Peter Grunwald:

badscience.net/files/evidencehandout.PDF

Statistician Richard Gill’s page on the case:

www.math.leidenuniv.nl/~gill/lucia.html

“Elffers’ [court statistician] method, and Elffers’ mistake”

www.math.leidenuniv.nl/~gill/elfferscorrected.pdf

The wikipedia page is excellent for the basic story:

en.wikipedia.org/wiki/Lucia_de_Berk

arXiv pre-print

arxiv.org/PS_cache/math/pdf/0607/0607340v1.pdf

“Lucia: Killed by Innocent Heterogeneity”

www.math.leidenuniv.nl/~gill/hetero.pdf

And lastly – because he always got there first – Richard Feynman used an excellent example to illustrate this phenomenon of post hoc coincidence detection: “You know, the most amazing thing happened to me tonight. I was coming here, on the way to the lecture, and I came in through the parking lot. And you won’t believe what happened. I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing…”

Oh, and note the very informative post below from Peter Grunwald.

www.badscience.net/?p=392#comment-12455