Lately I’ve been interested in true crime stories. It started with Serial and Undisclosed, two excellent podcasts on the case of Adnan Syed, a Baltimore teenager wrongfully (yes, wrongfully) convicted of killing his ex-girlfriend in 1999. Then came the popular Netflix documentary Making a Murderer, which detailed the case of Steven Avery, who was wrongfully convicted of rape in 1985, released in 2003 after being exonerated by DNA evidence, and then (apparently) framed for murder.

Because of my interest in these stories, my parents recommended I watch another, lesser-known documentary series called Death on the Staircase. The series documented the trial of Michael Peterson (no relation) for allegedly killing his wife Kathleen. He found Kathleen at the base of the staircase and assumed she had taken a fall. The prosecution claimed that he actually beat her to death. However, the physical evidence didn’t really match a beating (no skull fractures or brain contusions, no spatter on the ceiling), nor did it match a fall (falls don’t usually cause that amount of bleeding). Neither theory can explain the microscopic owl feathers found in Kathleen’s hand or the suspiciously talon-shaped lacerations on her scalp.

In all four of these cases (including both of Steven Avery’s convictions), the jury found the defendant guilty on the basis of flawed, circumstantial evidence. I think all three men are innocent, but even if I’m wrong or have been misled (though I’ve done independent research on all three cases), it seems like there must be at least a reasonable doubt of their guilt. So how could the juries convict them? Unless…

The Presumption of Innocence is all a Sham

There, I said it. How does the presumption of innocence actually play out in jury trials? The judge tells the jury to presume innocence, to return a verdict of not guilty if there is any reasonable doubt of the accused’s guilt. Defense lawyers may remind the jury of this rule. But just because people are told to do something doesn’t mean they necessarily do it.

Suppose you’re a juror and you’re 83% sure the guy is guilty. (There’s a debate about whether it’s possible to quantify confidence in a legal setting like this, but I think it is possible and useful; see Tillers and Gottfried, 2007.) A judge or legal scholar might tell you that that 17% constitutes a reasonable doubt, and if you vote to convict (and you are a representative juror) then seventeen of every hundred people convicted of crimes will be innocent. But to a regular person, 83% is really close to certain. 83% is a B+. It’s the Rotten Tomatoes score of a good movie like Big Trouble in Little China. If I asked you if Big Trouble in Little China is a good movie, beyond a reasonable doubt, would you really say no? Be honest.

When you actually ask people to quantify what they think constitutes a reasonable doubt, the answers are shockingly low. According to Judge Richard Posner (quoted in Tillers and Gottfried, 2007):

When . . . judges and juries are asked to translate the requisite confidence into percentage terms or betting odds, they sometimes come up with ridiculously low figures-in one survey, as low as 76 percent, see United States v. Fatico, 458 F. Supp. 388, 410 (E.D.N.Y. 1978); in another, as low as 50 percent, see McCauliff, Burdens of Proof: Degrees of Belief, Quanta of Evidence, or Constitutional Guarantees?, 35 Vand. L. Rev. 1293, 1325 (1982) (tab. 2). The higher of these two figures implies that, in the absence of screening by the prosecutor’s office, of every 100 defendants who were convicted 24 (on average) might well be innocent.

[There’s actually a mistake in Posner’s argument. See this post for an explanation.]
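One way to see how a conviction threshold and an error rate can come apart (which may or may not be the point the linked post makes) is to note that a threshold is a floor: a juror who convicts at 76% or above will also convict in cases where they are 90% or 99% sure. Here is a minimal sketch in Python, assuming that jurors' confidence levels are calibrated probabilities of guilt and using six made-up cases. The function name and all the numbers are my own illustration, not data from any survey.

```python
def expected_innocent_share(confidences, threshold):
    """Expected fraction of innocent defendants among those convicted.

    Assumption: each confidence is a calibrated probability of guilt,
    and the juror convicts whenever confidence >= threshold.
    """
    convicted = [p for p in confidences if p >= threshold]
    if not convicted:
        return 0.0
    return sum(1 - p for p in convicted) / len(convicted)


# Hypothetical juror confidences for six made-up cases.
cases = [0.55, 0.70, 0.80, 0.90, 0.95, 0.99]
share = expected_innocent_share(cases, threshold=0.76)
print(f"{share:.2f}")  # prints 0.09
```

With these invented numbers, a 76% threshold produces an expected innocent share of about 9% among the convicted, not 24%. The 24% figure would only hold if every convicted defendant sat exactly at the threshold.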

I wouldn’t be surprised if jurors adopt a de facto balance-of-evidence standard despite being told to adopt a presumption of innocence. I can’t prove that (yet), but in most other contexts, 50-50 is the ideal. A judge at a competition or sporting event who gives each participant an ex ante equal chance to win is called unbiased, fair, neutral. The presumption of innocence is basically telling people to be biased, unfair, and not neutral in favour of the defence. It goes against their egalitarian inclinations.

Daniel Kahneman argues that when people encounter a hard question, they mentally replace it with an easier question they know how to answer. I suspect that jurors might be doing a similar substitution. Instead of answering the question, “Do I think the prosecution has proven that this person is guilty beyond reasonable doubt?” they answer the simpler question, “Do I think this person is guilty?”

Meanwhile, in Scotland…

Suppose you are offered a choice between apple pie and chocolate cake. You choose apple pie. Now suppose instead that you are offered a choice between apple pie, chocolate cake, and chocolate cake on a different coloured plate. It wouldn’t make much sense for you to change your choice to chocolate cake (on a normal plate). If you did so, you would be violating an assumption economists call the independence of irrelevant alternatives (IIA). IIA is formulated differently in different contexts, but it basically says that your preference for A over B shouldn’t depend in any way on whether C is an available option.
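The dessert example can be written down as a simple consistency check. This is only a sketch of one common way to state IIA for a single chooser facing two menus; the function name and the menu labels are hypothetical.

```python
def violates_iia(choice_small, menu_small, choice_large, menu_large):
    """Return True if the two choices are inconsistent with IIA.

    menu_small must be a subset of menu_large. A violation occurs when
    the option picked from the larger menu was already available in the
    smaller menu, yet a different option was picked there.
    """
    if not set(menu_small) <= set(menu_large):
        raise ValueError("menu_small must be a subset of menu_large")
    return choice_large in menu_small and choice_large != choice_small


two_desserts = {"apple pie", "chocolate cake"}
three_desserts = two_desserts | {"chocolate cake (coloured plate)"}

# Switching from pie to plain cake once the extra plate appears: violation.
switched = violates_iia("apple pie", two_desserts,
                        "chocolate cake", three_desserts)
# Sticking with pie: consistent with IIA.
stayed = violates_iia("apple pie", two_desserts,
                      "apple pie", three_desserts)
print(switched, stayed)  # prints True False
```

Note that picking the brand-new option (the cake on the coloured plate) does not count as a violation under this definition, since it says nothing about how the original two options rank against each other.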

I was reminded of IIA while watching Death on the Staircase. At one point, Michael Peterson’s attorney David Rudolf mentions that in Scotland they have a third verdict in addition to “guilty” and “not guilty.” It’s “not proven.” Both “not guilty” and “not proven” result in an acquittal, so adding an extra way to acquit shouldn’t result in a greater presumption of innocence. In the two-verdict system, people are supposed to treat “not guilty” as including both “not guilty” and “not proven.” If IIA holds, there should be no difference in the rate of convictions under each system.

The thing I didn’t mention about IIA is that it routinely fails experimental tests. You can sway people’s choices by presenting them with irrelevant alternatives. I believe that IIA would fail in this case, too. My intuition tells me that, yes, having two ways to acquit will result in more acquittals.

In 1807 at the trial of Aaron Burr for treason, the jury was not content to return one of the usual verdicts, guilty or not guilty. The evidence at trial failed to prove Burr’s guilt, but the jury was too suspicious of the scoundrel to declare him not guilty. Instead the jury offered this grudging acquittal: “We of the jury say that Aaron Burr is not proved to be guilty under this indictment by any evidence submitted to us.” Almost two hundred years later, a United States senator echoed the Burr acquittal in the impeachment trial of President Clinton. Disliking both of the traditional verdicts, Senator Arlen Specter offered a verdict drawn from Scottish law: not proven. His vote was recorded, however, as not guilty.

Those are the opening paragraphs of a comment in the University of Chicago Law Review arguing for this third verdict (Bray, 2005). Bray argues that “The introduction of a third verdict would increase acquittals because people like to compromise. When given three choices, people will choose the middle one more often than they would if it were paired with one of the other choices” (p. 1314). For this claim, he cites Cass Sunstein, who says:

People are averse to extremes. Whether an option is extreme depends on the stated alternatives. Extremeness aversion gives rise to compromise effects. As between given alternatives, people seek a compromise. In this as in other respects, the framing of choice matters; the introduction of (unchosen, apparently irrelevant) alternatives into the frame can alter the outcome. (Sunstein, 1997, p. 8)

I am inclined to agree, but I’d like to see a more thorough proof of this proposition.

Let the Science Begin!

How would one go about testing the effect of a three-verdict system on convictions and acquittals? The first thing that comes to mind is to observe the differences between Scotland and other countries. Since Scotland has a three-verdict system, we might wonder if it tends to convict fewer people charged with crimes. The problem with this sort of comparison is that there’s no guarantee that the set of people charged with crimes in Scotland is similar to the set of people charged with crimes in, say, the United States.

For one thing, there are different laws in different countries. You could control for this by only comparing the people accused of similar crimes in both countries. A more worrying difference is that the police in Scotland, anticipating a higher burden of evidence, might choose not to charge people they don’t think they can convict. Ideally, you would want to look at people charged with the same crimes, with the same quality of evidence mounted against them. This seems to be nearly impossible, unless you’re prepared to pay research assistants to read thousands upon thousands of pages of court documents to code each piece of evidence into a usable data set.

Given these difficulties, I propose studying the problem in an experimental setting.

First, I’d divide a set of participants, at random, into four treatments. The first treatment would be shown a series of hypothetical crimes with evidence for and against the accused’s guilt. They would be asked to choose between “guilty” and “not guilty” with the beyond-reasonable-doubt standard explained to them in the same way it would be in a real court. The second treatment would be shown the same list of hypothetical crimes, but they would be asked to choose between “guilty,” “not guilty,” and “not proven,” with the options explained in the same way they would be in a Scottish court.

The third and fourth treatments would have people looking at each of the crimes and assigning each of them a probability of guilt on a scale from 0% to 100%. Then, after they had assigned all these probabilities, the third treatment would be asked where they would draw the line between “guilty” and “not guilty,” while the fourth treatment would be asked where they would draw the lines between “guilty,” “not guilty,” and “not proven.”
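The random split into four treatments could look something like the following. This is a minimal sketch, assuming participants are identified by simple IDs and that I want groups of (nearly) equal size; the function name, the seed, and the group size of 20 are all placeholders.

```python
import random


def assign_treatments(participant_ids, n_treatments=4, seed=0):
    """Randomly assign each participant to a treatment numbered 1..n.

    Shuffles the participants, then deals them out in turn so that group
    sizes differ by at most one. The seed only makes the shuffle
    reproducible; any value would do.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {pid: (i % n_treatments) + 1 for i, pid in enumerate(shuffled)}


# Hypothetical pool of 20 participants.
assignment = assign_treatments(range(1, 21))
group_sizes = [list(assignment.values()).count(t) for t in (1, 2, 3, 4)]
print(group_sizes)  # prints [5, 5, 5, 5]
```

Dealing out a shuffled list, rather than drawing a treatment independently for each person, keeps the groups balanced even with a small pool.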

The purpose of the first two treatments should be clear enough: to test whether people, when offered a third possible verdict, will acquit some of the people they would otherwise have convicted. The third and fourth treatments would allow us to both implicitly and explicitly measure what probability threshold constitutes “reasonable doubt.” The implicit measurement comes from comparing their answers to those of the first two treatments (e.g. “in cases where subjects in treatments 3 and 4 thought there was an 85% chance of guilt, most subjects in treatment 1 chose to convict”); the explicit measurement comes from asking them what their threshold would be (e.g. “the average subject in treatment 3 told us that 94% was the appropriate threshold between guilt and non-guilt”).
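For the comparison between the first two treatments, one simple option would be a two-proportion z-test on the conviction rates. The sketch below is only that, a sketch: it treats every verdict as an independent observation, which is a simplifying assumption (each subject judges several cases, so a real analysis would need to account for that), and the counts are invented.

```python
import math


def two_proportion_z_test(convictions_a, n_a, convictions_b, n_b):
    """Two-sided z-test for a difference between two conviction rates.

    Returns (z, p_value). Uses the pooled standard error, which assumes
    the verdicts are independent draws (a simplification here).
    """
    p_a = convictions_a / n_a
    p_b = convictions_b / n_b
    pooled = (convictions_a + convictions_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("conviction rates are all 0 or all 1; test undefined")
    z = (p_a - p_b) / se
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)) for a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: 60 of 100 verdicts are "guilty" in the two-verdict
# treatment, 45 of 100 in the three-verdict treatment.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

If IIA held, the two rates should differ only by chance, so a small p-value would count as evidence against it in this setting.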

Why measure these thresholds in two different ways? Well, I suspect that when people are asked to choose a number to place the threshold between “guilty” and “not guilty,” they’ll give a different answer than the one they would choose in practice. It’s one thing to say, in the abstract, that someone who you’re 90% sure is guilty should be acquitted; it’s quite another to look at a specific case and say, “despite the fingerprint evidence, he should go free.”
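To make the implicit-versus-explicit comparison concrete, here is one crude way the implicit threshold could be read off the data: take the lowest rated probability of guilt at which a majority of treatment 1 still convicts, then compare it with the stated threshold. This is an assumption-laden sketch, not a settled estimation method, and every number in it is made up (the 85% and 94% simply echo my examples above).

```python
def implicit_threshold(cases):
    """Lowest mean rated probability at which a majority convicts.

    cases: list of (mean_rated_probability, conviction_share) pairs, where
    the probability comes from treatments 3 and 4 and the conviction share
    comes from treatment 1. Returns None if no case has a majority.
    """
    majority_convict = [prob for prob, share in cases if share > 0.5]
    return min(majority_convict) if majority_convict else None


# Hypothetical results for four made-up cases.
cases = [(0.60, 0.20), (0.75, 0.45), (0.85, 0.70), (0.95, 0.90)]
explicit = 0.94  # hypothetical average stated threshold in treatment 3

implicit = implicit_threshold(cases)
gap = explicit - implicit
print(f"implicit = {implicit:.2f}, explicit = {explicit:.2f}, gap = {gap:.2f}")
# prints implicit = 0.85, explicit = 0.94, gap = 0.09
```

A positive gap would mean people say they require more certainty than their verdicts reveal, which is what I suspect we would find.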

If I could get some solid evidence that people are not adhering to the presumption of innocence under a two-verdict system, but that they would under a three-verdict system, I hope I could push the envelope in the direction of a better, fairer criminal justice system.