E.T. Jaynes, in his posthumous book on Bayesian statistics, Probability Theory: The Logic of Science, devotes chapter 5 to “Queer Uses For Probability Theory”, discussing such topics as ESP; miracles; heuristics & biases; how visual perception is theory-laden; philosophy of science with regard to Newtonian mechanics and the famed discovery of Neptune; horse-racing & weather forecasting; and finally, in section 5.8, “Bayesian jurisprudence”. Jaynes’s analysis is somewhat similar in spirit to my above analysis, although mine is not explicitly Bayesian except perhaps in the discussion of gender as eliminating one necessary bit.

The following is an excerpt; see also “Bayesian Justice”.

It is interesting to apply probability theory in various situations in which we can’t always reduce it to numbers very well, but still it shows automatically what kind of information would be relevant to help us do plausible reasoning. Suppose someone in New York City has committed a murder, and you don’t know at first who it is, but you know that there are 10 million people in New York City. On the basis of no knowledge but this, $e(\text{Guilty}\mid X) = -70$ db is the plausibility that any particular person is the guilty one.
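Jaynes’s evidence function is the odds expressed in decibels, $e = 10\log_{10}[p/(1-p)]$. A minimal Python sketch, assuming a uniform prior over exactly 10 million residents (the function name `evidence_db` is illustrative, not Jaynes’s), recovers the −70 db figure:

```python
import math


def evidence_db(p: float) -> float:
    """Evidence in decibels: 10 * log10 of the odds p / (1 - p)."""
    return 10 * math.log10(p / (1 - p))


# Assumption: the murderer is one of exactly 10 million equally likely residents.
population = 10_000_000
prior = 1 / population
print(f"e(Guilty|X) = {evidence_db(prior):.2f} db")
```

With odds this small, $p/(1-p) \approx p$, so the result is essentially $10\log_{10} 10^{-7}$.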

How much positive evidence for guilt is necessary before we decide that some man should be put away? Perhaps +40 db, although your reaction may be that this is not safe enough, and the number ought to be higher. If we raise this number we give increased protection to the innocent, but at the cost of making it more difficult to convict the guilty; and at some point the interests of society as a whole cannot be ignored.

For example, if 1000 guilty men are set free, we know from only too much experience that 200 or 300 of them will proceed immediately to inflict still more crimes upon society, and their escaping justice will encourage 100 more to take up crime. So it is clear that the damage to society as a whole caused by allowing 1000 guilty men to go free, is far greater than that caused by falsely convicting one innocent man.

If you have an emotional reaction against this statement, I ask you to think: if you were a judge, would you rather face one man whom you had convicted falsely; or 100 victims of crimes that you could have prevented? Setting the threshold at +40 db will mean, crudely, that on the average not more than one conviction in 10,000 will be in error; a judge who required juries to follow this rule would probably not make one false conviction in a working lifetime on the bench.
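Inverting the decibel scale shows where the “one conviction in 10,000” comes from: +40 db is odds of $10^4$ to 1, so even a defendant convicted exactly at the threshold is innocent with probability $1/10{,}001$. A sketch with illustrative helper names:

```python
def probability_from_db(e_db: float) -> float:
    """Convert evidence in decibels back to a probability."""
    odds = 10 ** (e_db / 10)
    return odds / (1 + odds)


def error_rate_at(e_db: float) -> float:
    """Probability of innocence for a defendant convicted exactly at e_db."""
    return 1 / (1 + 10 ** (e_db / 10))


threshold_db = 40.0
print(f"P(Guilty) at threshold: {probability_from_db(threshold_db):.6f}")
print(f"Error rate at threshold: 1 in {1 / error_rate_at(threshold_db):,.0f}")
```

Defendants convicted on more than 40 db of evidence carry a smaller error probability still, which is why the figure works as an upper bound.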

In any event, if we took +40 db starting out from −70 db, this means that in order to ensure a conviction you would have to produce about 110 db of evidence for the guilt of this particular person. Suppose now we learn that this person had a motive. What does that do to the plausibility for his guilt? Probability theory says
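Because evidence lives on a log scale, independent items of evidence simply add. A sketch of the bookkeeping; the two likelihood ratios are hypothetical, and are assumed conditionally independent given guilt and given innocence:

```python
import math


def lr_to_db(likelihood_ratio: float) -> float:
    """Decibels contributed by one item: 10 * log10[P(E|Guilty) / P(E|Not Guilty)]."""
    return 10 * math.log10(likelihood_ratio)


prior_db = -70.0      # uniform prior over 10 million residents
threshold_db = 40.0   # Jaynes's suggested conviction threshold
required_db = threshold_db - prior_db
print(f"Evidence needed: {required_db:.0f} db")

# Hypothetical likelihood ratios, assumed conditionally independent.
hypothetical_lrs = [1e5, 1e7]
posterior_db = prior_db + sum(lr_to_db(lr) for lr in hypothetical_lrs)
print(f"Posterior: {posterior_db:.1f} db; meets threshold: {posterior_db >= threshold_db}")
```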

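$$\begin{aligned}
e(\text{Guilty}\mid\text{Motive}) &= e(\text{Guilty}\mid X) + 10\log_{10}\frac{P(\text{Motive}\mid\text{Guilty})}{P(\text{Motive}\mid\text{Not Guilty})} \\
&\simeq -70 - 10\log_{10} P(\text{Motive}\mid\text{Not Guilty})
\end{aligned} \tag{5-38}$$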

since $P(\text{Motive}\mid\text{Guilty}) \simeq 1$, i.e. we consider it quite unlikely that the crime had no motive at all. Thus, the [importance] of learning that the person had a motive depends almost entirely on the probability $P(\text{Motive}\mid\text{Not Guilty})$ that an innocent person would also have a motive.
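Equation (5-38) is a one-line update. The sketch below uses hypothetical probabilities and an illustrative function name `update_db`; it shows that $P(\text{Motive}\mid\text{Not Guilty})$ dominates the result, while lowering $P(\text{Motive}\mid\text{Guilty})$ from 1 to 0.9 moves it by less than half a decibel:

```python
import math


def update_db(prior_db: float, p_e_guilty: float, p_e_innocent: float) -> float:
    """Eq. (5-38): add 10 * log10 of the likelihood ratio to the prior evidence."""
    return prior_db + 10 * math.log10(p_e_guilty / p_e_innocent)


prior_db = -70.0
for p_motive_guilty in (1.0, 0.9):
    for p_motive_innocent in (1e-6, 1e-2):  # hypothetical values
        e = update_db(prior_db, p_motive_guilty, p_motive_innocent)
        print(f"P(M|G)={p_motive_guilty}, P(M|not G)={p_motive_innocent:g}: {e:.2f} db")
```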

This evidently agrees with our common sense, if we ponder it for a moment. If the deceased were kind and loved by all, hardly anyone would have a motive to do him in. Learning that, nevertheless, our suspect did have a motive, would then be very [important] information. If the victim had been an unsavory character, who took great delight in all sorts of foul deeds, then a great many people would have a motive, and learning that our suspect was one of them is not so [important]. The point of this is that we don’t know what to make of the information that our suspect had a motive, unless we also know something about the character of the deceased. But how many members of juries would realize that, unless it was pointed out to them?

Suppose that a very enlightened judge, with powers not given to judges under present law, had perceived this fact and, when testimony about the motive was introduced, he directed his assistants to determine for the jury the number of people in New York City who had a motive. If this number is $N_m$ then

$$P(\text{Motive}\mid\text{Not Guilty}) = \frac{N_m - 1}{(\text{Number of people in New York}) - 1} \simeq 10^{-7}(N_m - 1)$$

and equation (5-38) reduces, for all practical purposes, to

$$e(\text{Guilty}\mid\text{Motive}) \simeq -10\log_{10}(N_m - 1) \tag{5-39}$$

You see that the population of New York has canceled out of the equation; as soon as we know the number of people who had a motive, then it doesn’t matter any more how large the city was. Note that (5-39) continues to say the right thing even when $N_m$ is only 1 or 2.
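The cancellation can be checked numerically. A sketch, assuming a uniform prior with exact odds $1:(N-1)$ for a city of $N$ residents and $P(\text{Motive}\mid\text{Guilty}) = 1$; under those assumptions the cancellation is exact rather than approximate, so two hypothetical city sizes give the same answers. The function name is illustrative:

```python
import math


def evidence_after_motive(population: int, n_motive: int) -> float:
    """Posterior evidence (db) that one suspect with a motive is guilty.

    Assumes a uniform prior over `population` residents (odds 1 : population - 1)
    and P(Motive | Guilty) = 1, so P(Motive | Not Guilty) = (n_motive - 1) / (population - 1).
    `n_motive` counts everyone with a motive, including the suspect.
    """
    if n_motive < 1:
        raise ValueError("the suspect is among those with a motive")
    if n_motive == 1:
        return math.inf  # no innocent person had a motive: certainty of guilt
    prior_db = -10 * math.log10(population - 1)
    p_motive_innocent = (n_motive - 1) / (population - 1)
    return prior_db - 10 * math.log10(p_motive_innocent)


for population in (10_000_000, 1_000_000_000):  # hypothetical city sizes
    row = [round(evidence_after_motive(population, n), 6) for n in (1, 2, 11, 101)]
    print(population, row)
```

$N_m = 2$ gives 0 db, i.e. even odds between the suspect and the one other person with a motive, and $N_m = 1$ gives certainty, matching what (5-39) says in those edge cases.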

You can go on this way for a long time, and we think you will find it both enlightening and entertaining to do so. For example, we now learn that the suspect was seen near the scene of the crime shortly before. From Bayes’ theorem, the [importance] of this depends almost entirely on how many innocent persons were also in the vicinity. If you have ever been told not to trust Bayes’ theorem, you should follow a few examples like this a good deal further, and see how infallibly it tells you what information would be relevant, what irrelevant, in plausible reasoning.
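Following the example one step further: if, like the motive, $P(\text{Nearby}\mid\text{Guilty}) \simeq 1$, the weight of being seen nearby depends on how many innocent people were also nearby. A sketch, assuming (strongly, and only for illustration) that having a motive and being nearby are independent among innocent residents; all counts and the function name are hypothetical:

```python
import math


def posterior_db(population: int, n_motive: int, n_nearby: int) -> float:
    """Evidence (db) for a suspect who had a motive and was seen nearby.

    Assumes a uniform prior, P(Motive|Guilty) = P(Nearby|Guilty) = 1, and that
    motive and presence are independent among innocent residents.
    Counts include the suspect.
    """
    if n_motive < 2 or n_nearby < 2:
        raise ValueError("need at least one innocent person in each group")
    prior_odds = 1 / (population - 1)
    lr_motive = (population - 1) / (n_motive - 1)
    lr_nearby = (population - 1) / (n_nearby - 1)
    return 10 * math.log10(prior_odds * lr_motive * lr_nearby)


for n_nearby in (11, 1001):
    e = posterior_db(10_000_000, n_motive=101, n_nearby=n_nearby)
    print(f"{n_nearby - 1} innocent bystanders: {e:.1f} db")
```

Under this independence assumption the posterior odds equal $(N-1)/[(N_m-1)(N_v-1)]$, the reciprocal of the expected number of innocent residents who share both traits by coincidence, so the population no longer cancels.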

In recent years there has grown up a considerable literature on Bayesian jurisprudence; for a review with many references, see Vignaux and Robertson (1996) [This is apparently Interpreting Evidence: Evaluating Forensic Science in the Courtroom –Editor].

Even in situations where we would be quite unable to say that numerical values should be used, Bayes’ theorem still reproduces qualitatively just what your common sense (after perhaps some meditation) tells you. This is the fact that George Polya demonstrated in such exhaustive detail that the present writer was convinced that the connection must be more than qualitative.