Courtrooms are inexact places. Juries and the processes they use to reach verdicts are parameterized, but a trial is nonetheless all about convincing those juries of something that is inexact and subjective at its core. In the US, this is “reasonable doubt.” To find guilt, a judge and/or jury must determine beyond a reasonable doubt that a defendant is guilty of an offense. In one courtroom, that reasonable doubt may be countered by overwhelming physical evidence, while, in another, it may be the testimony of an incentivized witness.

What it comes down to in the end is 12 normal non-expert people—strangers—arguing in a room. It should be an unsettling idea. I find it unsettling but also very interesting.

This is part of why I flagged a study posted earlier this week to arXiv describing a machine learning-based system for identifying “deception” in courtroom videos. Basically, it uses computer vision to identify and classify facial micro-expressions and audio frequency analysis to pick out revealing patterns in voices. The resulting automatic deception classifier was found to be almost 90 percent accurate, handily beating out humans assigned to the same task. This was based on evaluations of 104 mock courtroom videos featuring actors instructed to be either deceptive or truthful.

“Deception is common in our daily lives,” the study opens. “Some lies are harmless, while others may have severe consequences and can become an existential threat to society. For example, lying in a court may affect justice and let a guilty defendant go free. Therefore, accurate detection of a deception in a high stakes situation is crucial for personal and public safety.”

Machine learning models are trained to make predictions based on features, which are basically measurements of some property with a hypothetical predictive value. Like, if you wanted to come up with a model predicting whether or not a car is likely to break down in the next year, you might look at a whole bunch of cars that have broken down or not broken down and at things describing those cars like mileage and year and make. Those would be features.
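
To make the car example concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the six cars, the two features (mileage and age), and the choice of a nearest-centroid classifier, which simply averages the features of each group and assigns a new car to whichever average it sits closest to. It is not the method used in the study, just about the simplest possible thing that predicts from features.

```python
from statistics import mean

# Hypothetical toy data, invented for illustration only.
# Each car is ((mileage in thousands of miles, age in years), broke_down).
cars = [
    ((150.0, 12.0), True),
    ((180.0, 15.0), True),
    ((130.0, 10.0), True),
    ((20.0, 2.0), False),
    ((45.0, 4.0), False),
    ((30.0, 3.0), False),
]


def centroid(rows):
    """Average each feature column across a list of feature tuples."""
    return tuple(mean(column) for column in zip(*rows))


def train(examples):
    """Group examples by label and return one centroid per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(model, key=lambda label: squared_distance(model[label], features))


model = train(cars)
print(predict(model, (160.0, 13.0)))  # an old, high-mileage car
print(predict(model, (25.0, 2.0)))    # a newer, low-mileage car
```

With this toy data, the old high-mileage car lands in the "broke down" group and the newer one does not. A real model would use far more examples and features, and would be checked against cars it had never seen.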

Here, those features include things like the aforementioned micro-expressions, including but not limited to “lips protruded” and “eyebrows frown.” These were the most useful features behind the model, but it also incorporated audio analysis and textual analysis of courtroom transcripts. Audio and textual features didn’t contribute much of anything to the overall accuracy of the predictive model. Micro-expressions were generally enough to suss out a lie.

Of course, to come up with a predictive model, we need a source of ground truth. That is, we need to have some examples where we know for sure that someone is being deceptive. This is where the study gets a bit soft, arguably. It relies on a deception detection dataset released a few years ago by computer scientists at the University of Michigan and the University of North Texas. The dataset basically consists of videos in which participants were asked to be either truthful or deceptive in different scenarios. The setup is a bit more clever than it sounds, but it seems to me to be a limitation, however necessary, to not have any “real world” data.

The subjectivity of a courtroom is both a bug and a feature. It allows for things like empathy, but it also (frequently) allows for very wrong determinations of guilt. That’s unsettling, but so is the idea of AI courtroom lie detectors and what sort of impact they may have on judges and juries.