Statistical tools outmatch humans at predicting whether someone convicted of a crime will reoffend under conditions designed to mimic real-life criminal justice settings, according to a study published in the Feb. 14 issue of Science Advances.

The results clarify the capabilities of these tools by building on another study that, contrary to most research, concluded algorithms predict recidivism no better than unassisted humans. The new study’s findings may inform efforts to contain prison growth and indirectly inform legislation by helping decision-makers identify and release individuals who do not pose much risk to public safety.

“These findings lend confidence to the basic principle that algorithms can help one obtain more accurate assessments of risk of reoffending,” said Jennifer Skeem, a psychologist at the University of California, Berkeley and an author on the study.

“But, like any tools, risk assessment instruments must be coupled with sound policy choices to spur broader criminal justice reform,” added Sharad Goel, a computational social scientist at Stanford University and another author on the study.

Risk Assessment Instruments (RAIs) are checklists regularly used to help judges, correctional authorities and parole boards determine whether someone who has committed a criminal offense is at low, medium or high risk of re-offending. This provides a basis for decisions about whether a defendant should be released or remain incarcerated, and about what kind of services he or she should receive. Risk assessment can also provide a basis for federal legislation. For example, the First Step Act, which was signed into law in December 2018, requires the U.S. Attorney General to develop a risk assessment system to place prisoners in programs designed to help them reenter society.

“Risk assessment has long been a part of decision-making in the criminal justice system, based in part on the understanding that it increases predictive accuracy,” said Skeem. “The rise of more detailed datasets and more advanced predictive analytics have expanded the reach of risk assessment.”

However, a high-profile 2018 study called the tools’ validity into question. In that work, researchers reported that COMPAS, commercial software widely used to predict recidivism, was no more accurate than people with little or no criminal justice experience. The study used an online survey in which participants recruited from Amazon’s Mechanical Turk were given brief descriptions of defendants and asked to predict whether each defendant would commit another offense within two years of their most recent crime. Critically, participants were provided feedback informing them of their predictions’ accuracy. Unaided human participants achieved roughly the same accuracy as COMPAS, correctly predicting 62% of outcomes while the algorithm correctly predicted 65%.

To rigorously evaluate statistical tools such as COMPAS in contexts that better approximate the criminal justice system, Zhiyuan Lin, a Ph.D. candidate in computer science at Stanford University and the first author on the study, and colleagues performed a series of experiments with 645 participants, also recruited from Amazon’s Mechanical Turk. For each defendant, participants selected from six risk buckets ranging from “almost certainly NOT arrested” to “almost certainly arrested,” and from five subcategories with designated percentage ranges of risk of re-offending. The goal was to estimate more specifically whether someone would reoffend within two years.

The researchers tested the impact of providing participants with streamlined descriptions of defendants, as in the 2018 study, that consisted of the individual’s sex, age, current charge and number of prior adult and juvenile offenses, versus enriched information drawn from another risk assessment tool that includes their education level, employment, and any substance use, painting a more detailed portrait of each individual. Lin and colleagues also randomly assigned participants to receive or not receive feedback across a series of trials and tested how the base rate of re-arrest for different types of crimes (which averages 48% for all crimes but dips to 11% for violent crime) impacted predictions.
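To see why the base rate matters for judging accuracy, consider a minimal Python sketch. It is an illustration only, not the study's method: the function name is hypothetical, and the only figures taken from the study are the 48% and 11% re-arrest rates quoted above. It computes the accuracy of a trivial rule that always predicts the more common outcome.

```python
def majority_baseline_accuracy(base_rate: float) -> float:
    """Accuracy of always predicting the more common outcome.

    base_rate is the fraction of individuals who are re-arrested.
    This is a hypothetical helper for illustration, not part of
    the study or of any risk assessment tool.
    """
    if not 0.0 <= base_rate <= 1.0:
        raise ValueError("base_rate must be between 0 and 1")
    # If fewer than half are re-arrested, always guessing "no re-arrest"
    # is correct for the remaining (1 - base_rate) share, and vice versa.
    return max(base_rate, 1.0 - base_rate)


# Base rates quoted in the article: all crimes and violent crime.
for label, rate in [("all crimes", 0.48), ("violent crime", 0.11)]:
    print(f"{label}: {majority_baseline_accuracy(rate):.0%}")
# prints:
# all crimes: 52%
# violent crime: 89%
```

With a base rate near 50%, the trivial rule is barely better than a coin flip, so a raw accuracy figure says a lot about the predictor. With a base rate of 11%, the same rule is right 89% of the time without using any information about the individual, so raw accuracy has to be read against that baseline.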

Although the results mirrored the 2018 study when participants were given streamlined information and provided with feedback after each prediction, algorithms won out when feedback was absent and when risk factor information was more comprehensive—conditions that more closely parallel the uncertainty and complexity inherent in real-world criminal justice. Lin and colleagues suggest that the tools were simply able to make better use of additional information than humans.

Even though all participants were informed about base rates for re-offense, this information only improved accuracy among those who also received feedback on the outcome of each trial, suggesting humans might be on a more level playing field with algorithms if they could be provided with feedback, which decision-makers rarely receive.

While the study demonstrates clear benefits of risk assessment tools, the researchers caution that they must be carefully scrutinized.

“Policymakers must take care when using risk assessment tools to guide such high-stakes decisions,” said Goel. “In other domains, ranging from healthcare to facial recognition, we've seen that poorly designed algorithms can mimic biases in the data on which they were trained, potentially exacerbating inequities. When applied to criminal justice, risk assessment tools must be intentionally designed and regularly audited to ensure equitable outcomes.”