Algorithms in the Criminal Justice System: Risk Assessment Tools

Summary

Artificial intelligence is used widely throughout the criminal justice system. The most common tools are "pretrial risk assessment" algorithms, which are used in nearly every state. Criminal justice algorithms, sometimes called “risk assessments” or “evidence-based methods,” are controversial tools that purport to predict future behavior by defendants and incarcerated persons. The tools vary, but they use “actuarial assessments” to estimate (1) the likelihood that the defendant will re-offend before trial (“recidivism risk”) and (2) the likelihood that the defendant will fail to appear at trial (“FTA”).

These often proprietary techniques are used to set bail, determine sentences, and even contribute to determinations about guilt or innocence. Yet the inner workings of these tools are largely hidden from public view.

Many “risk assessment” algorithms take into account personal characteristics like age, sex, geography, family background, and employment status. As a result, two people accused of the same crime may receive sharply different bail or sentencing outcomes based on inputs that are beyond their control, yet they have no way of assessing or challenging the results.

As criminal justice algorithms have come into greater use at the federal and state levels, they have also come under greater scrutiny. Many criminal justice experts have denounced “risk assessment” tools as opaque, unreliable, and unconstitutional.

Background

"Risk assessment" tools are algorithms that use socioeconomic status, family background, neighborhood crime, employment status, and other factors to reach a supposed prediction of an individual's criminal risk, either on a scale from “low” to “high” or with specific percentages. See Wisconsin’s COMPAS risk assessment questionnaire, from ProPublica. In 2014, then-U.S. Attorney General Eric Holder called for the U.S. Sentencing Commission to study the use of algorithms in courts, concerned that the scores may be a source of bias. At the same time, the Justice Department expressed concern about the use of factors such as education levels, employment history, family circumstances, and demographic information. While the Sentencing Commission has studied the recidivism risk for federal offenders, it has not commissioned a study of risk scores.

Criminal justice algorithms are used across the country, but the specific tools differ by state or even county. In addition, because such algorithms are proprietary, they are not subject to state or federal open government laws. Jurisdictions have generally used one of three main systems, or adapted their own version of each: Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), Public Safety Assessment (PSA) and Level of Service Inventory Revised (LSI-R). COMPAS, created by the for-profit company Northpointe, assesses variables under five main areas: criminal involvement, relationships/lifestyles, personality/attitudes, family, and social exclusion. The LSI-R, developed by Canadian company Multi-Health Systems, also pulls information from a wide set of factors, ranging from criminal history to personality patterns. Using a narrower set of parameters, the Public Safety Assessment, developed by the Laura and John Arnold Foundation, only considers variables that relate to a defendant’s age and criminal history.

A 2016 investigation by ProPublica tested the COMPAS system adopted by the state of Florida using the same benchmark as COMPAS: a likelihood of re-offending in two years. ProPublica found that the formula was particularly likely to flag black defendants as future criminals, labeling them as such at almost twice the rate of white defendants. In addition, white defendants were labeled as low risk more often than black defendants. The investigators also found that the scores were unreliable in forecasting violent crime: only 20 percent of the people predicted to commit violent crimes actually went on to do so. When a full range of crimes was considered, including misdemeanors, the predictions were more accurate, but not highly so: 61 percent of the defendants deemed likely to reoffend were arrested for a subsequent crime within two years. According to ProPublica, some miscalculations of risk stemmed from inaccurate inputs (for example, failing to include one’s prison record from another state), while other results were attributed to the way factors are weighed (for example, someone who has molested a child may be categorized as low risk because he has a job, while someone who was convicted of public intoxication would be considered high risk because he is homeless).

Prediction Fails Differently for Black Defendants

                                             WHITE    AFRICAN-AMERICAN
Labeled Higher Risk, But Didn't Re-Offend    23.5%    44.9%
Labeled Lower Risk, Yet Did Re-Offend        47.7%    28.0%

Source: ProPublica
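The disparity ProPublica describes is a difference in error rates between groups: the share of people who did not re-offend but were labeled higher risk (a false positive rate) and the share who did re-offend but were labeled lower risk (a false negative rate). The figures of 20 percent and 61 percent cited earlier measure something else, namely how often a "higher risk" label turned out to be correct. The following is a minimal Python sketch of how these three rates could be computed per group. The `Outcome` record, the `error_rates` function, and the sample data are all hypothetical and invented for illustration; they are not ProPublica's data or method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """One person's label and what happened afterward (hypothetical record)."""
    group: str
    labeled_higher_risk: bool
    reoffended: bool


def error_rates(outcomes, group):
    """Compute three rates for one group.

    Assumes the group contains at least one person in each category;
    otherwise a division by zero occurs.
    """
    rows = [o for o in outcomes if o.group == group]
    did_not_reoffend = [o for o in rows if not o.reoffended]
    did_reoffend = [o for o in rows if o.reoffended]
    flagged = [o for o in rows if o.labeled_higher_risk]
    return {
        # Labeled higher risk, but did not re-offend.
        "false_positive_rate": sum(o.labeled_higher_risk for o in did_not_reoffend)
        / len(did_not_reoffend),
        # Labeled lower risk, yet did re-offend.
        "false_negative_rate": sum(not o.labeled_higher_risk for o in did_reoffend)
        / len(did_reoffend),
        # Of those labeled higher risk, the share who actually re-offended.
        "precision": sum(o.reoffended for o in flagged) / len(flagged),
    }


def _repeat(group, labeled_higher_risk, reoffended, count):
    return [Outcome(group, labeled_higher_risk, reoffended)] * count


# Invented sample: eight people in each of two groups, "A" and "B".
SAMPLE = (
    _repeat("A", True, False, 1) + _repeat("A", False, False, 3)
    + _repeat("A", True, True, 2) + _repeat("A", False, True, 2)
    + _repeat("B", True, False, 2) + _repeat("B", False, False, 2)
    + _repeat("B", True, True, 3) + _repeat("B", False, True, 1)
)

if __name__ == "__main__":
    for name in ("A", "B"):
        print(name, error_rates(SAMPLE, name))
```

In this invented sample, group B's false positive rate is twice group A's even though the tool's labels are roughly as often correct for both groups, which is the general shape of the pattern ProPublica reported.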

COMPAS is one of the most widely used algorithms in the country. Northpointe published a validation study of the system in 2009, but it did not include an assessment of predictive accuracy by ethnicity. It referenced a study that had evaluated COMPAS’ accuracy by ethnicity, which reported weaker accuracy for African-American men, but claimed the small sample size rendered it unreliable. Northpointe has not shared how its calculations are made but has stated that the basis of its future crime formula includes factors such as education levels and whether a defendant has a job. Many jurisdictions have adopted COMPAS, and other "risk assessment" methods generally, without first testing their validity.

Defense advocates are calling for more transparent methods because they are unable to challenge the validity of the results at sentencing hearings. Professor Danielle Citron argues that because the public has no opportunity to identify problems with troubled systems, it cannot present those complaints to government officials. In turn, government actors are unable to influence policy.

Over the last several years, prominent groups such as the Pretrial Justice Institute (PJI) strongly advocated for the introduction of these tools, and risk assessments, the Public Safety Assessment among many others, have been adopted in nearly every state, up from only a handful at the beginning of the decade. However, in February 2020, PJI reversed this position, specifically stating that they "now see that pretrial risk assessment tools, designed to predict an individual’s appearance in court without a new arrest, can no longer be a part of our solution for building equitable pretrial justice systems." One week later, the team behind the Public Safety Assessment, a widely used risk assessment developed by the Laura and John Arnold Foundation, released a statement clarifying that "implementing an assessment cannot and will not result in the pretrial justice goals we seek to achieve."

Unanswered Questions

How much should judges rely on these algorithms?

Some argue that "risk assessment" should be limited to probation hearings or pre-trial release and not used in sentencing at all. In fact, the COMPAS system was created not for use in sentencing but to aid probation officers in determining which defendants would succeed in specific treatment types. Others caution against overreliance in sentencing, which may be a natural tendency when a judge is given data that appears to be based on concrete, reliable calculations. At least one judge has set aside an agreed-upon plea deal and given a defendant more jail time because of the defendant’s high "risk assessment" score. Judge Babler in Wisconsin set aside the plea deal that had been agreed on by the prosecution and defense (one year in county jail with follow-up supervision) and imposed two years in state prison and three years of supervision after he saw that the defendant had been scored as high risk for future violent crime and medium risk for general recidivism.

Professor Sonja Starr argues that "risk assessment" results represent who has the highest risk of recidivism, but the question most relevant to judges is whose risk of recidivism will be reduced the most by incarceration. Therefore, the consideration of risk in the abstract in sentencing may not advance the goal of deterrence. In addition, a risk score estimates recidivism within a particular period (e.g., 2 years) from the time of release or from the sentence of probation. It does not convey information about the amount of crime one may commit if given one length of incarceration over another (e.g., 2 years rather than 5 years). Starr rejects as an oversimplification the assumption that incarcerating those who are considered riskiest will prevent more crimes, because this view does not consider the effect of crimes undertaken by other individuals, nor that incarceration may make someone who is already risky even more dangerous by increasing their risk of recidivism.
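Starr's distinction between the highest risk and the largest reduction in risk can be made concrete with a small Python sketch. Everything here is an assumption made for illustration: the `Defendant` record, the two defendants, and especially the idea that anyone could know a person's risk under each sentence. No existing tool is known to report such figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Defendant:
    """Hypothetical record with invented risk figures."""
    name: str
    risk_with_probation: float      # estimated recidivism risk if not incarcerated
    risk_with_incarceration: float  # estimated recidivism risk after incarceration

    @property
    def risk_reduction(self):
        # Positive means incarceration lowers risk; negative means it raises it.
        return self.risk_with_probation - self.risk_with_incarceration


def highest_risk(defendants):
    """Who a score of risk 'in the abstract' would single out."""
    return max(defendants, key=lambda d: d.risk_with_probation)


def largest_reduction(defendants):
    """Whose risk incarceration would reduce the most."""
    return max(defendants, key=lambda d: d.risk_reduction)


# Invented example: X is riskier, but incarceration makes X slightly riskier still.
DEFENDANTS = [
    Defendant("X", risk_with_probation=0.60, risk_with_incarceration=0.65),
    Defendant("Y", risk_with_probation=0.30, risk_with_incarceration=0.10),
]

if __name__ == "__main__":
    print("Highest risk:", highest_risk(DEFENDANTS).name)
    print("Largest reduction:", largest_reduction(DEFENDANTS).name)
```

With these invented numbers the two questions point to different people, which is the gap Starr identifies between what the tools report and what a sentencing judge would need to know.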

What factors should be considered?

Factors such as demographics, socioeconomic background, and family characteristics may serve as a proxy for race. Because these variables are highly correlated with race, they will likely have a racially disparate impact. In addition, because of de facto segregation and the higher crime rate in urban neighborhoods, including neighborhood crime rates will further compound the inequality. As a public policy matter, Starr argues that "risk assessment" factors based on demographics, socioeconomic background, and family characteristics may not serve their intended goal of reducing incarceration because mass incarceration already has a racially disparate impact, which means that "risk assessment" algorithms produce higher risk estimates, all other things equal, for subgroups whose members are already disproportionately incarcerated.
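The proxy effect can be illustrated with a short Python sketch. The scoring rule below never receives a person's group as an input, yet average scores differ by group because one input, living in a high-crime neighborhood, is unevenly distributed across groups. The `Person` record, the `score` rule, its weights, and the sample data are all hypothetical; this is not the formula of any actual tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical record; `group` is never passed to the scoring rule."""
    group: str
    prior_arrests: int
    high_crime_neighborhood: bool


def score(prior_arrests, high_crime_neighborhood):
    """Invented scoring rule: one point per prior arrest, two for neighborhood."""
    return prior_arrests + (2 if high_crime_neighborhood else 0)


def mean_score_by_group(people):
    """Average score per group, computed only after scoring is done."""
    totals = {}
    for p in people:
        totals.setdefault(p.group, []).append(
            score(p.prior_arrests, p.high_crime_neighborhood)
        )
    return {group: sum(scores) / len(scores) for group, scores in totals.items()}


# Invented sample: both groups have identical prior-arrest histories (0, 1, 1, 2),
# but three of four people in group B live in a high-crime neighborhood,
# compared with one of four in group A.
PEOPLE = [
    Person("A", 0, True), Person("A", 1, False),
    Person("A", 1, False), Person("A", 2, False),
    Person("B", 0, True), Person("B", 1, True),
    Person("B", 1, True), Person("B", 2, False),
]

if __name__ == "__main__":
    print(mean_score_by_group(PEOPLE))
```

In this invented sample, the two groups have the same criminal histories, so the entire gap in average score comes from the neighborhood input.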

Another arguable flaw in the input questions is that the consideration of employment history and financial resources results in extra, unequal punishment of the poor, which may violate the Equal Protection Clause under Bearden v. Georgia, in which the Supreme Court rejected Georgia’s argument that poverty was a recidivism factor that justified additional incapacitation. To avoid perpetuating a racially disparate impact, advocates are arguing for a narrow range of questions, such as questions based strictly on past or present criminal behavior, or an individual assessment of a defendant’s conduct, mental states, and attitudes.

Do proprietary algorithms violate a defendant's right to due process?

Since the specific formula to determine "risk assessment" is proprietary, defendants are unable to challenge the validity of the results. This may violate a defendant’s right to due process. The use of COMPAS in sentencing has been challenged in Loomis v. Wisconsin as a violation of the defendant’s right to due process on two grounds. The first part of the challenge is that the proprietary nature of COMPAS prevents defendants from challenging the COMPAS assessment’s scientific validity. The state does not dispute that the process is secret and non-transparent, but contends that Loomis fails to show that a COMPAS assessment contains or produces inaccurate information. Second, Loomis argues that the algorithm is unconstitutional because of the way it considers gender. COMPAS has a separate scale for women and men, so all other factors being equal, assessment results will differ based on gender alone.
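Because the COMPAS formula is not public, the effect of separate scales can only be illustrated in the abstract. The Python sketch below assumes a simplified design in which a raw score is converted to a "low," "medium," or "high" label using cut points that differ by scale. The cut points, the `risk_label` function, and the direction of the difference are all invented for illustration and do not describe how COMPAS actually works.

```python
from bisect import bisect_right

LABELS = ["low", "medium", "high"]

# Hypothetical cut points: a raw score below the first value is "low",
# at or above the second is "high", and "medium" in between.
# Which scale is stricter is arbitrary here.
CUT_POINTS = {
    "men": [10, 20],
    "women": [6, 14],
}


def risk_label(raw_score, scale):
    """Map a raw score to a label using the cut points for the given scale."""
    return LABELS[bisect_right(CUT_POINTS[scale], raw_score)]


if __name__ == "__main__":
    raw_score = 15  # identical answers, identical raw score
    for scale in ("men", "women"):
        print(scale, risk_label(raw_score, scale))
```

Under these assumed cut points, an identical raw score of 15 is labeled "medium" on one scale and "high" on the other, so the label differs based on the choice of scale alone.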

Risk Assessment Tools State-By-State

The following table is based on a survey of state practices by EPIC performed September 2019, updated February 2020 with Mississippi FOI Documents. The functions vary between pre-trial, sentencing, prison management, and parole. Most of these tools, including their existence, are largely opaque and change often.

* Bill enacted Mar. 2019: requires transparency, notification, and explainability.

**There is no official compendium of Risk Assessments used by states.

Abbreviations Key:



DV - Domestic Violence

COMPAS - Correctional Offender Management Profiling for Alternative Sanctions

PSA - Public Safety Assessment

PTRA - Pretrial Risk Assessment Instrument

CPAT - Colorado Pretrial Assessment Tool

PRRS - Pretrial Release Risk Scale

DELPAT - Delaware Pretrial Assessment Tool

ODARA - Ontario Domestic Assault Risk Assessment Tool

MNPAT - Minnesota Pretrial Assessment Tool

ORAS - Ohio Risk Assessment System

LS/CMI - Level of Service/Case Management Inventory

PRAISTX - Pretrial Risk Assessment Information System

VPRAI - Virginia Pretrial Risk Assessment Instrument

IRAS - Indiana Risk Assessment System

EPIC's Interest

EPIC has a strong interest in open government. Public disclosure of this information improves government oversight and accountability. It also helps ensure that the public is fully informed about the activities of government. EPIC routinely files lawsuits to force disclosure of agency records that impact critical privacy interests.

EPIC also has a strong interest in algorithmic transparency. Secrecy of the algorithms used to determine guilt or innocence undermines faith in the criminal justice system. In support of algorithmic transparency, EPIC submitted FOIA requests to six states to obtain the source code of "TrueAllele," a software product used in DNA forensic analysis. According to news reports, law enforcement officials use TrueAllele test results to establish guilt, but individuals accused of crimes are denied access to the source code that produces the results.

The Universal Guidelines for Artificial Intelligence, grounded in a human rights framework, set forth twelve principles that are intended to guide the design, development, and deployment of AI, and frameworks for policy and legislation. Broadly, the guidelines address the rights and obligations of: 1) fairness, accountability, and transparency; 2) autonomy and human determination; 3) data accuracy and quality; 4) safety and security; and 5) minimization of scope. These principles can also guide the use of algorithms in the pre-trial risk context.

The very first principle, transparency, is seldom required with pre-trial risk assessments. One of the primary criticisms of these risk assessment tools is that they are proprietary tools, developed by technology companies that refuse to disclose the inner workings of the “black box.” Trade secret and other IP protection defenses have been raised in response to demands for disclosure of the underlying logic of the systems. In March 2019, Idaho became the first state to enact a law specifically promoting transparency, accountability, and explainability in pre-trial risk assessment tools. Pre-trial risk assessments are algorithms that help inform sentencing and bail decisions for defendants. The law prevents a trade secrecy or IP defense, requires public availability of ‘all documents, data, records, and information used by the builder to build or validate the pretrial risk assessment tool,’ and empowers defendants to review all calculations and data that went into their risk score.

EPIC FOI Documents

EPIC obtained the following documents concerning criminal justice algorithms through state freedom of information requests.

District of Columbia

Georgia

Idaho

Missouri

Mississippi

Nebraska

New Hampshire

Vermont

Wisconsin

Resources

Legislation and Regulations

Government Studies

Nathan James, Risk and Needs Assessment in the Criminal Justice System, Congressional Research Service (Oct. 15, 2015)

Notable Cases

Academic Articles

Other resources

Books

Documents and Reports

Pretrial Justice Institute (PJI) No Longer Recommends Risk Assessment Tools, February 7, 2020.

Sample COMPAS risk assessment questionnaire - Wisconsin's 137 question risk assessment

Sample sentencing reports judges receive that include risk assessment results

Jennifer Elek, Roger Warren & Pamela Casey, Using Risk and Needs Assessment Information at Sentencing: Observations from Ten Jurisdictions, National Center for State Courts’ Center for Sentencing Initiatives

Tara Agense & Shelley Curran, The California Risk Assessment Pilot Project: The Use of Risk and Needs Assessment Information in Adult Felony Probation Sentencing and Violation Proceedings, Judicial Council of California Operations and Programs Division Criminal Justice Services (December 2015)

News