Across the globe, algorithms are quietly but increasingly being relied upon to make important decisions that impact our lives. This includes determining the number of hours of in-home medical care patients will receive, whether a child is so at risk that child protective services should investigate, if a teacher adds value to a classroom or should be fired, and whether or not someone should continue receiving welfare benefits.

The use of algorithmic decision-making is typically well-intentioned, but it can result in serious unintended consequences. In the rush to figure out whether and how they can use an algorithm, organizations often skip over one of the most important questions: will the introduction of the algorithm reduce or reinforce inequity in the system?

Several factors bear on that analysis. Here are a few questions that all organizations need to consider to determine whether implementing a system based on algorithmic decision-making is an appropriate and ethical solution to their problem:

Will this algorithm influence—or serve as the basis of—decisions with the potential to negatively impact people’s lives?

Before implementing a decision-making system that relies on an algorithm, an organization must assess the potential for the algorithm to impact people’s lives. This requires taking a close look at who the system could impact and what that would look like, and identifying the inequalities that already exist in the current system—all before ever automating anything. We should be using algorithms to improve human life and well-being, not to cause harm. Yet, as a result of bad proxies, bias built into the system, decision makers who don’t understand statistics and who overly trust machines, and many other challenges, algorithms will never give us “perfect” results. And given the inherent risk of inequitable outcomes, the greater the potential for a negative impact on people’s lives, the less appropriate it is to ask an algorithm to make that decision—especially without implementing sufficient safeguards.

In Indiana, for example, after an algorithm categorized incomplete welfare paperwork as “failure to cooperate,” one million people were denied access to food stamps, health care, and cash benefits over the course of three years. Among them was Omega Young, who died on March 1, 2009, after she was unable to afford her medication; the day after she died, she won her wrongful termination appeal and all of her benefits were restored. Indiana’s system had woefully inadequate safeguards and appeals processes, but the stakes of deciding whether someone should continue receiving Medicaid benefits will always be incredibly high, so high as to call into question whether an algorithm alone should ever be the answer.

Virginia Eubanks discusses the failed Indiana system in Automating Inequality, her book about how technology affects civil and human rights and economic equity. Eubanks explains that algorithms can provide “emotional distance” from difficult societal problems by allowing machines to make difficult policy decisions for us, so that we don’t have to. But some decisions cannot, and should not, be delegated to machines. We must not use algorithms to avoid making difficult policy decisions or to shirk our responsibility to care for one another. In those contexts, an algorithm is not the answer. Math alone cannot solve deeply rooted societal problems, and attempting to rely on it will only reinforce inequalities that already exist in the system.

Can the available data actually lead to a good outcome?

Algorithms rely on input data—and they need the right data in order to function as intended. Before implementing a decision-making system that relies on an algorithm, organizations need to drill down on the problem they are trying to solve and do some honest soul-searching about whether they have the data needed to address it.

Take, for example, the department of Children, Youth and Families (CYF) in Allegheny County, Pennsylvania, which has implemented an algorithm to assign children “threat scores” for each incident of potential child abuse reported to the agency and help caseworkers decide which reports to investigate, another case discussed in Eubanks’ book. The algorithm’s goal is a common one: to help a social services agency most effectively use limited resources to help the community it serves. To achieve that goal, the county sought to predict which children are likely to become victims of abuse, i.e., the “outcome variable.” But the county didn’t have enough data concerning child-maltreatment-related fatalities or near fatalities to create a statistically meaningful model, so it used two variables that it had a lot of data on (community re-referrals to the CYF hotline and placement in foster care within two years) as proxies for child maltreatment. That means the county’s algorithm predicts a child’s likelihood of re-referral and of placement in foster care, and uses those predictions to assign the child a maltreatment “threat score.”

The problem? These proxy variables are not good proxies for child abuse. For one, they are subjective. As Eubanks explains, the re-referral proxy includes a hidden bias: “anonymous reporters and mandated reporters report black and biracial families for abuse and neglect three and a half [times] more often than they report white families.” Some reports are even made by angry neighbors, landlords, or family members making intentionally false reports as punishment or retribution. As she wrote in Automating Inequality, “Predictive modeling requires clear, unambiguous measures with lots of associated data in order to function accurately.” Those measures weren’t available in Allegheny County, yet CYF pushed ahead and implemented an algorithm anyway.

The result? An algorithm with limited accuracy. As Eubanks reports, in 2016, a year with 15,139 reports of abuse, the algorithm would have made 3,633 incorrect predictions. This equates to the unwarranted intrusion into and surveillance of the lives of thousands of poor, minority families.
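To put those two figures in proportion, the arithmetic can be sketched in a few lines of Python. The counts are the ones cited above; the variable names are illustrative, and treating each report as exactly one prediction is an assumption made for this sketch rather than something the source states.

```python
# Figures cited above for 2016.
total_reports = 15_139
incorrect_predictions = 3_633

# Assumption for this sketch: each report corresponds to one prediction.
error_rate = incorrect_predictions / total_reports
accuracy = 1 - error_rate

print(f"Share of incorrect predictions: {error_rate:.1%}")
print(f"Share of correct predictions:   {accuracy:.1%}")
```

Under that assumption, roughly one prediction in four would have been wrong.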

Is the algorithm fair?

The lack of sufficient data may also render the application of an algorithm inherently unfair. Allegheny County, for example, didn’t have data on all of its families; its data had been collected only from families using public resources—i.e., low-income families. This resulted in an algorithm that targeted low-income families for scrutiny, and that potentially created feedback loops, making it difficult for families swept up into the system to ever completely escape the monitoring and surveillance it entails. This outcome offends basic notions of what it means to be fair. It certainly must not feel fair to Allegheny County families adversely impacted.

There are many measures of algorithmic fairness. Does the algorithm treat like groups similarly, or disparately? Is the system optimizing for fairness, for public safety, for equal treatment, or for the most efficient allocation of resources? Was there an opportunity for the community that will be impacted to participate in and influence decisions about how the algorithm would be designed, implemented, and used, including decisions about how fairness would be measured? Is there an opportunity for those adversely impacted to seek meaningful and expeditious review, before the algorithm has caused any undue harm?

Organizations should be transparent about the standard of fairness employed, and should engage the various stakeholders—including (and most importantly) the community that will be directly impacted—in the decision about what fairness measure to apply. If the algorithm doesn’t pass muster, it should not be the answer. And in cases where a system based on algorithmic decision-making is implemented, there should be a continuous review process to evaluate the outcomes and correct any disparate impacts.
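As one illustration of what an ongoing review for disparate impact might compute, here is a minimal Python sketch that compares how often an algorithm flags cases in each group. It shows only one of the many possible fairness measures, and not necessarily the one any agency discussed here uses. The function names, group labels, and data are all hypothetical and invented for illustration.

```python
from collections import Counter


def flag_rates(records):
    """Return the share of flagged cases per group.

    `records` is an iterable of (group, flagged) pairs, where `flagged`
    is True when the algorithm selected that case for scrutiny.
    """
    totals = Counter()
    flagged = Counter()
    for group, was_flagged in records:
        totals[group] += 1
        if was_flagged:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Ratio of the highest group flag rate to the lowest (1.0 means parity)."""
    lowest = min(rates.values())
    highest = max(rates.values())
    if lowest == 0:
        # Avoid dividing by zero when some group is never flagged.
        return float("inf") if highest > 0 else 1.0
    return highest / lowest


# Hypothetical data, invented purely for illustration.
records = (
    [("group_a", True)] * 30 + [("group_a", False)] * 70
    + [("group_b", True)] * 10 + [("group_b", False)] * 90
)

rates = flag_rates(records)
print(rates)
print(rate_ratio(rates))
```

In this made-up example, one group is flagged about three times as often as the other. Whether a gap like that is acceptable, and whether flag rates are even the right thing to compare, is exactly the kind of decision the affected community should help make.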

How will the results (really) be used by humans?

Another variable organizations must consider is how the results will be used by humans. In Allegheny County, although the algorithm’s “threat score” was supposed to serve as one of many factors for caseworkers to consider before deciding which families to investigate, Eubanks observed that “in practice, the algorithm seems to be training the intake workers.” Caseworker judgment had, historically, helped counteract the hidden bias within the referrals. When the algorithm came along and caseworkers started replacing their own judgment with that of the algorithm, they effectively relinquished their gatekeeping role, and the system became more class and race biased as a result.

Algorithmic decision-making is often touted for its superiority over human instinct. The tendency to view machines as objective and inherently trustworthy, even though they are not, is referred to as “automation bias.” There are of course many cognitive biases at play whenever we try to make a decision; automation bias adds an additional layer of complexity. Knowing that we as humans harbor this bias (and many others), when the result of an algorithm is intended to serve as only one factor underlying a decision, an organization must take care to create systems and practices that control for automation bias. This includes engineering the algorithm to provide a narrative report rather than a numerical score, and making sure that human decision makers receive basic training both in statistics and on the potential limits and shortcomings of the specific algorithmic systems they will be interacting with.

And in some circumstances, the mere possibility that a decision maker will be biased toward the algorithm’s answer is enough to counsel against its use. This includes, for example, predicting recidivism rates for the purpose of determining prison sentences. In Wisconsin, a court upheld the use of the COMPAS algorithm to predict a defendant’s recidivism rate on the ground that, at the end of the day, the judge was the one making the decision. But knowing what we do about the human instinct to trust machines, it is naïve to think that the judge’s ‘inherent discretion’ was not unduly influenced by the algorithm. One study on the impact of algorithmic risk assessments on judges in Kentucky found that algorithms affected judges’ decision making only for a short time, after which they returned to previous habits. But the impact may be different across various communities of judges, and adversely impacting even one person is a big deal given what’s at stake: lost liberty. Given the significance of sentencing decisions, and the serious issues with trying to predict recidivism in the first place (the system “essentially demonizes black offenders while simultaneously giving white criminals the benefit of the doubt”), use of algorithms in this context is inappropriate and unethical.

Will people affected by these decisions have any influence over the system?

Finally, algorithms should be built to serve the community that they will be impacting—and never solely to save time and resources at whatever cost. This requires that data scientists take into account the fears and concerns of the community impacted. But data scientists are often far removed from the communities in which their algorithms will be applied. As Cathy O’Neil, author of Weapons of Math Destruction, told Wired earlier this year, “We have a total disconnect between the people building the algorithms and the people who are actually affected by them.” Whenever this is the case, even the most well-intended system is doomed to have serious unintended side effects.

Any disconnect between the data scientists, the implementing organization, and the impacted community must be addressed before deploying an algorithmic system. O’Neil proposes that data scientists prepare an “ethical matrix” taking into account the concerns of the various stakeholders that may be impacted by the system, to help “lay out all of these competing implications, motivations and considerations and allows data scientists to consider the bigger impact of their designs.” The communities that will be impacted should also have the opportunity to evaluate, correct, and influence these systems.

***

As the Guardian has noted, “Bad intentions are not needed to make bad AI.” The same goes for any system based on algorithmic decision-making. Even the most well-intentioned systems can cause significant harm, especially if an organization doesn’t take a step back and consider whether it is ethical and appropriate to use algorithmic decision-making in the first place. These questions are just starting points, and they won’t guarantee equitable results, but they are questions that all organizations should be asking themselves before implementing a decision-making system that relies on an algorithm.