Why We Need Accountable Algorithms

AI and machine learning algorithms are marketed as unbiased, objective tools. They are not. They are opaque mechanisms of bureaucracy and decisionmaking in which old-fashioned racist, sexist, and classist biases are hidden behind sophisticated technology, usually without a system of appeal. As their influence increases in society, we face a choice. Do we ignore their pernicious effects, or do we understand, regulate, and control the biases they exert? If we want them to represent transparent fairness, freedom, and consistency in an efficient, cost-saving manner, we must hold them accountable somehow.

What is an algorithm? For my purposes I simply mean a system trained on historical data and optimized to some definition of success. We even use informal algorithms, defined this way, in our own heads. The dinners I make for my family on a daily basis require the data of the ingredients in my kitchen and the amount of time I have to cook. The way I assess whether a meal is “successful” is to see, afterwards, if my kids ate their vegetables. Note that I curate the data: I actually don’t include certain foods, like ramen noodles or sprinkles, in my ingredients list. I also have a different definition of success than my kids would have. Over time, the succession of meals optimized to my definition of success differs wildly from the one my kids would have chosen. Those are two obvious ways that I have inserted my agenda into my algorithm. Indeed any algorithm builder does this – they curate their data, and they define success and likewise measure the cost of failure.

In general, people are intimidated by algorithms and don’t question them the way they should. Thousands of teachers have been told “it’s math, you wouldn’t understand it,” regarding administrators’ statistical value-added model for teachers, even though teachers’ tenure or job status depends on the results. Criminal defendants likewise have no recourse to understand or protest against their recidivism risk scores, used by the court to decide whether a criminal defendant’s profile matches someone who can be expected to return to prison after leaving, even though a higher score can mean a longer prison term. The people targeted by these algorithms – usually in the form of scoring systems – have very little power, and typically no recourse to understand or interrogate their scores.

Algorithms don’t make things fair. They embed historical practices and patterns. When the medical school at St. George’s Hospital in London automated its application process, the school noted that the results came out both sexist and xenophobic. That surprised them, since they’d expected that a computer wouldn’t be discriminatory. But it happened, of course, because the historical data they fed to the algorithm to train it was, itself, sexist and xenophobic. The algorithm picked up on this pattern and propagated it.

In general, there is unintentional, implicit bias in all kinds of ways in our culture and our processes. We can expect biases to be automated when we feed historical data into training these processes – even when that historical data is very recent. Until we consistently rid our society and ourselves of implicit bias, we cannot trust algorithms to be clear of it. Said another way: all algorithms are likely racist, sexist, and xenophobic unless they’ve been treated not to be. Why assume that any characteristic of a complicated mathematical metric or measure sits at one specific setting, after all, especially when we know it tends not to? That’s like assuming the IQ of a given person is 100, even though you’re in a place overrun by known geniuses.

Of course, the above argument was made under the assumption that bias and discrimination are wrong. That’s a moral choice, which brings us to our next point.

There is no such thing as a morally neutral algorithm. We embed ethics into our objective functions. A great example of how we do this – even without thinking about it – comes from an algorithm that predicts child abuse. I was talking to a group in California that is attempting to build a predictive algorithm that will help them decide whether a given call from a teacher, doctor, neighbor, or family member about a child in danger is sufficiently ominous to send out a caseworker. They don’t have enough resources to send out a caseworker for every call, so they have to make judgment calls. How should they train a predictive algorithm to improve their service?

The first answer, as it is with most data problems, is to make sure they have clean and relevant data. An important aspect of this is to know, when someone calls, whether they are referring to a child that is already in the system. Spelling errors or vague information could hamper this investigation, as could a family that moves frequently or is homeless. Secondarily, what kind of information can and should one use about the family, assuming a child is positively identified? After all, this call usually amounts to suspicion, not conviction of child abuse, so there are privacy concerns. Moreover, the information is not equally distributed: there’s likely to be far more information in the system if the family is poor and minority, if it contains people who’ve been homeless or on welfare, or who have mental health problems or criminal records.

The second answer is to think about the objective function: what are you optimizing to? No algorithm is perfect, so although you’re always trying to build an algorithm that is as accurate as possible, you must also consider the errors. And in this case, that means you must balance false positives against false negatives.

Let’s work out the scenarios there: a false positive is when you suspect a family of abuse when there is none, or when it doesn’t rise to the level of severe abuse. Depending on the outcome, this could end in tragedy, if the child is taken away from their family, say. A false negative, on the other hand, is where you decide the child is probably safe and you don’t investigate, but the child actually is abused. It’s also a tragedy.

What’s the trade-off? How many false positives would you give for one false negative? It’s a tough question to answer but it’s absolutely required to train this algorithm.
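To make that trade-off concrete, here is a minimal sketch in Python. It assumes a model that gives each call a risk score between 0 and 1 and sends a caseworker when the score reaches a threshold. The scores, the outcomes, the cost weights, and the function names are all invented for illustration; none of them come from the California group or any real system.

```python
# Hypothetical risk scores for nine past calls, and whether abuse was
# later confirmed (1) or not (0). Toy data, invented for illustration.
SCORES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
LABELS = [0, 0, 1, 0, 0, 1, 0, 1, 1]


def count_errors(scores, labels, threshold):
    """Count false positives and false negatives if we investigate
    every call whose score is at or above the threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


def best_threshold(scores, labels, cost_fp, cost_fn):
    """Pick the threshold with the lowest total weighted error cost.

    cost_fp and cost_fn are the ethical choice: how bad is one false
    positive compared with one false negative?
    """
    # Try every observed score, plus "never investigate".
    candidates = sorted(set(scores)) + [float("inf")]

    def total_cost(threshold):
        fp, fn = count_errors(scores, labels, threshold)
        return cost_fp * fp + cost_fn * fn

    # On ties, min() keeps the lowest (most cautious) threshold.
    return min(candidates, key=total_cost)


for cost_fp, cost_fn in [(1, 1), (1, 10), (10, 1)]:
    t = best_threshold(SCORES, LABELS, cost_fp, cost_fn)
    print(f"FP cost {cost_fp}, FN cost {cost_fn}: threshold {t}, "
          f"(FP, FN) = {count_errors(SCORES, LABELS, t)}")
```

Nothing in the data tells you which pair of weights is right. Changing the weights moves the threshold, and with it the set of families that get a visit, which is why the weights have to be chosen openly rather than buried in the code.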

Every algorithm has, at its heart, an ethical dilemma. Some of them are much less difficult to resolve, but they exist nonetheless. One of the most important goals of algorithmic accountability is that the data scientist should not make those ethical decisions for the sake of society, but rather should be a translator of ethical decisions into code. In other words, a data scientist should make transparent what the trade-offs are and even build tests, or ongoing monitors of their algorithms, to make sure the decisions are consistently upheld by the algorithm. Specifically, such a monitor should keep track of errors, and the distribution of those errors over the population.
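As a rough sketch of what such a monitor might compute, the Python below tallies false positive and false negative rates separately for each group in the population. The record layout (group, predicted, actual), the group names, and the toy log are assumptions made for illustration, not a description of any deployed tool.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return false positive and false negative rates for each group.

    Each record is a (group, predicted, actual) tuple, where predicted
    and actual are 1 for "flagged" / "truly positive" and 0 otherwise.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, predicted, actual in records:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1  # missed a true case
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1  # flagged someone wrongly

    report = {}
    for group, c in counts.items():
        report[group] = {
            # A rate is None when the group has no cases of that kind.
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else None,
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else None,
        }
    return report


# Hypothetical decision log for two groups, invented for illustration.
LOG = [
    ("A", 1, 0), ("A", 0, 0), ("A", 1, 1), ("A", 0, 1),
    ("B", 1, 0), ("B", 1, 0), ("B", 0, 0), ("B", 1, 0), ("B", 1, 1),
]

for group, rates in sorted(error_rates_by_group(LOG).items()):
    print(group, rates)
```

In this toy log, group B is wrongly flagged more often than group A. A gap like that is exactly what an ongoing monitor should surface, so that the people responsible can decide whether it is consistent with the trade-offs they agreed to.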

Here’s an example to illustrate how badly this can go wrong. Last year the Australian Department of Human Services introduced an automated “debt recovery system” that utilized a crude fraud detection algorithm which calculated the extent to which Australian citizens had been overpaid by the welfare system. The fraud detection program was flawed because it assumed a steady income throughout the year for all recipients, while individuals’ actual monthly incomes often varied considerably. The number of compliance letters sent to recipients jumped from 20,000 per year to 20,000 per month with the new system, and the department was flooded with complaints. This is a system that, I would argue, was not carefully vetted for its errors, especially its false positive rate. Rather, it was optimized for cost-savings with little regard to the human toll of errors.
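The steady-income assumption is easy to illustrate with a toy calculation. The sketch below is not the department's actual rule: the income limit, the earnings pattern, and the eligibility test (a benefit month counts as overpaid if income in that month exceeds the limit) are all hypothetical, chosen only to show how averaging manufactures false positives.

```python
# Hypothetical monthly income limit for receiving a benefit.
INCOME_LIMIT = 1000

# A hypothetical recipient: worked six months at 4,000 per month, then
# had no income for six months and claimed benefits in those months.
ACTUAL_INCOME = [4000] * 6 + [0] * 6
BENEFIT_MONTHS = list(range(6, 12))


def averaged(monthly_income):
    """Spread annual income evenly across the year, which is the
    steady-income assumption described above."""
    average = sum(monthly_income) / len(monthly_income)
    return [average] * len(monthly_income)


def flagged_months(monthly_income, benefit_months, limit):
    """Return the benefit months in which income exceeded the limit."""
    return [m for m in benefit_months if monthly_income[m] > limit]


true_flags = flagged_months(ACTUAL_INCOME, BENEFIT_MONTHS, INCOME_LIMIT)
averaged_flags = flagged_months(
    averaged(ACTUAL_INCOME), BENEFIT_MONTHS, INCOME_LIMIT
)

print("Months flagged using actual income:", true_flags)
print("Months flagged using averaged income:", averaged_flags)
```

With real monthly figures this person was never overpaid. With the averaged figures, every month they claimed a benefit looks like an overpayment, so each flag is a false positive.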

I don’t wish to overstate the case against algorithms, or rather the case for accountable algorithms. Algorithms aren’t the only or the biggest problem we face as a society. They are simply the most recent set of tools for power. Power is old, after all. Powerful institutions and individuals have had tools before and they’ll have them again in the future, even if algorithms are held accountable and appealable by then. But, as a mathematician, it appalls me that these particular tools are being wielded as mathematical weapons.

Neither am I the arbiter of truth. I’m not saying what the embedded ethics of a given algorithm should be. But neither are you, nor should a lone Facebook engineer, unschooled in ethics, decide how the difference between information and propaganda is understood, and how data is therefore disseminated to the rest of us. Personally, I don’t know what a “qualified” candidate is in a given situation, but I know that the historical process of hiring people should probably be given a second look. And that means by a group of people who take the job seriously.

Here’s an important first step: let’s separate conversations around rules and morals from conversations around the technology of algorithms or AI. The first type of conversation is one that can include everyone, and the second is a technical discussion that should be informed by the first. If we left it to Silicon Valley technologists to decide on the new rules, we could end up with a machine deciding who gets organs based on the expected value of a human being to society, which in turn could be determined by who makes the most money on the stock market.

Next, let’s hold algorithms accountable. That means, if we decide an algorithm has the potential to act unethically or illegally, we monitor it from the get-go and resolve the problem if necessary. The tools to do this exist, although they’re not refined.

The biggest pushback I expect to get from this idea is that it’s expensive. That’s true. It’s expensive in that it simply costs money to add layers to an already complex process, and ongoing monitoring and fiddling with algorithms is definitely one or maybe multiple layers. Moreover, and more importantly, it’s expensive because under any reasonable definition it will generally cut down on profit to be nondiscriminatory. If you don’t know what I mean, check out this recent paper written by Moritz Hardt from Google and others, which examines the “cost of fairness” when you optimize to profit with a fairness constraint, under multiple definitions of fairness. They even have a section, starting on page 16, that works out the case of a FICO-type credit score optimized with various choices of fairness constraints, and they measure how much each one eats into profit as a function of the error trade-off we mentioned earlier.

No single company is likely to take on this challenge unless it needs to, because none of them wants to incur the real expense that ethical constraints must bring. A first mover in this space will be at a competitive disadvantage. That’s why we need rules, or laws, around algorithmic fairness, especially when fairness directly threatens profit margins. If everyone has to abide by the same standards, then the industry as a whole will be fair, and the profit will be distributed based on other competitive advantages like customer service or ease of use, as we want.

I don’t see a way forward in this space without a place for standards that amount to anti-discrimination and fairness laws. In fact, I encourage tech companies to come up with reasonable laws before government comes up with unreasonable ones.