By Jennifer T. Chayes, Distinguished Scientist and Managing Director, Microsoft Research, and ACM Fellow

Machine learning (ML) is a subfield of computer science in which algorithms learn by recognizing patterns in data. For example, our personal assistants, like Cortana, Siri, and Alexa, learn how to recognize what we are saying and how best to respond to our questions by using lots of data from interactions with many millions of people.

As computers become more “intelligent,” some data scientists have been puzzled to observe their algorithms behaving in sexist or racist ways. This shouldn’t be surprising: these algorithms were trained on social data that reflect society’s biases, and they amplify those biases in pursuit of better performance metrics.

For example, if one naively trains an ML algorithm to filter resumes and find the most qualified candidates for certain jobs based on data about prior hires, the results can turn out to be race- or gender-biased even if the algorithm is explicitly instructed to ignore “protected attributes” like race or gender. The reason is that race and gender are correlated with other “unprotected” information, like names, which the naive algorithm is free to use. In hiring, people are known to do the same: a recruiter may never be told an applicant’s gender, yet recognizes a female name and declines to interview her, since most previous hires were male.
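To make the proxy effect concrete, here is a minimal sketch with entirely invented numbers: the true gender column is deleted before training, but a hypothetical name-derived signal that tracks gender 90% of the time remains, and the biased historical hire rates pass straight through it.

```python
# Hypothetical applicant pool built from exact, invented counts so the
# arithmetic is transparent.  "signal" is a name-derived proxy that
# matches the true gender 90% of the time; the gender column itself is
# then discarded, as an "ignore protected attributes" rule would demand.
rows = []
for gender, n in (("M", 1000), ("F", 1000)):
    other = "F" if gender == "M" else "M"
    hire_p = 0.6 if gender == "M" else 0.3        # biased past decisions
    for signal, count in ((gender, round(n * 0.9)), (other, round(n * 0.1))):
        hired = round(count * hire_p)
        rows += [(signal, True)] * hired + [(signal, False)] * (count - hired)

# The training table contains only (signal, hired) -- no gender at all.
def hire_rate(sig):
    sel = [h for s, h in rows if s == sig]
    return sum(sel) / len(sel)

print(hire_rate("M"), hire_rate("F"))   # prints: 0.57 0.33
# Any accuracy-driven model trained on this table will learn to prefer
# male-sounding names, reproducing the bias without ever seeing gender.
```

The gap between the two rates (0.57 versus 0.33) is exactly what a model optimizing predictive accuracy would latch onto, which is why simply dropping the protected column does not remove the bias.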

In general, with careful algorithm design, computers can be fairer than typical human decision-makers, despite the biased training data. Just as we teach our children that anyone can have any job, regardless of whom they currently see working in those jobs, we can teach intelligent algorithms to disregard the discriminatory biases in their training data. Fortunately, as computers become smarter, it also becomes easier to teach them: they can represent what race and gender are, and the same social data can be used to automatically uncover and remove biases.

Current systems sometimes have significant biases. When Harvard Professor Latanya Sweeney put her name into a search engine, she was delivered an ad saying “Latanya Sweeney, Arrested?” and offered a background check for a fee. The result of that background check was that Dr. Sweeney had no arrest record, as is the case for most distinguished scientists. This ad is obviously deeply unfair and discriminatory to Dr. Sweeney. If potential employers put Dr. Sweeney’s name into a search engine, they might write her off immediately upon seeing the ad. Moreover, Dr. Sweeney showed that searching on people whose first names indicated they were more likely to be black, like Latanya, was much more likely to produce this “Arrested?” ad than searching on racially neutral names.

The good news is that many computer scientists care deeply about the fairness of ML algorithms and have developed methods to make them less biased than humans. A few years ago, a group of researchers at Microsoft Research and Boston University uncovered gender bias inherent in certain linguistic tools used in many search engines. When used to complete the analogy “man is to computer programmer as woman is to ___,” this tool produced the answer “homemaker.” Our team debiased the tool so that it delivered gender-neutral results for such analogies.
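The analogy arithmetic and the debiasing step can be sketched with toy vectors. Every number below is invented for illustration: real word embeddings have hundreds of dimensions, and the gender direction must itself be estimated from word pairs like he/she, not read off a coordinate.

```python
import math

# Toy 3-dimensional "embeddings" (all values invented): coordinate 0 is
# an artificial gender direction; coordinates 1 and 2 carry made-up
# semantic content ("tech" and "home").
E = {
    "man":        [ 1.0, 0.3, 0.3],
    "woman":      [-1.0, 0.3, 0.3],
    "programmer": [ 0.2, 1.0, 0.0],   # gender component = learned bias
    "developer":  [ 0.2, 0.9, 0.0],
    "homemaker":  [-0.8, 0.0, 1.0],
}
GENDER_AXIS = [1.0, 0.0, 0.0]         # assumed unit-length bias direction
OCCUPATIONS = {"programmer", "developer", "homemaker"}

def dot(a, b):  return sum(x * y for x, y in zip(a, b))
def cos(a, b):  return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

def analogy(emb):
    """man : programmer :: woman : ?  (nearest candidate by cosine)."""
    q = [p - m + w for p, m, w in
         zip(emb["programmer"], emb["man"], emb["woman"])]
    return max(OCCUPATIONS - {"programmer"}, key=lambda w: cos(q, emb[w]))

print(analogy(E))           # biased answer: homemaker

# Debiasing step: project the gender direction out of occupation words,
# which should be gender-neutral.
def neutralize(v, axis):
    p = dot(v, axis)
    return [x - p * a for x, a in zip(v, axis)]

E_debiased = {w: (neutralize(v, GENDER_AXIS) if w in OCCUPATIONS else v)
              for w, v in E.items()}
print(analogy(E_debiased))  # gender-neutral answer: developer
```

Before debiasing, “homemaker” wins the analogy only because it shares the query’s gender component; once that component is projected out of the occupation words, the purely semantic coordinates decide, and a tech-related word wins.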

A group of researchers at Microsoft Research and Harvard recently designed an intelligent algorithm that looks directly at “protected attributes” like race or gender at an intermediate stage, and produces decisions that are sometimes less biased than human judgments. Consider a hypothetical hiring decision for a new management position in my organization. Our naive hiring algorithm learns the characteristics associated with our good past managers and recommends applicants with those characteristics. Let’s say it finds that a break in employment history before we hire someone is negatively correlated with being a good manager. Since most of the managers in my data are male, this probably means that men with breaks in employment history turned out to be worse in our management jobs.

Now let’s think about women. It could be that most women who take a few years off are raising children and, in the process, learning how to juggle a lot of competing priorities—something which makes them better managers when they return to the workforce. But the naive algorithm doesn’t see that in the data since it’s overwhelmed by the preponderance of data on men. Our researchers showed that if they apply the naive algorithm separately for different protected groups, it is often less biased in its decision-making. In this case, it would result in not penalizing women who had a break in employment history and were seeking management jobs. Using gender in hiring decisions is not allowed by current law, but this result could inform future regulation.
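The decoupling idea can be illustrated with a toy calculation on invented counts: a single pooled statistic concludes that an employment gap is a bad sign for everyone, while the very same statistic computed per group shows the gap predicts well for women. This is only a sketch of the general idea, not the researchers’ actual method.

```python
# Toy records of past managers: (gender, had_employment_gap, was_good).
# All counts are invented to mirror the story above: the data is mostly
# male, and an employment gap predicts badly for men but well for women.
records = (
    [("M", 1, 1)] * 20 + [("M", 1, 0)] * 80 +   # men with a gap: 20% good
    [("M", 0, 1)] * 50 + [("M", 0, 0)] * 50 +   # men, no gap:    50% good
    [("F", 1, 1)] * 14 + [("F", 1, 0)] *  6 +   # women with gap: 70% good
    [("F", 0, 1)] * 10 + [("F", 0, 0)] * 10     # women, no gap:  50% good
)

def p_good_given_gap(rows):
    """Empirical P(good manager | employment gap) on the given rows."""
    gap = [r for r in rows if r[1] == 1]
    return sum(r[2] for r in gap) / len(gap)

# Naive pooled model: one statistic for everyone.  The male majority
# drags it down, so a gap looks like a bad sign for every applicant.
pooled = p_good_given_gap(records)              # 34/120, about 0.28

# Decoupled model: the same statistic, fit separately per group.
by_group = {g: p_good_given_gap([r for r in records if r[0] == g])
            for g in ("M", "F")}                # M: 0.20, F: 0.70

print(round(pooled, 2), by_group)
```

Pooled, an applicant with a gap looks worse than the 50% base rate; decoupled, the female group’s 0.70 shows the gap should not count against her at all.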

Some people think debiasing algorithms is inherently impossible. But just as self-driving cars will inevitably get into some accidents yet need only be safer than human drivers, the first step is to design systems that are less biased than their human counterparts. The process of mathematically defining “fair” decision-making metrics also forces us to confront tradeoffs between fairness and accuracy that policy-makers have sometimes swept under the carpet. It makes us rethink what it really means to treat all groups equally: in some cases, equal treatment may only be possible by learning different group-specific criteria.

There is an entirely new field emerging at the intersection of computer science, law, and ethics. It will lead not only to fairer algorithms, but also to algorithms that support accountability by making clear which factors contributed to a decision. There’s much reason to be hopeful!