Why we urgently need to measure AI’s societal impacts

By Kate Crawford and Meredith Whittaker

How will artificial intelligence systems change the way we live? This is a tough question: on one hand, AI tools are producing compelling advances in complex tasks, with dramatic improvements in energy consumption, audio processing, and leukemia detection. There is extraordinary potential to do much more in the future. On the other hand, AI systems are already making problematic judgements that are producing significant social, cultural, and economic impacts in people’s everyday lives.

AI and decision-support systems are embedded in a wide array of social institutions, from influencing who is released from jail to shaping the news we see. For example, Facebook’s automated content editing system recently censored the Pulitzer Prize-winning image of a nine-year-old girl fleeing napalm bombs during the Vietnam War. The girl is naked; to an image processing algorithm, this might appear as a simple violation of the policy against child nudity. But to human eyes, Nick Ut’s photograph, “The Terror of War”, means much more: it is an iconic portrait of the indiscriminate horror of conflict, and it has an assured place in the history of photography and international politics. The removal of the image caused an international outcry before Facebook backed down and restored it. “What they do by removing such images, no matter what good intentions, is to redact our shared history,” said the Prime Minister of Norway, Erna Solberg.

It’s easy to forget that these high-profile instances are actually the easy cases. As Tarleton Gillespie has observed, content reviews of Facebook images occur thousands of times per day, and rarely is there a Pulitzer Prize to help determine lasting significance. Some of these reviews involve human teams, and some do not. In this case, there is also considerable ambiguity about where the automated process ended and the human review began, which is part of the problem. And Facebook is just one player in a complex ecology of algorithmically supplemented determinations, with little external monitoring of how decisions are made or what their effects might be.

The ‘Terror of War’ case, then, is the tip of the iceberg: a rare visible instance that points to a much larger mass of unseen automated and semi-automated decisions. The concern is that most of these ‘weak AI’ systems are making decisions that don’t garner such attention. They are embedded at the back end of systems, working at the seams of multiple data sets, with no consumer-facing interface. Their operations are largely unknown and unseen, and their impacts take enormous effort to detect.

Sometimes AI techniques get it right, and sometimes they get it wrong. Only rarely will those errors be seen by the public, as with the Vietnam War photograph, or when an AI ‘beauty contest’ held this month was called out as racist for selecting white women as the winners. We can dismiss this latter case as a problem of training data: the organizers simply need a more diverse selection of faces to train their algorithm with, and now that 600,000 people have sent in their selfies, they certainly have better means to do so. But while a beauty contest might seem like a bad joke, or just a really good trick to get people to give up their photos to build a large training data set, it points to a much bigger set of problems. AI and decision-support systems are reaching into everyday life: determining who will be on a predictive policing ‘heat list’, who will be hired or promoted, and which students will be recruited to universities, and even seeking to predict at birth who will become a criminal by the age of 18. So the stakes are high.

For example, the few studies of the use of AI and algorithmic decision-support systems in core social domains have produced troubling results. A recent RAND study showed that Chicago’s predictive policing ‘heat list’ (a list of people determined to be at high risk of involvement with gun violence) was ineffective at predicting who would be involved in violent crime. It did, however, lead to increased harassment of the people on the list. Similarly, a ProPublica exposé showed that criminal risk-assessment software produced results that were biased against black defendants. To uphold people’s rights and liberties, we will need validation, auditing, and assessment of these systems to ensure basic fairness. Without them, we risk incorrect classifications, biased data, and faulty models amplifying injustice rather than redressing it.

Turing said in 1947 that if a machine is expected to be infallible, it cannot also be intelligent. What concerns us is that these fallible automated systems are being rapidly rolled out into the complex nervous system of society. These issues are front of mind for us, as we recently chaired AI Now, a White House symposium dedicated to the social and economic impacts of artificial intelligence over the next 10 years. AI Now also included an experts’ workshop, where leaders from academia, civil society, industry, and government discussed challenges in four thematic areas: social inequality, ethics, labor, and health, all places where AI is already raising pressing questions.

The insights and diverse perspectives at AI Now were deeply informative, but they also revealed an uncomfortable truth: there are no agreed-upon methods to assess the human effects and longitudinal impacts of AI as it is applied across social systems. This knowledge gap is widening as the use of AI proliferates, which heightens the risk of serious unintended consequences.

The core issue here isn’t that AI is worse than the existing human-led processes that serve to make predictions and assign rankings. Indeed, there’s much hope that AI can be used to provide more objective assessments than humans, reducing bias and leading to better outcomes. The real concern is that AI systems are being integrated into key social institutions even though their accuracy, and their social and economic effects, have not been rigorously studied or validated.

There needs to be a strong research field that measures and assesses the social and economic effects of current AI systems, in order to strengthen AI’s positive impacts and mitigate its risks. By measuring the impacts of these technologies, we can strengthen the design and development of AI, assist public and private actors in ensuring their systems are reliable and accountable, and reduce the possibility of errors. By building an empirical understanding of how AI functions on the ground, we can establish evidence-led models for responsible and ethical deployment, and ensure the healthy growth of the AI field.

If the social impacts of artificial intelligence are hard to see, it is critical to find rigorous ways to make them more visible and accountable. We need new tools to allow us to know how and when automated decisions are materially affecting our lives — and, if necessary, to contest them.