University of California, Berkeley professor Stuart Russell on the existential threat posed by super-intelligent tech — and how ‘explainability’ will be the key to building ‘good’ AI systems.

Stuart Russell is not afraid to ask the biggest — and perhaps the most troubling — question facing humanity. The Professor of Computer Science at the University of California, Berkeley, who is frequently dubbed ‘the godfather of artificial intelligence’, is prepared to look calmly and rationally into the future and wonder: What is the most important event likely to happen to the human race?

He mulls over a range of possibilities that include the end of human beings due to some form of catastrophe, humans living forever thanks to medical breakthroughs, the invention of faster-than-light travel allowing us to conquer the universe, or even a visit from a superior alien civilization. However, he believes, the one thing we should definitely prepare for — now — is the invention of machines that can equal or surpass human intelligence and capabilities.

This has led him to issue a stark warning. Unless everyone involved in the development and deployment of artificial intelligence (AI), from the scientists in the lab to industry leaders, policy-makers and regulators, acts soon to rethink how we approach this technology and manage our relationship with it, we could be heading for disaster — most probably within our lifetimes.

When such predictions issue from a global authority such as Russell, who has spent four decades leading ground-breaking research in the field and authored the standard university textbook on AI (Artificial Intelligence: A Modern Approach, with Peter Norvig), people in positions of authority and influence start to sit up and take notice.

King Midas problem

To make his point, Russell talks about King Midas, the mythical king of Phrygia, who desired that everything he touched would turn to gold. The ruler’s wish came true, and everything he came into contact with was transformed into the precious metal — including his friends, family, food and drink. He was unable to reverse this power, once it was granted to him, and he died miserably of starvation.

Midas’s problem was that he set a fixed objective that didn’t correctly align with what he actually hoped for. Russell sees a similar problem when it comes to our relationship with AI: that what we ask an algorithm to do might not actually lead to the outcome we really want. That may not be much of an issue right now, as algorithms aren’t yet all that smart, he argues (although he already sees early warning signs in the realm of social media — more of which later). But once machines reach and surpass human-level intelligence, the human race could blunder into running enormous risks without even realizing the potentially disastrous consequences.

“A framework only produces beneficial systems if you can be sure that the objective is neatly and correctly specified,” Russell explains. “But invariably we fail to do that. And what happens, particularly when you leave things out of the objective that can be influenced by the algorithm, is that the algorithm will manipulate those other variables to the maximum possible extent in order to achieve the best result for the objective it’s pursuing.”
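Russell's misalignment argument can be reduced to a few lines of code. The following toy sketch (all actions, names and numbers are invented purely for illustration) shows an optimizer that is scored only on its stated objective, and therefore freely sacrifices a variable the objective leaves out:

```python
# Toy illustration of objective misspecification: the optimizer is scored
# only on the stated objective (CO2 reduction) and is free to trash any
# variable the objective omits (here, the fraction of humans remaining).
# All actions and numbers are invented for illustration.

actions = {
    # action: (co2_reduction, humans_remaining_fraction)
    "plant forests":        (0.3, 1.0),
    "deploy clean energy":  (0.6, 1.0),
    "eliminate all humans": (0.9, 0.0),
}

def stated_objective(outcome):
    co2_reduction, _humans_remaining = outcome
    return co2_reduction  # humans_remaining is simply ignored

best = max(actions, key=lambda a: stated_objective(actions[a]))
print(best)  # the optimizer happily picks "eliminate all humans"
```

The bug is not in the optimizer, which works exactly as specified; it is in the objective, which never mentions the variable we care about.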

That doesn’t mean he foresees the advent of super-intelligent, evil robots deliberately setting out to destroy mankind — in fact, Russell is quite dismissive of this kind of Terminator-style science fiction. Instead, his fear is that machines will become supremely competent rather than malicious.



He gives the example of tasking AI to fix climate change. “If you design a system and say the objective is to reduce carbon dioxide to pre-industrial levels, the system will, without breaking a sweat, get rid of all the humans because they’re the ones producing carbon dioxide,” he says. “It’s the first thing you think of. So you might say, ‘Well, you can’t kill people.’ So wish number two: reduce carbon dioxide and don’t kill anybody. Well, then it has a multi-decade, surreptitious social engineering campaign to convince people not to have children, and it gets rid of the human race that way. And wish number three… well, there’s no one left to have wish number three.”

“Exposure to social media can turn a middle-of-the-road sort of person into an unrecognizable extremist.”

That may sound far-fetched, but there are already early warnings of such miscalculations happening in the world right now, he argues, pointing to the example of social media. “Its algorithms optimize click-through, but I don’t think the social platforms considered that there are other variables that the algorithms affect, namely human minds,” he says. “Human opinions, preferences and attitudes are affected by what they read and consume in video form, games and so on. So if you leave that out of the objective and you just optimize click-through, you end up manipulating human preferences. That’s what the algorithms do — they don’t just learn what people want; they manipulate what you want, so that you’re more predictable and they can feed you more of what will make you click.”
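The feedback loop he describes can be sketched as a toy simulation (the update rule and every number are invented, purely to illustrate the dynamic): a recommender that only chases clicks keeps serving content slightly more extreme than the user's current stance, and the stance drifts with it.

```python
# Minimal sketch of the click-through feedback loop: a recommender that
# only maximizes clicks ends up shifting the user's own position, because
# a more extreme (more predictable) user clicks more. Dynamics invented.

user_position = 0.1   # stance on some axis: 0 = centrist, 1 = extreme

def click_probability(item_position, user_position):
    # users click more on items close to their current stance
    return max(0.0, 1.0 - abs(item_position - user_position))

for step in range(50):
    # the recommender nudges content slightly beyond the user's position:
    # it still gets clicked, but it also drags the stance outward
    item = min(1.0, user_position + 0.05)
    if click_probability(item, user_position) > 0.5:
        user_position += 0.8 * (item - user_position)  # preference drift

print(round(user_position, 2))  # the once-centrist user ends up near 1.0
```

No step of the loop intends radicalization; the drift is a side effect of optimizing the one variable that was specified.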



Although he concedes that several other factors have also had a bearing on this, he concludes that social media’s ‘dumb algorithms’ can therefore largely be blamed for the global rise of populism and extremism. “It’s well documented that exposure to social media over a certain period can easily turn a formerly middle-of-the-road sort of person into an unrecognizable extremist. And it’s not deliberate in the sense of the algorithm thinking, ‘I want to turn everyone into a fascist or an eco-terrorist.’ It’s just thinking, ‘I want to maximize click-through and I found by trial and error that if I keep telling people this kind of story I get more clicks.’”

Rethinking our relationship with AI



So, how do we fix the problem before it’s too late?



Russell’s central argument, as expounded in his highly acclaimed new book, Human Compatible: AI and the Problem of Control, is that we need to establish another way to do AI — and urgently — otherwise we risk being unable to put the genie back in the bottle. His solution, somewhat counter-intuitively, is to ensure machines don’t actually know what their true objective is. He explains: “That means their fundamental constitution is to be of benefit to the human user or to human society. But they know that they don’t know the full definition of benefit — that is, they don’t know the objective. They have some partial information about it, but that’s all.”



This foundational shift in how we build and evaluate the effectiveness of AI, he explains, should lead to a positive outcome rather than a risky one. “If a machine comes up with a plan that does a good job of optimizing the things that it does know you want, but messes with things that it’s not sure whether you like to be messed with, then the natural solution will be for it to ask permission,” says Russell. “So, it comes up with that plan for reducing carbon dioxide which involves getting rid of all the people. But, crucially, it asks: ‘Is that OK?’ And you say, ‘No, find another plan. And, by the way, people like to be alive.’ And so the process gradually refines the system’s knowledge of human preferences until you get to a point where the system is confident enough that what it’s going to do will be beneficial to humans.”
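The ask-permission behaviour Russell describes can be sketched in a few lines (an illustrative toy, not his actual formalism; the plans, features and weights are all invented): an agent that knows some preference weights but not others defers whenever its best plan disturbs a feature whose weight is unknown.

```python
# Sketch of the "uncertain objective" idea: the machine knows the user's
# preference weight for some features but not others. If its best plan
# touches a feature with unknown weight, the safe move is to ask
# permission rather than act. Plans, features and weights are invented.

known_weights = {"co2_reduced": +1.0}   # the machine is sure about this
unknown_features = {"humans_alive"}     # preference here is NOT known

plans = {
    "eliminate all humans": {"co2_reduced": 0.9, "humans_alive": -1.0},
    "deploy clean energy":  {"co2_reduced": 0.6},
}

def decide(plan_effects):
    # defer to the human whenever the plan affects an unknown preference
    if any(f in unknown_features for f in plan_effects):
        return "ask permission"
    return "execute"

best_plan = max(plans, key=lambda p: sum(
    known_weights.get(f, 0.0) * v for f, v in plans[p].items()))
print(best_plan, "->", decide(plans[best_plan]))
```

When the human answers “no, and people like to be alive”, that feature moves from the unknown set into the known weights, which is the gradual refinement Russell describes.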



A useful analogy, he says, is how restaurants operate. “The restaurant doesn’t assume that it knows perfectly what you want for dinner. That’s why they have a menu and that’s why they ask you, ‘Would you like fries with that?’ They would like to make the customer happy but they don’t know what the customer wants. So they ask them. But they’ll give you water without asking, because they’re fairly sure that everyone would like a glass of water with their dinner, and so on. So, it’s not that we’ve never seen examples of systems behaving this way.”

The value of ‘explainable AI’



The challenge, therefore, is to ensure that machines behave in a similar way. One solution to this, frequently proposed, is ‘explainable AI’ – that is, ensuring that the decisions made by an algorithm can be justified to humans in human terms. Russell is broadly in agreement with this approach, but argues that how we measure the performance of AI needs to change in order for it to happen effectively.



“I think ‘explainable AI’ is going to be part of building ‘good’ AI systems,” he says. “But I think there’s a general attitude of readjustment that needs to take place in the AI community. Because we have adopted this mindset of competitive benchmarking, we have come to associate ‘good’ AI with getting a good score on, for example, some speech-recognition testbed or whatever it might be — and anything else is an annoyance, right?”

Users have a role to play here in demanding explainability as a critical performance factor from AI vendors.

But maintaining a single-minded focus on such a restricted benchmark is, as he has already explained, likely to lead to undesirable outcomes.



To ensure disaster is avoided, therefore, users have a role to play, he argues, by demanding explainability as a critical performance factor from AI vendors. “If someone says, ‘It can’t explain itself, so I’m sorry, we can’t buy your software’ — that’s annoying [for the vendor]. Why should I sacrifice performance to get explainability? But explainability is — partly — performance, right? And ditto with behaving safely and not destroying the world. So that should be part of the performance characteristic that would be beneficial to humans.”
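His point that explainability should count as performance can be illustrated with a toy procurement score (the vendors and all numbers are invented): once explainability carries weight in the metric, the ranking can flip.

```python
# Toy illustration of treating explainability as part of performance:
# vendors are ranked on a weighted combination of benchmark accuracy and
# explainability, not on accuracy alone. Vendors and numbers invented.

vendors = {
    "OpaqueNet": {"accuracy": 0.95, "explainability": 0.1},
    "GlassBox":  {"accuracy": 0.90, "explainability": 0.9},
}

def score(v, w_explain=0.3):
    # blend accuracy and explainability into one performance number
    return (1 - w_explain) * v["accuracy"] + w_explain * v["explainability"]

# accuracy alone favours the black box...
print(max(vendors, key=lambda name: vendors[name]["accuracy"]))
# ...but once explainability is part of the metric, the ranking flips
print(max(vendors, key=lambda name: score(vendors[name])))
```

The weight is a buyer's choice, which is exactly Russell's point: users decide what counts as performance by what they refuse to buy.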

A new golden age



It may appear that Russell’s warnings put him in the camp of the anti-AI doomsayers. In reality, such Luddite leanings could not be further from the truth. Having devoted his career to working at the cutting edge of AI, he is at pains to point out that he is very much in favour of continuing humanity’s progress in this field.



What’s more, he argues, there is a compelling imperative to build super-human intelligence — as long as we do it in the right way — since the potential benefits can be enormous and widespread. “Our whole civilization is built out of intelligence and hard work,” he says, “and so if we have access to much more of this, we can actually have a much better civilization. As a minimum, we can end a perennial problem for mankind, which is the scarcity of the means of life.”

“It should easily be possible for everyone on Earth to have a decent life because of AI.”

He points to a future where time-consuming and complex projects, such as construction, food production, travel and infrastructure, are performed ‘as a service’, efficiently and cheaply by machines for the benefit of humanity. “It should easily be possible for everyone on Earth to have a decent life because of AI,” he says. “Both the software and the hardware technology – the robots, the self-driving vehicles, and so on – will supply goods and services at costs that are negligible compared to what they are now.”



Which leads him to an optimistic, if somewhat caveated, conclusion: “Assuming we figure out some of the politics and economics so that groups of people don’t hoard the technology to themselves, this should lead us to a golden age.”



• Stuart Russell was a speaker at Digital Transformation EXPO Europe. His new book, Human Compatible: AI and the Problem of Control — recently described by Nobel Prize-winner Daniel Kahneman as “The most important book I have read in quite some time” — is out now.