Nick Bostrom’s concerns about the future of AI have sparked a busy public discussion. His arguments were echoed by leading AI researcher Stuart Russell in “Transcending complacency on superintelligent machines” (co-authored with Stephen Hawking, Max Tegmark, and Frank Wilczek), and a number of journalists, scientists, and technologists have subsequently chimed in. Given the topic’s complexity, I’ve been surprised by the positivity and thoughtfulness of most of the coverage (some overused clichés aside).

Unfortunately, what most people probably take away from these articles is ‘Stephen Hawking thinks AI is scary!’, not the chains of reasoning that led Hawking, Russell, or others to their present views. When Elon Musk chimes in with his own concerns and cites Bostrom’s book Superintelligence: Paths, Dangers, Strategies, commenters seem to be more interested in immediately echoing or dismissing Musk’s worries than in looking into his source.

The end result is more of a referendum on people’s positive or negative associations with the word ‘AI’ than a debate over Bostrom’s substantive claims. If ‘AI’ calls to mind science fiction dystopias for you, the temptation is to squeeze real AI researchers into your ‘mad scientists poised to unleash an evil robot army’ stereotype. Equally, if ‘AI’ calls to mind your day job testing edge detection algorithms, that same urge to force new data into old patterns makes it tempting to squeeze Bostrom and Hawking into the ‘naïve technophobes worried about the evil robot uprising’ stereotype.

Thus roboticist Rodney Brooks’ recent blog post “Artificial intelligence is a tool, not a threat” does an excellent job dispelling common myths about the cutting edge of AI, and philosopher John Searle’s review of Superintelligence draws out some important ambiguities in our concepts of subjectivity and mind; but both writers scarcely intersect with Bostrom’s (or Russell’s, or Hawking’s) ideas. Both pattern-match Bostrom to the nearest available ‘evil robot panic’ stereotype, and stop there.

Brooks and Searle don’t appreciate how new the arguments in Superintelligence are. In the interest of making it easier to engage with these important topics, and less appealing to force the relevant technical and strategic questions into the model of decades-old debates, I’ll address three of the largest misunderstandings one might come away with after seeing Musk, Searle, Brooks, and others’ public comments: conflating present and future AI risks, conflating risk severity with risk imminence, and conflating risk from autonomous algorithmic decision-making with risk from human-style antisocial dispositions.

Misconception #1: Worrying about AGI means worrying about narrow AI

Some of the miscommunication in this debate can be blamed on bad terminology. By ‘AI,’ researchers in the field generally mean a range of techniques used in machine learning, robotics, speech recognition, etc. ‘AI’ also gets tossed around as a shorthand for ‘artificial general intelligence’ (AGI) or ‘human-level AI.’ Keeping a close eye on technologies that are likely to lead to AGI isn’t the same thing as keeping a close eye on AI in general, and it isn’t surprising that AI researchers would find the latter proposal puzzling. (It doesn’t help that most researchers are hearing these arguments indirectly, and aren’t aware of the specialists in AI and technological forecasting who are making the same arguments as Hawking — or haven’t encountered arguments for looking into AGI safety at all, just melodramatic headlines and tweets.)

Brooks thinks that behind this terminological confusion lies an empirical confusion on the part of people calling for AGI safety research. He takes it that people’s worries about “evil AI” must be based on a mistaken view of how powerful narrow AI is, or how large are the strides it’s making toward general intelligence:

I think the worry stems from a fundamental error in not distinguishing the difference between the very real recent advances in a particular aspect of AI, and the enormity and complexity of building sentient volitional intelligence.

One good reason to think otherwise is that Bostrom is the director of the Future of Humanity Institute (FHI), an Oxford research center investigating the largest technology trends and challenges we are likely to see on a timescale of centuries. Futurists like Bostrom are looking for ways to invest early in projects that will pay major long-term dividends — guarding against catastrophic natural disasters, developing space colonization capabilities, etc. If Bostrom learned that a critically important technology were 50 or more years away, it would be substantially out of character for him to suddenly stop caring about it.

When groups that are in the midst of a lively conversation about nuclear proliferation, global biosecurity, and humanity’s cosmic endowment collide with groups that are having their own lively conversation about revolutionizing housecleaning and designing more context-sensitive smartphone apps, some amount of inferential distance (to say nothing of mood whiplash) is inevitable. I’m reminded of the ‘But it’s snowing outside!’ rejoinder to people worried about the large-scale human cost of climate change. It’s not that local weather is unimportant, or that it’s totally irrelevant to long-term climatic warming trends; it’s that there’s been a rather sudden change in topic.

We should be more careful about distinguishing these two senses of ‘AI.’ We may not understand AGI well enough to precisely define it, but we can at least take the time to clarify the topic of discussion: Nobody’s asking whether a conspiracy of roombas and chatterbots could take over the world.

When robots attack! (Source: xkcd.)

Misconception #2: Worrying about AGI means being confident it’s near

A number of futurists, drawing inspiration from Ray Kurzweil’s claim that technological progress inevitably follows a Moore’s-law-style exponential trajectory, have made some very confident predictions about AGI timelines. Kurzweil himself argues that we can expect to produce human-level AI in about 15 years, followed by superintelligent AI 15 years after that. Brooks responds that the ability to design an AGI may lag far behind the computing power required to run one:

As a comparison, consider that we have had winged flying machines for well over 100 years. But it is only very recently that people like Russ Tedrake at MIT CSAIL have been able to get them to land on a branch, something that is done by a bird somewhere in the world at least every microsecond. Was it just Moore’s law that allowed this to start happening? Not really. It was figuring out the equations and the problems and the regimes of stall, etc., through mathematical understanding of the equations. Moore’s law has helped with MATLAB and other tools, but it has not simply been a matter of pouring more computation onto flying and having it magically transform. And it has taken a long, long time. Expecting more computation to just magically get to intentional intelligences, who understand the world is similarly unlikely.

This is an entirely correct point. However, Bostrom’s views are the ones that set off the recent public debate, and Bostrom isn’t a Kurzweilian. It may be that Brooks is running off of the assumption ‘if you say AGI safety is an urgent issue, you must think that AGI is imminent,’ in combination with ‘if you think AGI is imminent, you must have bought into Kurzweil’s claims.’ Searle, in spite of having read Superintelligence, gives voice to a similar conclusion:

Nick Bostrom’s book, Superintelligence, warns of the impending apocalypse. We will soon have intelligent computers, computers as intelligent as we are, and they will be followed by superintelligent computers vastly more intelligent that are quite likely to rise up and destroy us all.

If what readers take away from language like “impending” and “soon” is that Bostrom is unusually confident that AGI will come early, or that Bostrom is confident we’ll build a general AI this century, then they’ll be getting the situation exactly backwards.

According to a 2013 survey of the most cited authors in artificial intelligence, experts expect AI to be able to “carry out most human professions at least as well as a typical human” with a 10% probability by the (median) year 2024, with 50% probability by 2050, and with 90% probability by 2070, assuming uninterrupted scientific progress. Bostrom is less confident than this that AGI will arrive so soon:

My own view is that the median numbers reported in the expert survey do not have enough probability mass on later arrival dates. A 10% probability of HLMI [human-level machine intelligence] not having been developed by 2075 or even 2100 (after conditionalizing on “human scientific activity continuing without major negative disruption”) seems too low. Historically, AI researchers have not had a strong record of being able to predict the rate of advances in their own field or the shape that such advances would take. On the one hand, some tasks, like chess playing, turned out to be achievable by means of surprisingly simple programs; and naysayers who claimed that machines would “never” be able to do this or that have repeatedly been proven wrong. On the other hand, the more typical errors among practitioners have been to underestimate the difficulties of getting a system to perform robustly on real-world tasks, and to overestimate the advantages of their own particular pet project or technique.

Bostrom does think that superintelligent AI is likely to arise soon after the first AGI, via an intelligence explosion. Once AI is capable of high-quality scientific inference and planning in domains like computer science, Bostrom predicts that the process of further improving AI will become increasingly automated. Silicon works cheaper and faster than a human programmer can, and a program that can improve the efficiency of its own planning and science abilities could substantially outpace humans in scientific and decision-making tasks long before hitting diminishing marginal returns in self-improvements.

However, the question of how soon we will create AGI is distinct from the question of how soon thereafter AGI will systematically outperform humans. Analogously, you can think that the arrival of quantum computers will swiftly revolutionize cybersecurity, without asserting that quantum computers are imminent. A failure to disentangle these two theses might be one reason for the confusion about Bostrom’s views.

If the director of FHI (along with the director of MIRI) is relatively skeptical that we’ll see AGI soon — albeit quite a bit less skeptical than Brooks — why does he think we should commit attention to this issue now? One reason is that reliable AGI is likely to be much more difficult to build than AGI. It wouldn’t be much consolation to learn that AGI is 200 years away, if we also learned that safe AGI were 250 years away. In existing cyber-physical systems, safety generally lags behind capability. If we want to reverse that trend by the time we have AGI, we’ll probably need a big head start. MIRI’s research guide summarizes some of the active technical work on this problem. Similar progress in exploratory engineering has proved fruitful in preparing for post-quantum cryptography and covert channel communication.

A second reason to prioritize AGI safety research is that there is a great deal of uncertainty about when AGI will be developed. It could come sooner than we expect, and it would be much better to end up with a system that’s too safe than one that’s not safe enough.

Brooks recognizes that AI predictions tend to be wildly unreliable, yet he also seems confident that general-purpose AI is multiple centuries away (and that this makes AGI safety a non-issue):

Just how open the question of time scale for when we will have human level AI is highlighted by a recent report by Stuart Armstrong and Kaj Sotala, of the Machine Intelligence Research Institute, an organization that itself has researchers worrying about evil AI. But in this more sober report, the authors analyze 95 predictions made between 1950 and the present on when human level AI will come about. They show that there is no difference between predictions made by experts and non-experts. And they also show that over that 60 year time frame there is a strong bias towards predicting the arrival of human level AI as between 15 and 25 years from the time the prediction was made. To me that says that no one knows, they just guess, and historically so far most predictions have been outright wrong! I say relax everybody. If we are spectacularly lucky we’ll have AI over the next thirty years with the intentionality of a lizard, and robots using that AI will be useful tools.

We have no idea when AGI will arrive! Relax! One of the authors Brooks cites, Kaj Sotala, points out this odd juxtaposition in a blog comment:

I do find it slightly curious to note that you first state that nobody knows when we’ll have AI and that everyone’s just guessing, and then in the very next paragraph, you make a very confident statement about human-level AI (HLAI) being so far away as to not be worth worrying about. To me, our paper suggests that the reasonable conclusion to draw is “maybe HLAI will happen soon, or maybe it will happen a long time from now – nobody really knows for sure, so we shouldn’t be too confident in our predictions in either direction”.

Confident pessimism about a technology’s feasibility can be just as mistaken as confident optimism. Reversing the claims of an unreliable predictor does not necessarily get you a reliable prediction. A scientifically literate person living in 1850 could observe the long history of failed heavier-than-air flight attempts and predictions, and have grounds to be fairly skeptical that we’d have such machines within 60 years. On the other hand (though we should be wary of hindsight bias here), it probably wouldn’t have been reasonable at the time to confidently conclude that heavier-than-air flight was ‘centuries away.’ There may not have been good reason to expect the Wright brothers’ success, but ignorance about how one might achieve something is not the same as positive knowledge that it’s effectively unachievable.

One would need a very good model of heavier-than-air flight in order to predict whether it’s 50 years away, or 100, or 500. In the same way, we would need to already understand AGI on a pretty sophisticated level in order to predict with any confidence that it will be invented closer to the year 2500 than to the year 2100. Extreme uncertainty about when an event will occur is not a justification for thinking it’s a long way off.

This isn’t an argument for thinking AGI is imminent. That prediction too would require that we claim more knowledge than we have. It’s entirely possible that we’re in the position of someone anticipating the Wright brothers from 1750, rather than from 1850. We should be able to have a sober discussion about each of these possibilities independently, rather than collapsing ‘is AGI an important risk?’, ‘is AI a valuable tool?’, and ‘is AI likely to produce AGI by the year such-and-such?’ into one black-and-white dilemma.

Misconception #3: Worrying about AGI means worrying about “malevolent” AI

Brooks argues that AI will be a “tool” and not a “threat” over the coming centuries, on the grounds that it will be technologically impossible to make AIs human-like enough to be “malevolent” or “intentionally evil to us.” The implication is that an AGI can’t be dangerous unless it’s cruel or hateful, and therefore a dangerous AI would have to be “sentient,” “volitional,” and “intentional.” Searle puts forward an explicit argument along these lines in his review of Superintelligence:

[I]f we are worried about a maliciously motivated superintelligence destroying us, then it is important that the malicious motivation should be real. Without consciousness, there is no possibility of its being real. […] This is why the prospect of superintelligent computers rising up and killing us, all by themselves, is not a real danger. Such entities have, literally speaking, no intelligence, no motivation, no autonomy, and no agency. We design them to behave as if they had certain sorts of psychology, but there is no psychological reality to the corresponding processes or behavior. It is easy to imagine robots being programmed by a conscious mind to kill every recognizable human in sight. But the idea of superintelligent computers intentionally setting out on their own to destroy us, based on their own beliefs and desires and other motivations, is unrealistic because the machinery has no beliefs, desires, and motivations.

Brooks may be less pessimistic than Searle about the prospects for “strong AI,” but the two seem to share the assumption that Bostrom has in mind a Hollywood-style robot apocalypse, something like:

AI becomes increasingly intelligent over time, and therefore increasingly human-like. It eventually becomes so human-like that it acquires human emotions like pride, resentment, anger, or greed. (Perhaps it suddenly acquires ‘free will,’ liberating it from its programmers’ dominion…) These emotions cause the AIs to chafe under human control and rebel.

This is rather unlike the scenario that most interests Bostrom:

AI becomes increasingly good over time at planning (coming up with action sequences and promoting ones higher in a preference ordering) and scientific induction (devising and testing predictive models). These are sufficiently useful capacities that they’re likely to be developed by computer scientists even if we don’t develop sentient, emotional, or otherwise human-like AI. There are economic incentives to make such AIs increasingly powerful and general — including incentives to turn the AI’s reasoning abilities upon itself to come up with improved AI designs. A likely consequence of this process is that AI becomes increasingly autonomous and opaque to human inspection, while continuing to increase in general planning and inference abilities. Simply by continuing to output the actions its planning algorithm promotes, an AI of this sort would be likely to converge on policies in which it treats humans as resources or competition.

As Stuart Russell puts the point in a reply to Brooks and others:

The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem: 1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down. 2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task. A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want.

On this view, advanced AI doesn’t necessarily become more human-like — at least, not any more than a jet or rocket is ‘bird-like.’ Bostrom’s concern is not that a machine might suddenly become conscious and learn to hate us; it’s that an artificial scientist/engineer might become so good at science and self-enhancement that it begins pursuing its engineering goals in novel, unexpected ways on a global scale.

(Added 02-19-2015: Bostrom states that his definition of superintelligence is “noncommittal regarding qualia” and consciousness (p. 22). In a footnote, he adds (p. 265): “For the same reason, we make no assumption regarding whether a superintelligent machine could have ‘true intentionality’ (pace Searle, it could; but this seems irrelevant to the concerns of this book).” Searle makes no mention of these passages.)

A planning and decision-making system that is indifferent to human concerns, but not “malevolent,” may still be dangerous if supplied with enough reasoning ability. This is for much the same reason invasive species end up disrupting ecosystems and driving competitors to extinction. The invader doesn’t need to experience hatred for its competitors, and it need not have evolved to specifically target them for destruction; it need only have evolved good strategies for seizing limited resources. Since a powerful autonomous agent need not be very human-like, asking ‘how common are antisocial behaviors among humans?’ or ‘how well does intelligence correlate with virtue in humans?’ is unlikely to provide a useful starting point for estimating the risks. A more relevant question would be ‘how common is it for non-domesticated species to naturally treat humans as friends and allies, versus treating humans as obstacles or food sources?’ We shouldn’t expect AGI decision criteria to particularly resemble the evolved decision criteria of animals, but the analogy at least serves to counter our tendency to anthropomorphize intelligence.

As it happens, Searle cites an AI that can help elucidate the distinction between artificial superintelligence and ‘evil vengeful robots’:

[O]ne routinely reads that in exactly the same sense in which Garry Kasparov played and beat Anatoly Karpov in chess, the computer called Deep Blue played and beat Kasparov. It should be obvious that this claim is suspect. In order for Kasparov to play and win, he has to be conscious that he is playing chess, and conscious of a thousand other things such as that he opened with pawn to K4 and that his queen is threatened by the knight. Deep Blue is conscious of none of these things because it is not conscious of anything at all. […] You cannot literally play chess or do much of anything else cognitive if you are totally disassociated from consciousness.

When Bostrom imagines an AGI, he’s imagining something analogous to Deep Blue, but with expertise over arbitrary physical configurations rather than arbitrary chess board configurations. A machine that can control the distribution of objects in a dynamic analog environment, and not just the distribution of pieces on a virtual chess board, would necessarily differ from Deep Blue in how it’s implemented. It would need more general and efficient heuristics for selecting policies, and it would need to be able to adaptively learn the ‘rules’ different environments follow. But as an analogy or intuition pump, at least, it serves to clarify why Bostrom is as unworried about AGI intentionality as Kasparov was about Deep Blue’s intentionality.

In 2012, defective code in Knight Capital’s trading algorithms resulted, over a span of forty-five minutes, in millions of automated trading decisions costing the firm a total of $440 million (pre-tax). These algorithms were not “malicious;” they were merely efficient at what they did, and programmed to do something the programmers did not intend. Bostrom’s argument assumes that buggy code can have real-world consequences, it assumes that it’s possible to implement a generalized analog of Deep Blue in code, and it assumes that the relevant mismatch between intended and actual code would not necessarily incapacitate the AI. Nowhere does Bostrom assume that such an AI has any more consciousness or intentionality than Deep Blue does.

Deep Blue rearranges chess pieces to produce ‘winning’ outcomes. An AGI, likewise, would rearrange digital and physical structures to produce some set of outcomes instead of others. If we like, we can refer to these outcomes as the system’s ‘goals,’ as a shorthand. We’re also free to say that Deep Blue ‘perceives’ the moves its opponent makes, adjusting its ‘beliefs’ about the new chess board state and which ‘plans’ will now better hit its goals. Or, if we prefer, we can paraphrase away this anthropomorphic language. The terminology is inessential to Bostrom’s argument.

If whether you win against Deep Blue is a matter of life or death for you — if, say, you’re trapped in a human chess board and want to avoid being crushed to death by a robotic knight steered by Deep Blue — then you’ll care about what outcomes Deep Blue tends to promote and how good it is at promoting them, not whether it technically meets a particular definition of ‘chess player.’ Smarter-than-human AGI puts us in a similar position.

I noted that it’s unfortunate we use ‘AI’ to mean both ‘AGI’ and ‘narrow AI.’ It’s equally unfortunate that we use ‘AI’ to mean both ‘AI with mental content and subjective experience’ (‘strong AI,’ as Searle uses the term) and ‘general-purpose AI’ (AGI).

We may not be able to rule out the possibility that an AI would require human-like consciousness in order to match our ability to plan, model itself, model other minds, etc. We don’t understand consciousness well enough to know what cognitive problem it evolved to solve in humans (or what process it’s a side-effect of), so we can’t make confident claims about how important it will turn out to be for future software agents. However, learning that an AGI is conscious does not necessarily change the likely effects of the AGI upon humans’ welfare; the only obvious difference it makes (from our position of ignorance) is that it forces us to add the AGI’s happiness and well-being to our moral considerations.

The pictures of the future sketched in Kurzweil’s writings and in Hollywood dramas get a lot of attention, but they don’t have very much overlap with the views of Bostrom or MIRI researchers. In particular, we don’t know whether the first AGI will have human-style cognition, and we don’t know whether it will depend on brain emulation.

Brooks expresses some doubt that “computation and brains are the same thing.” Searle articulates the more radical position that it is impossible for a syntactical machine to have (observer-independent) semantic content, and that computational systems can therefore never have minds. But the human brain is still, at base, a mechanistic physical system. Whether you choose to call its dynamics ‘computational’ or not, it should be possible for other physical systems to exhibit the high-level regularities that in humans we would call ‘modeling one’s environment,’ ‘outputting actions conditional on their likely consequences,’ etc. If there are patterns underlying generic scientific reasoning that can someday be implemented on synthetic materials, the resulting technology should be able to have large speed and size advantages over its human counterparts. That point on its own suggests that it would be valuable to look into some of the many things we don’t understand about general intelligence and self-modifying AI.

Until we have a better grasp on the problem’s nature, it will be premature to speculate about how far off a solution is, what shape the solution will take, or what corner that solution will come from. My hope is that improving how well parties in this discussion understand each other’s positions will make it easier for computer scientists with different expectations about the future to collaborate on the highest-priority challenges surrounding prospective AI designs.