Reducing Long-Term Catastrophic Risks from Artificial Intelligence

In 1965, the eminent statistician I. J. Good proposed that artificial intelligence beyond some threshold level would snowball, creating a cascade of self-improvements: AIs would be smart enough to make themselves smarter, and, having made themselves smarter, would spot still further opportunities for improvement, leaving human abilities far behind.[3] Good called this process an “intelligence explosion,” while later authors have used the terms “technological singularity” or simply “the Singularity”.[10][21]

The Machine Intelligence Research Institute aims to reduce the risk of a catastrophe, should such an event eventually occur. Our activities include research, education, and conferences. In this document, we provide a whirlwind introduction to the case for taking AI risks seriously, and suggest some strategies to reduce those risks.

What We’re (Not) About

The Machine Intelligence Research Institute is interested in the advent of smart, cross-domain, human-plus-equivalent, self-improving Artificial Intelligence. We do not forecast any particular time when such AI will be developed. We are interested in analyzing points of leverage for increasing the probability that the advent of AI turns out positive. We do not see ourselves as having the job of foretelling that it will go well or poorly – if the outcome were predetermined there would be no point in trying to intervene. We suspect that AI is primarily a software problem which will require new insight, not a hardware problem which will fall to Moore’s Law. We are interested in rational analyses which try to support each point of claimed detail, as opposed to storytelling in which many interesting details are invented but not separately supported.

Indifference, Not Malice

Anthropomorphic ideas of a “robot rebellion,” in which AIs spontaneously develop primate-like resentments of low tribal status, are the stuff of science fiction. The more plausible danger stems not from malice, but from the fact that human survival requires scarce resources: resources for which AIs may have other uses.[13][14] Superintelligent AIs with real-world traction, such as access to pervasive data networks and autonomous robotics, could radically alter their environment, e.g., by harnessing all available solar, chemical, and nuclear energy. If such AIs found uses for free energy that better furthered their goals than supporting human life, human survival would become unlikely.

Many AIs will converge toward being optimizing systems, in the sense that, after self-modification, they will act to maximize some goal.[1][13] For instance, AIs developed under evolutionary pressures would be selected for values that maximized reproductive fitness, and would prefer to allocate resources to reproduction rather than supporting humans.[1] Such unsafe AIs might actively mimic safe benevolence until they became powerful, since being destroyed would prevent them from working toward their goals. Thus, a broad range of AI designs may initially appear safe, but if developed to the point of a Singularity could cause human extinction in the course of optimizing the Earth for their goals.

An Intelligence Explosion May Be Sudden

The pace of an intelligence explosion depends on two conflicting pressures: each improvement in AI technology increases the ability of AIs to research more improvements, while the depletion of low-hanging fruit makes subsequent improvements more difficult. The rate of improvement is hard to estimate, but several factors suggest it would be high. The predominant view in the AI field is that the bottleneck for powerful AI is software, rather than hardware, and continued rapid hardware progress is expected in coming decades.[4] If and when the software is developed, there may thus be a glut of hardware to run many copies of AIs, and to run them at high speeds, amplifying the effects of AI improvements.[8] As we have little reason to expect that human minds are ideally optimized for intelligence, as opposed to being the first intelligences sophisticated enough to produce technological civilization, there is likely to be further low-hanging fruit to pluck (after all, the AI would have been successfully created by a slower and smaller human research community). Given strong enough feedback, or sufficiently abundant hardware, the first AI with humanlike AI research abilities might be able to reach superintelligence rapidly; in particular, more rapidly than researchers and policy-makers can develop adequate safety measures.
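The two conflicting pressures described above can be illustrated with a toy model. This is only a sketch under assumptions that are not in the original text: capability is a single number, each research step adds an improvement proportional to current capability, and the difficulty of the next improvement grows by a fixed factor as low-hanging fruit is depleted. The function name and both parameters (`feedback`, `difficulty_growth`) are hypothetical, and the numbers carry no predictive weight.

```python
def capability_trajectory(feedback, difficulty_growth, steps=50):
    """Toy model of recursive improvement (illustrative assumptions only).

    feedback          -- how strongly current capability speeds up research
    difficulty_growth -- factor by which each new improvement gets harder
                         (1.0 means low-hanging fruit never runs out)
    """
    capability = 1.0   # arbitrary starting level
    difficulty = 1.0   # relative cost of the next improvement
    history = [capability]
    for _ in range(steps):
        # Pressure 1: a more capable system finds improvements faster.
        capability += feedback * capability / difficulty
        # Pressure 2: the remaining improvements are harder to find.
        difficulty *= difficulty_growth
        history.append(capability)
    return history


# Same feedback strength, different depletion rates.
no_depletion = capability_trajectory(feedback=0.1, difficulty_growth=1.0)
fast_depletion = capability_trajectory(feedback=0.1, difficulty_growth=1.5)

print(f"no depletion, final capability:   {no_depletion[-1]:.1f}")
print(f"fast depletion, final capability: {fast_depletion[-1]:.2f}")
```

In this sketch the outcome is governed by the balance of the two parameters: when difficulty does not grow, capability compounds exponentially, and when difficulty grows quickly, capability levels off near its starting value. Which regime real AI development would resemble is exactly the open question the paragraph above discusses.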

Is Concern Premature?

The absence of a clear picture of how to build AI means that we cannot assign high confidence to the development of AI in the next several decades. It also makes it difficult to rule out unforeseen advances. Past underestimates of the AI challenge (perhaps most infamously, those made for the 1956 Dartmouth Conference)[12] do not guarantee that AI will never succeed, and we need to take into account both repeated discoveries that the problem is more difficult than expected, and incremental progress in the field. Advances in AI and machine learning algorithms,[17] increasing R&D expenditures by the technology industry, hardware advances that make computation-hungry algorithms feasible,[4] enormous datasets,[5] and insights from neuroscience give advantages that past researchers lacked. Given the size of the stakes and the uncertainty about AI timelines, it seems best to allow for the possibility of medium-term AI development in our safety strategies.

Friendly AI

Concern about the risks of future AI technology has led some commentators, such as Sun co-founder Bill Joy, to suggest the global regulation and restriction of such technologies.[9] However, appropriately designed AI could offer similarly enormous benefits. More specifically, human ingenuity is currently a bottleneck in making progress on many key challenges affecting our collective welfare: eradicating diseases, averting long-term nuclear risks, and living richer, more meaningful lives. Safe AI could help enormously in meeting each of these challenges. Further, the prospect of those benefits along with the competitive advantages from AI would make a restrictive global treaty very difficult to enforce.

Our primary approach to reducing AI risks has thus been to promote the development of AI with benevolent motivations that are reliably stable under self-improvement, which we call “Friendly AI”.[22]

To very quickly summarize some of the key ideas in Friendly AI:

We can’t make guarantees about the final outcome of an agent’s interaction with the environment, but we may be able to make guarantees about what the agent is trying to do, given its knowledge. We can’t determine that Deep Blue will win against Kasparov just by inspecting Deep Blue, but an inspection might reveal that Deep Blue searches the game tree for winning positions rather than losing ones. Since code executes on the almost perfectly deterministic environment of a computer chip, we may be able to make very strong guarantees about an agent’s motivations (including how that agent rewrites itself), even though we can’t logically prove the outcomes of environmental strategies. This is important, because if the agent fails on an environmental strategy, it can update its model of the world and try again; but during self-modification, the AI may need to implement a million code changes, one after the other, without any of them being catastrophic.

If Gandhi doesn’t want to kill people, and someone offers Gandhi a pill that will alter his brain to make him want to kill people, and Gandhi knows this is what the pill does, then Gandhi will very likely refuse to take the pill. Most utility functions should be trivially stable under reflection, provided that the AI can correctly project the result of its own self-modifications. Thus, the problem of Friendly AI is not creating an extra conscience module that constrains the AI despite its preferences, but reaching into the enormous design space of possible minds and selecting an AI that prefers to be Friendly.

Human terminal values are extremely complicated, although this complexity is not introspectively visible at a glance, for much the same reason that major progress in computer vision was once thought to be a summer’s work. Since we have no introspective access to the details of human values, the solution to this problem probably involves designing an AI to learn human values by looking at humans, asking questions, scanning human brains, etc., rather than an AI preprogrammed with a fixed set of imperatives that sounded like good ideas at the time.

The explicit moral values of human civilization have changed over time, and we regard this change as progress, and extrapolate that progress may continue in the future. An AI programmed with the explicit values of 1800 might now be fighting to reestablish slavery. Static moral values are clearly undesirable, but most random changes to values will be even less desirable: every improvement is a change, but not every change is an improvement. Possible bootstrapping algorithms include “do what we would have told you to do if we knew everything you knew,” “do what we would’ve told you to do if we thought as fast as you did and could consider many more possible lines of moral argument,” and “do what we would tell you to do if we had your ability to reflect on and modify ourselves.” In moral philosophy, this notion of moral progress is known as reflective equilibrium.[15]
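The Gandhi argument above, that an agent judges a proposed self-modification by its current values rather than by the values it would have afterward, can be sketched in a few lines. Everything here is a hypothetical illustration, not a proposed design: the `Agent` class, the `accepts_modification` function, the two-field outcome dictionary, and the utility weights are all assumptions. In particular, `successor.act()` stands in for a perfectly accurate prediction of what the modified agent would do, which is the proviso the argument itself depends on.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Outcome = Dict[str, int]  # hypothetical world summary: {"deaths": ..., "helped": ...}


@dataclass(frozen=True)
class Agent:
    name: str
    utility: Callable[[Outcome], float]  # what the agent values
    act: Callable[[], Outcome]           # assumed-accurate forecast of its behavior


def accepts_modification(current: Agent, successor: Agent) -> bool:
    """Accept only if the successor's predicted behavior scores higher
    under the CURRENT agent's utility function."""
    return current.utility(successor.act()) > current.utility(current.act())


def pacifist_utility(outcome: Outcome) -> float:
    # Deaths are weighted far more heavily than any amount of help.
    return outcome["helped"] - 1000 * outcome["deaths"]


def killer_utility(outcome: Outcome) -> float:
    return outcome["deaths"]


gandhi = Agent("Gandhi", pacifist_utility, lambda: {"deaths": 0, "helped": 1})

# The pill changes the values themselves, and therefore the behavior.
after_pill = Agent("Gandhi after pill", killer_utility,
                   lambda: {"deaths": 10, "helped": 0})

# A capability upgrade keeps the same values but achieves more with them.
sharper_gandhi = Agent("Sharper Gandhi", pacifist_utility,
                       lambda: {"deaths": 0, "helped": 5})

print("takes the pill:     ", accepts_modification(gandhi, after_pill))
print("accepts the upgrade:", accepts_modification(gandhi, sharper_gandhi))
```

The post-pill agent would rate the change highly by its own utility function, but that evaluation never enters the decision; only the current agent's values do. The sketch says nothing about the hard part, which is making the forecast in `act` trustworthy across many successive rewrites.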

Seeding Research Programs

As we get closer to advanced AI, it will be easier to learn how to reduce risks effectively. The interventions to focus on today are those whose benefits will compound over time: lines of research that can guide other choices or that entail much incremental work. Some possibilities include:

Friendly AI: Theoretical computer scientists can investigate AI architectures that self-modify while retaining stable goals. Theoretical toy systems exist now: Gödel machines make provably optimal self-improvements given certain assumptions.[19] Decision theories are being proposed that aim to be stable under self-modification.[2] These models can be extended incrementally into less idealized contexts.

Stable brain emulations: One conjectured route to safe AI starts with human brain emulation. Neuroscientists can investigate the possibility of emulating the brains of individual humans with known motivations, while evolutionary theorists can investigate methods to prevent dangerous evolutionary dynamics and social scientists can investigate social or legal frameworks to channel the impact of emulations in positive directions.[18]

Models of AI risks: Researchers can build models of AI risks and of AI growth trajectories, using tools from game theory, evolutionary analysis, computer security, or economics.[1][6][8][14][22] If such analysis is done rigorously it can help to channel the efforts of scientists, graduate students, and funding agencies to the areas with the greatest potential benefits.

Institutional improvements: Major technological risks are ultimately navigated by society as a whole: success requires that society understand and respond to scientific evidence. Knowledge of the biases that distort human thinking around catastrophic risks,[23] improved methods for probabilistic forecasting[16] and risk analysis,[11] and methods for identifying and aggregating expert opinions[7] can all improve our collective odds. So can methods for international cooperation around AI development, and for avoiding an “AI arms race” that might be won by the competitor most willing to trade off safety measures for speed.[20]

Our Aims

We aim to seed the above research programs. We are too small to carry out all the needed research ourselves, but we can get the ball rolling.

We have groundwork already. We have: (a) seed research about catastrophic AI risks and AI safety technologies; (b) human capital; and (c) programs that engage outside research talent, including our annual Singularity Summits and our Visiting Fellows program.

Going forward, we plan to continue our recent growth by scaling up our Visiting Fellows program, extending the Singularity Summits and similar academic networking, and writing further papers to seed the above research programs, in-house or with the best outside talent we can find. We welcome potential co-authors, Visiting Fellows, and other collaborators, as well as any suggestions or cost-benefit analyses on how to reduce catastrophic AI risk.

The Upside and Downside of Artificial Intelligence

Human intelligence is the most powerful known biological technology, with a discontinuous impact upon the planet compared to past organisms. But our place in history probably rests, not on our being the smartest intelligences that could exist, but rather, on being the first intelligences that did exist. We probably are to intelligence what the first replicator was to biology. The first single-stranded RNA capable of copying itself was nowhere near being an ultra-sophisticated replicator — but it still had an important place in history, due to being first.

The future of intelligence is, hopefully, very much greater than its past. The origin and shape of human intelligence may end up playing a critical role in the origin and shape of future civilizations on a much larger scale than one planet. And the origin and shape of the first self-improving Artificial Intelligences humanity builds may have a similarly strong impact, for similar reasons. It is the values of future intelligence that will shape future civilization. What stands to be won or lost is the values of future intelligences, and thus the value of future civilization.

Recommended Reading

This has been a very quick introduction. For more information, please contact anna@intelligence.org, or see:

References