We have updated our thinking on this subject since this page was published. For our most current content on this topic, see this blog post.

This is a writeup of a shallow investigation, a brief look at an area that we use to decide how to prioritize further research.

In a nutshell

What is the problem? It seems plausible that some time this century, people will develop algorithmic systems capable of efficiently performing many or even all of the cognitive tasks that humans perform. These advances could lead to extreme positive developments, but could also potentially pose risks from intentional misuse or catastrophic accidents. For example, it seems possible that (i) the technology could be weaponized or used as a tool for social control, or (ii) someone might create an extremely powerful artificial intelligence agent with values misaligned with humanity’s interests. It also seems possible that progress along these directions could be surprisingly rapid, leaving society underprepared for the transition.

What are possible interventions? A philanthropist could fund technical research aimed at ensuring the robustness, predictability, and goal-alignment of advanced artificial intelligence systems; research in law, ethics, economics, and policy related to advanced artificial intelligence; and/or education related to such research.

Who else is working on this? Elon Musk has donated $10 million to the Future of Life Institute for regranting to researchers focused on addressing these and other potential future risks from advanced artificial intelligence. In addition, a few relatively small nonprofit/academic institutes work on potential future risks from advanced artificial intelligence. The Machine Intelligence Research Institute and the Future of Humanity Institute each have an annual budget of about $1 million, and a couple of other new organizations work on these issues as well.



Published: August 2015

Background and process

We have been engaging in informal discussions around this topic for several years, and have done a significant amount of reading on it. The debates are in some cases complex; this write-up focuses on reporting the primary factors we’ve considered and the strongest sources we know of that bear on our views.

For readers highly interested in this topic, we would recommend the following as particularly useful for getting up to speed:

Getting a basic sense of what recent progress in AI has looked like, and what it might look like going forward. Unfortunately, we know of no single source for doing this, as our main source has been conversations with AI researchers. However, we have extensive conversation notes forthcoming for one such conversation that can provide some background.

Reading Superintelligence by Nick Bostrom, which gives a detailed discussion of some potential risks, with particular attention to one kind of accident risk. (Those seeking a shorter and more accessible introduction might prefer two highly informal posts by blogger Tim Urban, who largely attempts to summarize Superintelligence as a layperson. We do not necessarily agree with all of the content of his posts, but believe they offer a good introduction to the subject.) We don’t endorse all of the arguments of this book, but it is the most detailed argument for a particular potential risk associated with artificial intelligence, and we believe it would be instructive to review both the book and some of the responses to it (see immediately below).

Reviewing an Edge.org online discussion responding to Superintelligence. We feel that the arguments made in this forum are broadly representative of the arguments we’ve seen against the idea that risks from artificial intelligence are important.

For our part, our understanding of the matter is informed by the following:

When we began our investigation of global catastrophic risks, we believed that this topic was worth looking into due to the high potential stakes and our impression that it was getting little attention from philanthropists. We were already broadly familiar with the arguments that this issue is important, and we initially focused on trying to determine why these arguments hadn’t seemed to get much engagement from mainstream computer scientists. However, we paused our investigations (other than keeping up on major new materials such as Bostrom 2014 and some of the critical response to it) when we learned about an upcoming conference specifically on this topic, which we attended. Since then, we have reviewed further relevant materials such as FLI’s open letter and research priorities document.

What is the problem?

Timeline

According to many machine learning researchers, there has been substantial progress in machine learning in recent years, and the field could potentially have an enormous impact on the world. It appears possible that the coming decades will see substantial progress in artificial intelligence, potentially even to the point where machines come to outperform humans in many or nearly all intellectual domains, though it is difficult or impossible to make confident forecasts in this area. For example, recent surveys of researchers in artificial intelligence found that many researchers assigned a substantial probability to the creation of machine intelligences “that can carry out most human professions at least as well as a typical human” in 10-40 years. Following Muller and Bostrom, who organized the survey, we will refer to such machine intelligences as “high-level machine intelligences” (HLMI).

More information about timelines for the development of advanced AI capabilities is available here.





Loss of control of advanced agents

In addition to significant benefits, creating advanced artificial intelligence could carry significant dangers. One potential danger that has received particular attention—and has been the subject of particularly detailed arguments—is the one discussed by Prof. Nick Bostrom in his 2014 book Superintelligence. Prof. Bostrom has argued that the transition from high-level machine intelligence to AI much more intelligent than humans could potentially happen very quickly, and could result in the creation of an extremely powerful agent whose objectives are misaligned with human interests. This scenario, he argues, could potentially lead to the extinction of humanity.

Prof. Bostrom has offered the two following highly simplified scenarios illustrating potential risks:

Riemann hypothesis catastrophe. An AI, given the final goal of evaluating the Riemann hypothesis, pursues this goal by transforming the Solar System into “computronium” (physical resources arranged in a way that is optimized for computation)— including the atoms in the bodies of whomever once cared about the answer.

Paperclip AI. An AI, designed to manage production in a factory, is given the final goal of maximizing the manufacture of paperclips, and proceeds by converting first the Earth and then increasingly large chunks of the observable universe into paperclips.

Stuart Russell (a Professor of Computer Science at UC Berkeley and co-author of a leading textbook on artificial intelligence) has expressed similar concerns. While it is unlikely that these specific scenarios would occur, they are illustrative of a general potential failure mode: an advanced agent with a seemingly innocuous, limited goal could seek out a vast quantity of physical resources—including resources crucial for humans—in order to fulfill that goal as effectively as possible. To be clear, the risk Bostrom and Russell are describing is not that an extremely intelligent agent would misunderstand what humans would want it to do and then do something else. Instead, the risk is that intensely pursuing the precise (but flawed) goal that the agent is programmed to pursue could pose large risks.
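The failure mode described above can be illustrated with a deliberately tiny toy sketch. It is not drawn from Bostrom or Russell and is not a model of any real system; the function names, the resource pool of 100 units, and the penalty weight are all hypothetical choices made for illustration. It shows only that an optimizer scores allocations by the objective it is given, so anything the objective omits carries no weight.

```python
def best_allocation(total_resources, objective):
    """Exhaustively search every split of a fixed resource pool and return
    the (for_paperclips, left_for_humans) pair that maximizes `objective`."""
    candidates = [(used, total_resources - used)
                  for used in range(total_resources + 1)]
    return max(candidates, key=lambda alloc: objective(*alloc))


def paperclips_only(for_paperclips, left_for_humans):
    # Hypothetical "seemingly innocuous" goal: only paperclip output counts.
    return for_paperclips


def paperclips_with_human_needs(for_paperclips, left_for_humans, needed=60):
    # Hypothetical revised goal: heavily penalize leaving humans with less
    # than `needed` units. The threshold and weight are arbitrary assumptions.
    shortfall = max(0, needed - left_for_humans)
    return for_paperclips - 1000 * shortfall


print(best_allocation(100, paperclips_only))              # (100, 0)
print(best_allocation(100, paperclips_with_human_needs))  # (40, 60)
```

Under the first objective the search faithfully executes the stated goal and leaves nothing for any other use. Writing the second objective is trivial in this toy setting only because the omitted consideration is known in advance and easy to quantify, which is not something the arguments above assume for advanced agents.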

The above argument is difficult to briefly summarize and highly speculative, but we think it highlights plausible scenarios that seem worth considering and preparing for. Some considerations that make this argument seem relatively plausible to us, and/or point to a more general case for seeing AI as a potential source of major global catastrophic risks:

Over a relatively short geological timescale, humans have come to have enormous impacts on the biosphere, often leaving the welfare of other species dependent on the objectives and decisions of humans. It seems plausible that the intellectual advantages humans have over other animals have been crucial in allowing humans to build up the scientific and technological capabilities that have made this possible. If advanced artificial intelligence agents become significantly more powerful than humans, it seems possible that they could become the dominant force in the biosphere, leaving humans’ welfare dependent on their objectives and decisions. As with the interaction between humans and other species in the natural environment, these problems could be the result of competition for resources rather than malice.

In comparison with other evolutionary changes, there was relatively little time between our hominid ancestors and the evolution of humans. There was therefore relatively little time for evolutionary pressure to lead to improvements in human intelligence relative to the intelligence of our hominid ancestors, suggesting that the increases in intelligence may be small on some absolute scale. Yet it seems that these increases in intelligence have meant the difference between mammals with a limited impact on the biosphere and a species that has had massive impact. In turn, this makes it seem plausible that creating intelligent agents that are more intelligent than humans could have dramatic real-world consequences even if the difference in intelligence is small in an absolute sense.

Highly capable AI systems may learn from experience and run at a much faster serial processing speed than humans. This could mean that their capabilities change quickly and make them hard to manage with trial-and-error processes. This might pose novel safety challenges in very open-ended domains. Whereas it is possible to establish the safety of a bridge by relying on well-characterized engineering properties in a limited range of circumstances and tasks, it is unclear how to establish the safety of a highly capable AI agent that would operate in a wide variety of circumstances.

When tasks are delegated to opaque autonomous systems—as they were in the 2010 Flash Crash—there can be unanticipated negative consequences. Jacob Steinhardt, a PhD student in computer science at Stanford University and a scientific advisor to the Open Philanthropy Project, suggested that as such systems become increasingly complex in the long term, “humans may lose the ability to meaningfully understand or intervene in such systems, which could lead to a loss of sovereignty if autonomous systems are employed in executive-level functions (e.g. government, economy).”

It seems plausible that advances in artificial intelligence could eventually enable superhuman capabilities in areas like programming, strategic planning, social influence, cybersecurity, research and development, and other knowledge work. These capabilities could potentially allow an advanced artificial intelligence agent to increase its power, develop new technology, outsmart opposition, exploit existing infrastructure, or exert influence over humans.

Concerns regarding the loss of control of advanced artificial intelligence agents were included among many other issues in a research priorities document linked to in the open letter discussed above, which was signed by highly credentialed machine learning researchers, scientists, and technology entrepreneurs. Prior to the release of this open letter, potential risks from advanced artificial intelligence received limited attention from the mainstream computer science community, apart from some discussions that we found unconvincing. We are uncertain about the extent to which the people who signed this open letter saw themselves as supporting the idea that loss of control of advanced artificial intelligence agents is a problem worth doing research to address. To the extent that they did see themselves as actively supporting more research on this topic, we see that as reason to take the problem more seriously. To the extent that they did not, we feel that signing the letter (without public comments or disclaimers beyond what we’ve seen) indicates a general lack of engagement with this question. We would take that lack of engagement as, in itself, a reason to err on the side of being concerned about and investing in preparation for the risk, as it would imply that some people in a strong position to be carefully examining the issue and communicating their views may be failing to do so.

Our understanding is that it is not clearly possible to create an advanced artificial intelligence agent that avoids all challenges of this sort. In particular, our impression is that existing machine learning frameworks have made much more progress on the task of acquiring knowledge than on the task of acquiring appropriate goals/values.

Peace, security, and privacy

It seems plausible to us that highly advanced artificial intelligence systems could potentially be weaponized or used for social control. For example:

In the shorter term, machine learning could potentially be used by governments to efficiently analyze vast amounts of data collected through surveillance.

Cyberattacks in particular—especially if combined with the trend toward the “Internet of Things”—could potentially pose military/terrorist risks in the future.

The capabilities described above—such as superhuman capabilities in areas like programming, strategic planning, social influence, cybersecurity, research and development, and other knowledge work—could be powerful tools in the hands of governments or other organizations. For example, an advanced AI system might significantly enhance or even automate the management and strategy of a country’s military operations, with strategic implications different from the possibilities associated with autonomous weapons. If one nation anticipates such advances on the part of another, it could potentially destabilize geopolitics, including nuclear deterrence relationships. Our scientific advisor Dario Amodei suggested to us that this may be one of the most understudied and serious risks of advanced AI, though also potentially among the most challenging to address.

Our understanding is that this class of scenarios has not been a major focus for the organizations that have been most active in this space, such as the Machine Intelligence Research Institute (MIRI) and the Future of Humanity Institute (FHI), and there seems to have been less analysis and debate regarding them, but risks of this kind seem potentially as important as the risks related to loss of control.

Other potential concerns

There are a number of other possible concerns related to advanced artificial intelligence that we have not examined closely, including social issues such as technological disemployment and the legal and moral standing of advanced artificial intelligence agents. We may investigate these and other possible issues more deeply in the future.

Uncertainty about these risks

We regard many aspects of these potential risks as highly uncertain. For example:

It seems highly uncertain when high-level machine intelligence might be developed.

Losing control of an advanced agent would seem to require an extremely broad-scope artificial intelligence, considering a wide space of possible actions and reasoning about a wide space of different domains. A “narrower” artificial intelligence might, for example, simply analyze scientific papers and propose further experiments, without having intelligence in other domains such as strategic planning, social influence, cybersecurity, etc. Narrower artificial intelligence might change the world significantly, to the point where the nature of the risks changes dramatically from the current picture, before fully general artificial intelligence is ever developed.

Losing control of an advanced agent would also seem to require that advanced artificial intelligence will function as an agent: identifying actions, using a world model to estimate their likely consequences, using a scoring system (such as a utility function) to score actions as a function of their likely consequences, and selecting high- or highest-scoring actions. While it seems plausible that such agents will eventually be created, it also seems plausible that the creation of such agents could come after other artificial intelligence tools—which do not rely on an agent-based architecture—have been created. Elsewhere, Holden Karnofsky (co-founder of GiveWell) has argued that creating advanced non-agents before agents is plausible and could substantially change the strategic situation for those preparing for risks from advanced artificial intelligence.

It isn’t a given that superior intelligence, coupled with a problematic goal, would lead to domination of the biosphere. It’s possible (though it seems unlikely to us) that there are limited benefits to having substantially more intelligence than humans, and it’s possible that an artificial intelligence would maximize a problematic utility function primarily via degenerate behavior (e.g., hacking itself and manually setting its reward function to the maximum) rather than behaving in a way that could pose a global catastrophic risk.

It seems highly uncertain to us how quickly advanced artificial intelligence will progress from subhuman to superhuman intelligence. For example, it took decades for chess algorithms to progress from being competitive with the top few tens of thousands of players to being better than any human.
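The agent-based architecture mentioned in the list above (identify actions, use a world model to estimate consequences, score those consequences, select the highest-scoring action) can be made concrete with a minimal sketch. Everything here is an illustrative assumption: the type aliases, the toy world model, and the utility values are hypothetical, and real systems need not be structured this way.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical types for illustration only.
Action = str
Outcome = str
WorldModel = Callable[[Action], Dict[Outcome, float]]  # action -> P(outcome)
Utility = Callable[[Outcome], float]                   # outcome -> score


def expected_utility(action: Action, model: WorldModel, utility: Utility) -> float:
    """Score an action by the probability-weighted utility of its predicted outcomes."""
    return sum(p * utility(outcome) for outcome, p in model(action).items())


def choose_action(actions: Iterable[Action], model: WorldModel,
                  utility: Utility) -> Tuple[Action, float]:
    """Return the highest-scoring action together with its score."""
    best = max(actions, key=lambda a: expected_utility(a, model, utility))
    return best, expected_utility(best, model, utility)


def toy_model(action: Action) -> Dict[Outcome, float]:
    # Assumed predictions for two made-up actions.
    predictions = {
        "run_experiment": {"useful_result": 0.6, "no_result": 0.4},
        "do_nothing": {"no_result": 1.0},
    }
    return predictions[action]


def toy_utility(outcome: Outcome) -> float:
    # Assumed scores for the two made-up outcomes.
    return {"useful_result": 10.0, "no_result": 0.0}[outcome]


best_action, best_score = choose_action(
    ["run_experiment", "do_nothing"], toy_model, toy_utility
)
print(best_action)  # run_experiment
```

A tool that only computed `toy_model("run_experiment")` and reported the prediction to a human, without the selection step in `choose_action`, would be closer to the non-agent tools contrasted with agents above.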

At the same time, these risks seem plausible to us, and we believe the extreme uncertainty about the situation—when combined with plausibility and extremely large potential stakes—favors preparing for potential risks.

We have made fairly extensive attempts to look for people making sophisticated arguments that the risks aren’t worth preparing for (which is distinct from saying that they won’t necessarily materialize), including reaching out to senior computer scientists working in AI-relevant fields (not all notes are public, but we provide the ones that are) and attending a conference specifically on the topic. We feel that the Edge.org online discussion responding to Superintelligence is broadly representative of the arguments we’ve seen against the idea that risks from artificial intelligence are important, and we find those arguments largely unconvincing. We invite interested readers to review those arguments in light of the reasoning laid out on this page, and draw their own conclusions about whether the former provide strong counter-considerations to the latter. We agree with Stuart Russell’s assessment that many of these critiques do not engage the most compelling arguments (e.g. by discussing scenarios involving conscious AI systems driven by negative emotions instead of scenarios where an advanced AI system causes harm by faithfully pursuing a badly specified objective). For a more comprehensive discussion of these and other critiques, see a collection of objections and replies created by Luke Muehlhauser, the former Executive Director of MIRI. Luke is a GiveWell research analyst, but he did not produce this collection as part of his work for us. We agree with much of Luke’s analysis, but we have not closely examined it and do not necessarily agree with all of it.

What are possible interventions?

Potential research agendas we are aware of

Many prominent researchers in machine learning and other fields recently signed an open letter recommending “expanded research aimed at ensuring that increasingly capable AI systems are robust and beneficial,” and listing many possible areas of research for this purpose. The Future of Life Institute recently issued a request for proposals on this topic, listing possible research topics including:

Computer Science: Verification: how to prove that a system satisfies certain desired formal properties. (“Did I build the system right?”) Validity: how to ensure that a system that meets its formal requirements does not have unwanted behaviors and consequences. (“Did I build the right system?”) Security: how to prevent intentional manipulation by unauthorized parties. Control: how to enable meaningful human control over an AI system after it begins to operate.

Law and ethics: How should the law handle liability for autonomous systems? Must some autonomous systems remain under meaningful human control? Should some categories of autonomous weapons be banned? Machine ethics: How should an autonomous vehicle trade off, say, a small probability of injury to a human against the near-certainty of a large material cost? Should such trade-offs be the subject of national standards? To what extent can/should privacy be safeguarded as AI gets better at interpreting the data obtained from surveillance cameras, phone lines, emails, shopping habits, etc.?

Economics: labor market forecasting; labor market policy; how can a low-employment society flourish?

Education and outreach: summer/winter schools on AI and its relation to society, targeted at AI graduate students and postdocs; non-technical mini-schools/symposia on AI targeted at journalists, policymakers, philanthropists, and other opinion leaders.



FLI also requested proposals for centers focused on AI policy, which could address questions such as:

What is the space of AI policies worth studying? Possible dimensions include implementation level (global, national, organizational, etc.), strictness (mandatory regulations, industry guidelines, etc.) and type (policies/monitoring focused on software, hardware, projects, individuals, etc.)

Which criteria should be used to determine the merits of a policy? Candidates include verifiability of compliance, enforceability, ability to reduce risk, ability to avoid stifling desirable technology development, adoptability, and ability to adapt over time to changing circumstances.

Which policies are best when evaluated against these criteria of merit? Addressing this question (which is anticipated to involve the lion’s share of the proposed work) would include detailed forecasting of how AI development will unfold under different policy options.

This agenda is very broad, and open to multiple possible interpretations.

Research agendas have also been proposed by the Machine Intelligence Research Institute (MIRI) and Stanford’s One Hundred Year Study on Artificial Intelligence (AI100). MIRI’s research tends to involve more mathematics, formal logic, and formal philosophy than much work in machine learning.



Some specific research areas highlighted by our scientific advisors Dario Amodei and Jacob Steinhardt include:

Improving the ability of algorithms to learn values, goal systems, and utility functions, rather than requiring them to be hand-coded. Work on inverse reinforcement learning and weakly supervised learning could potentially contribute to this goal.

Improving the calibration of machine learning systems, i.e., their ability to accurately distinguish between predictions that are highly likely to be right vs. predictions that are based on potentially confusing data and could be dramatically wrong.

Making decisions/conclusions made by machine learning systems easier for humans to understand.

Making the performance of machine learning systems more robust to changes in context.

Improving the user interfaces of machine learning systems.

Sustained progress in these areas could potentially reduce risks from unintended consequences—including loss of control—of future artificial intelligence systems.
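To make the notion of calibration listed above more concrete, here is a minimal sketch of one common way to measure it, often called expected calibration error: group predictions by stated confidence and compare each group's average confidence with its actual accuracy. This metric is our choice of illustration and is not taken from the advisors' suggestions; the function name, the number of bins, and the sample data are assumptions.

```python
from typing import List, Sequence


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 5) -> float:
    """Bin-size-weighted average of |accuracy - mean confidence| over
    equal-width confidence bins. 0.0 means perfectly calibrated on this data."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        return 0.0

    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        # A confidence of exactly 1.0 is placed in the last bin.
        bins[min(int(c * n_bins), n_bins - 1)].append(i)

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(confidences[i] for i in members) / len(members)
        accuracy = sum(1 for i in members if correct[i]) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Made-up data: both systems claim 90% confidence on ten predictions.
well_calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
```

In this made-up data the first system is right nine times out of ten, matching its stated confidence, so its error is approximately zero. The second is right only half the time while claiming 90% confidence, giving an error of about 0.4.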

Is it possible to make progress in this area today?

It seems hard to know in advance whether work on the problems described here will ultimately reduce risks posed by advanced artificial intelligence. At this point, we feel the case comes down to the following:

Currently, work in this field receives very little attention from researchers dedicated to addressing the issues we have described (see “Who else is working on this?”), and very little of this attention has come from researchers with substantial expertise in machine learning.

However, as mentioned above, many researchers in machine learning have recommended expanded research in this field. Moreover, FLI has received over 300 grant applications, requesting a total of nearly $100 million for research in this area.

It’s intuitively plausible to us (and to our main advisors on the topic at this time, Dario Amodei and Jacob Steinhardt) that success on some items on the above research agendas could result in decreased risk.

Because the largest potential risks are probably still at least a couple of decades away, a substantial risk of working in this area is that, regardless of what we do today, the most important work will be done by others when the risks become more imminent and comprehensible, making early efforts to prepare for the problem redundant or comparatively inefficient. At the same time, it seems possible that some risks could come on a faster timeline. For example, many researchers in Bostrom’s survey described above assigned a 10% subjective probability to the creation of machine intelligences “that can carry out most human professions at least as well as a typical human” within 10 years. We have not vetted these judgments and believe it would be challenging to do so. We are highly uncertain about how much weight to put on the specific details of these judgments, but they suggest to us that very powerful artificial intelligence systems could exist relatively soon. Moreover, it may be important to have a mature safety-oriented research effort underway years or longer before advanced artificial intelligence (including advanced narrow artificial intelligence) is created, and nurturing that research effort could be a long-term project. Alternatively, even if advanced AI will not be created for decades, it’s possible that building up and shaping the field could have consequences decades later. So it seems possible that work today could increase overall levels of preparation for advanced artificial intelligence.

Finally, much of the research relevant to long-term problems may overlap with short-term problems, such as the role of artificial intelligence in surveillance, autonomous weapons systems, and unemployment. Even if work done in this field today does not affect very long-term outcomes with artificial intelligence, it could potentially affect these issues in the shorter term.

Could supporting this field lead to unwarranted or premature regulation?

Our opinion is that the potential risks and policy options in this field are currently poorly understood, and advocating for regulation would be premature. A potential risk of working in this field is that it could cause unwarranted or premature regulation to occur, which could be counterproductive. While supporting work in this space could potentially have that result, we would guess that working in this field would be more likely to reduce the risk of premature or unwarranted regulation for the following reasons:

We would guess that more thoughtful attention to policy options would reduce the risk of unwarranted regulation, and make regulation more likely to occur only if it turns out to be needed.

The field may eventually be regulated regardless of whether funders pay additional attention to it, and additional attention to this set of issues could potentially make the regulation more likely to be thoughtful and effective.

We would guess that technical research (in contrast with social science, policy, law, and ethics research) on these issues would be particularly unlikely to increase regulation. While such work could potentially draw attention to potential safety issues and thereby make regulation more likely, it seems more plausible that if computer science researchers were perceived to pay greater attention to the relevant potential risks, this would decrease the perceived need for regulation.

Who else is working on this?

Funders

In 2015, Elon Musk announced a $10 million donation to support “a global research program aimed at keeping AI beneficial to humanity.” The program is being administered by the Future of Life Institute, a non-profit research institute in Boston led by MIT professor Max Tegmark. FLI issued a first call for proposals from researchers at the beginning of the year; we have a forthcoming write-up that further discusses this work. The sort of research they are funding is described above (see “What are possible interventions?”).

Organizations working in this space

A few small non-profit/academic institutes work on risks from artificial intelligence, including:

| Organization | Mission | Revenue or budget figure |
| --- | --- | --- |
| Cambridge Center for the Study of Existential Risk | “CSER is a multidisciplinary research centre dedicated to the study and mitigation of risks that could lead to human extinction.” | Not available, new organization |
| Future of Humanity Institute | “The Future of Humanity Institute is a leading research centre looking at big-picture questions for human civilization. The last few centuries have seen tremendous change, and this century might transform the human condition in even more fundamental ways. Using the tools of mathematics, philosophy, and science, we explore the risks and opportunities that will arise from technological change, weigh ethical dilemmas, and evaluate global priorities. Our goal is to clarify the choices that will shape humanity’s long-term future.” | About $1 million annual budget for 2013 |
| Future of Life Institute | “We are a volunteer-run research and outreach organization working to mitigate existential risks facing humanity. We are currently focusing on potential risks from the development of human-level artificial intelligence.” | Not available, new organization |
| Machine Intelligence Research Institute | “We do foundational mathematical research to ensure smarter-than-human artificial intelligence has a positive impact.” | $1,237,557 in revenue for 2014 |
| One Hundred Year Study on Artificial Intelligence (AI100) | “Stanford University has invited leading thinkers from several institutions to begin a 100-year effort to study and anticipate how the effects of artificial intelligence will ripple through every aspect of how people work, live and play.” | Not available, new organization |

CSER, FHI, and FLI work on existential risks to humanity in general, but all are significantly interested in risks from artificial intelligence.

Questions for further investigation

Amongst other topics, our further research on this cause might address:

Is it possible to get a better sense of how imminent advanced artificial intelligence is likely to be and the specifics of what risks it might pose?

What kinds of technical research are most important for reducing the risk of unexpected/undesirable outcomes from progress in artificial intelligence? Who are the best people to do this research?

What could be done—especially in terms of policy research or advocacy—to reduce risks from the weaponization/misuse of artificial intelligence?

Could a philanthropist help relevant fields develop by supporting PhD, postdoctoral, and/or fellowship programs? What would be the best form for such efforts to take?

To what extent could approaches and funding models for other fields—such as international peace and security or nuclear weapons policy—successfully be adapted to the risks posed by artificial intelligence?

What is the comparative size of the risk from intentional misuse of artificial intelligence (e.g. through weaponization) vs. loss of control of an advanced artificial intelligence agent with misaligned values?

Sources