1. What is MIRI’s mission?

Our mission statement is to “ensure that the creation of smarter-than-human artificial intelligence has a positive impact.” This is an ambitious goal, but we believe that some early progress is possible, and we believe that the goal’s importance and difficulty makes it prudent to begin work at an early date.

Our two research agendas, “Agent Foundations for Aligning Machine Intelligence with Human Interests” and “Value Alignment for Advanced Machine Learning Systems,” focus on three groups of technical problems:

highly reliable agent design — learning how to specify highly autonomous systems that reliably pursue some fixed goal;

value specification — supplying autonomous systems with the intended goals; and

error tolerance — making such systems robust to programmer error.

We publish new mathematical results, host workshops, attend conferences, and fund outside researchers who are interested in investigating these problems. We also host a blog and an online research forum.

2. Why think that AI can outperform humans?

Machines are already smarter than humans are at many specific tasks: performing calculations, playing chess, searching large databanks, detecting underwater mines, and more. However, human intelligence continues to dominate machine intelligence in generality.

A powerful chess computer is “narrow”: it can’t play other games. In contrast, humans have problem-solving abilities that allow us to adapt to new contexts and excel in many domains other than what the ancestral environment prepared us for.

In the absence of a formal definition of “intelligence” (and therefore of “artificial intelligence”), we can heuristically cite humans’ perceptual, inferential, and deliberative faculties (as opposed to, e.g., our physical strength or agility) and say that intelligence is “those kinds of things.” On this conception, intelligence is a bundle of distinct faculties — albeit a very important bundle that includes our capacity for science.

Our cognitive abilities stem from high-level patterns in our brains, and these patterns can be instantiated in silicon as well as carbon. This tells us that general AI is possible, though it doesn’t tell us how difficult it is. If intelligence is sufficiently difficult to understand, then we may arrive at machine intelligence by scanning and emulating human brains or by some trial-and-error process (like evolution), rather than by hand-coding a software agent.

If machines can achieve human equivalence in cognitive tasks, then it is very likely that they can eventually outperform humans. There is little reason to expect that biological evolution, with its lack of foresight and planning, would have hit upon the optimal algorithms for general intelligence (any more than it hit upon the optimal flying machine in birds). Beyond qualitative improvements in cognition, Nick Bostrom notes more straightforward advantages we could realize in digital minds, e.g.:

editability — “It is easier to experiment with parameter variations in software than in neural wetware.”

speed — “The speed of light is more than a million times greater than that of neural transmission, synaptic spikes dissipate more than a million times more heat than is thermodynamically necessary, and current transistor frequencies are more than a million times faster than neuron spiking frequencies.”

serial depth — On short timescales, machines can carry out much longer sequential processes.

storage capacity — Computers can plausibly have greater working and long-term memory.

size — Computers can be much larger than a human brain.

duplicability — Copying software onto new hardware can be much faster and higher-fidelity than biological reproduction.

Any one of these advantages could give an AI reasoner an edge over a human reasoner, or give a group of AI reasoners an edge over a human group. Their combination suggests that digital minds could surpass human minds more quickly and decisively than we might expect.

3. Why is safety important for smarter-than-human AI?

Present-day AI algorithms already demand special safety guarantees when they must act in important domains without human oversight, particularly when they or their environment can change over time:

Achieving these gains [from autonomous systems] will depend on development of entirely new methods for enabling “trust in autonomy” through verification and validation (V&V) of the near-infinite state systems that result from high levels of [adaptability] and autonomy. In effect, the number of possible input states that such systems can be presented with is so large that not only is it impossible to test all of them directly, it is not even feasible to test more than an insignificantly small fraction of them. Development of such systems is thus inherently unverifiable by today’s methods, and as a result their operation in all but comparatively trivial applications is uncertifiable. It is possible to develop systems having high levels of autonomy, but it is the lack of suitable V&V methods that prevents all but relatively low levels of autonomy from being certified for use.

As AI capabilities improve, it will become easier to give AI systems greater autonomy, flexibility, and control; and there will be increasingly large incentives to make use of these new possibilities. The potential for AI systems to become more general, in particular, will make it difficult to establish safety guarantees: reliable regularities during testing may not always hold post-testing.

The largest and most lasting changes in human welfare have come from scientific and technological innovation — which in turn comes from our intelligence. In the long run, then, much of AI’s significance comes from its potential to automate and enhance progress in science and technology. The creation of smarter-than-human AI brings with it the basic risks and benefits of intellectual progress itself, at digital speeds.

As AI agents become more capable, it becomes more important (and more difficult) to analyze and verify their decisions and goals. Stuart Russell writes:

The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem: The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task. A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want.

Bostrom’s “The Superintelligent Will” lays out these two concerns in more detail: that we may not correctly specify our actual goals in programming smarter-than-human AI systems, and that most agents optimizing for a misspecified goal will have incentives to treat humans adversarially, as potential threats or obstacles to achieving the agent’s goal.

If the goals of human and AI agents are not well-aligned, the more knowledgeable and technologically capable agent may use force to get what it wants, as has occurred in many conflicts between human communities. Having noticed this class of concerns in advance, we have an opportunity to reduce risk from this default scenario by directing research toward aligning artificial decision-makers’ interests with our own.

4. Do researchers think AI is imminent?

In early 2013, Bostrom and Müller surveyed the one hundred top-cited living authors in AI, as ranked by Microsoft Academic Search. Conditional on “no global catastrophe halt[ing] progress,” the twenty-nine experts who responded assigned a median 10% probability to our developing a machine “that can carry out most human professions at least as well as a typical human” by the year 2023, a 50% probability by 2048, and a 90% probability by 2080.

Most researchers at MIRI approximately agree with the 10% and 50% dates, but think that AI could arrive significantly later than 2080. This is in line with Bostrom’s analysis in Superintelligence:

My own view is that the median numbers reported in the expert survey do not have enough probability mass on later arrival dates. A 10% probability of HLMI [human-level machine intelligence] not having been developed by 2075 or even 2100 (after conditionalizing on “human scientific activity continuing without major negative disruption”) seems too low. Historically, AI researchers have not had a strong record of being able to predict the rate of advances in their own field or the shape that such advances would take. On the one hand, some tasks, like chess playing, turned out to be achievable by means of surprisingly simple programs; and naysayers who claimed that machines would “never” be able to do this or that have repeatedly been proven wrong. On the other hand, the more typical errors among practitioners have been to underestimate the difficulties of getting a system to perform robustly on real-world tasks, and to overestimate the advantages of their own particular pet project or technique.

Given experts’ (and non-experts’) poor track record at predicting progress in AI, we are relatively agnostic about when full AI will be invented. It could come sooner than expected, or later than expected.

Experts also reported a 10% median confidence that superintelligence would be developed within 2 years of human equivalence, and a 75% confidence that superintelligence would be developed within 30 years of human equivalence. Here MIRI researchers’ views differ significantly from AI experts’ median view; we expect AI systems to surpass humans relatively quickly once they near human equivalence.

5. What technical problems are you working on?

“Aligning smarter-than-human AI with human interests” is an extremely vague goal. To approach this problem productively, we attempt to factorize it into several subproblems. As a starting point, we ask: “What aspects of this problem would we still be unable to solve even if the problem were much easier?”

In order to achieve real-world goals more effectively than a human, a general AI system will need to be able to learn its environment over time and decide between possible proposals or actions. A simplified version of the alignment problem, then, would be to ask how we could construct a system that learns its environment and has a very crude decision criterion, like “Select the policy that maximizes the expected number of diamonds in the world.”

Highly reliable agent design is the technical challenge of formally specifying a software system that can be relied upon to pursue some preselected toy goal. An example of a subproblem in this space is ontology identification: how do we formalize the goal of “maximizing diamonds” in full generality, allowing that a fully autonomous agent may end up in unexpected environments and may construct unanticipated hypotheses and policies? Even if we had unbounded computational power and all the time in the world, we don’t currently know how to solve this problem. This suggests that we’re not only missing practical algorithms but also a basic theoretical framework through which to understand the problem.

The formal agent AIXI is an attempt to define what we mean by “optimal behavior” in the case of a reinforcement learner. A simple AIXI-like equation is lacking, however, for defining what we mean by “good behavior” if the goal is to change something about the external world (and not just to maximize a pre-specified reward number). In order for the agent to evaluate its world-models to count the number of diamonds, as opposed to having a privileged reward channel, what general formal properties must its world-models possess? If the system updates its hypotheses (e.g., discovers that string theory is true and quantum physics is false) in a way its programmers didn’t expect, how does it identify “diamonds” in the new model? The question is a very basic one, yet the relevant theory is currently missing.

We can distinguish highly reliable agent design from the problem of value specification: “Once we understand how to design an autonomous AI system that promotes a goal, how do we ensure its goal actually matches what we want?” Since human error is inevitable and we will need to be able to safely supervise and redesign AI algorithms even as they approach human equivalence in cognitive tasks, MIRI also works on formalizing error-tolerant agent properties. Artificial Intelligence: A Modern Approach, the standard textbook in AI, summarizes the challenge:

Yudkowsky […] asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design — to design a mechanism for evolving AI under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes.

Our technical agenda describes these open problems in more detail, and our research guide collects online resources for learning more.

6. Why work on AI safety early?

MIRI prioritizes early safety work because we believe such work is important, time-sensitive, tractable, and informative.

The importance of AI safety work is outlined in Q3, above. We see the problem as time-sensitive as a result of:

neglectedness — Only a handful of people are currently working on the open problems outlined in the MIRI technical agenda.

apparent difficulty — Solving the alignment problem may demand a large number of researcher hours, and may also be harder to parallelize than capabilities research.

risk asymmetry — Working on safety too late has larger risks than working on it too early.

AI timeline uncertainty — AI could progress faster than we expect, making it prudent to err on the side of caution.

discontinuous progress in AI — Progress in AI is likely to speed up as we approach general AI. This means that even if AI is many decades away, it would be hazardous to wait for clear signs that general AI is near: clear signs may only arise when it’s too late to begin safety work.

We also think it is possible to do useful work in AI safety today, even if smarter-than-human AI is 50 or 100 years away. We think this for a few reasons:

lack of basic theory — If we had simple idealized models of what we mean by correct behavior in autonomous agents, but didn’t know how to design practical implementations, this might suggest a need for more hands-on work with developed systems. Instead, however, simple models are what we’re missing. Basic theory doesn’t necessarily require that we have experience with a software system’s implementation details, and the same theory can apply to many different implementations.

precedents — Theoretical computer scientists have had repeated success in developing basic theory in the relative absence of practical implementations. (Well-known examples include Claude Shannon, Alan Turing, Andrey Kolmogorov, and Judea Pearl.)

early results — We’ve made significant advances since prioritizing some of the theoretical questions we’re looking at, especially in decision theory and logical uncertainty. This suggests that there’s low-hanging theoretical fruit to be picked.

Finally, we expect progress in AI safety theory to be useful for improving our understanding of robust AI systems, of the available technical options, and of the broader strategic landscape. In particular, we expect transparency to be necessary for reliable behavior, and we think there are basic theoretical prerequisites to making autonomous AI systems transparent to human designers and users.

Having the relevant theory in hand may not be strictly necessary for designing smarter-than-human AI systems — highly reliable agents may need to employ very different architectures or cognitive algorithms than the most easily constructed smarter-than-human systems that exhibit unreliable behavior. For that reason, some fairly general theoretical questions may be more relevant to AI safety work than to mainline AI capabilities work. Key advantages to AI safety work’s informativeness, then, include:

general value of information — Making AI safety questions clearer and more precise is likely to give insights into what kinds of formal tools will be useful in answering them. Thus we’re less likely to spend our time on entirely the wrong lines of research. Investigating technical problems in this area may also help us develop a better sense for how difficult the AI problem is, and how difficult the AI alignment problem is.

requirements for informative testing — If the system is opaque, then online testing may not give us most of the information that we need to design safer systems. Humans are opaque general reasoners, and studying the brain has been quite useful for designing more effective AI algorithms, but it has been less useful for building systems for verification and validation.

requirements for safe testing — Extracting information from an opaque system may not be safe, since any sandbox we build may have flaws that are obvious to a superintelligence but not to a human.

7. How can I contribute?

MIRI is a research nonprofit funded primarily by small and medium-sized donors. Donations are therefore helpful for funding our mathematics work, workshops, academic outreach, etc.

For people interested in learning more about our research focus and possibly working with us, our Get Involved page has an application form and a number of regularly updated online resources.

Written by Rob Bensinger. Last updated September 18, 2016.