Project Summaries

Project Summary: Artificial Intelligence (AI) is a broad and open-ended research area, and the risks that AI systems will pose in the future are extremely hard to characterize. However, it seems likely that any AI system will involve substantial software complexity, will depend on advanced mathematics in both its implementation and justification, and will be naturally flexible and seem to degrade gracefully in the presence of many types of implementation errors. Thus we face a fundamental challenge in developing trustworthy AI: how can we build and maintain complex software systems that require advanced mathematics in order to implement and understand, and which are all but impossible to verify empirically? We believe that it will be possible and desirable to formally state and prove that the desired mathematical properties hold with respect to the underlying programs, and to maintain such proofs as part of the software artifacts themselves. We propose to demonstrate the feasibility of this methodology by building a system that takes beliefs about the world in the form of probabilistic models, synthesizes inference algorithms to update those beliefs in the presence of observations, and provides formal proofs that the inference algorithms are correct with respect to the laws of probability.
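As a rough illustration of what "updating beliefs in the presence of observations" means in the simplest discrete case, the sketch below applies Bayes' rule with exact rational arithmetic. The function name `posterior` and the coin example are hypothetical and are not taken from the project. The runtime assertion is only an empirical check; the project proposes machine-checked proofs of such properties instead.

```python
from fractions import Fraction


def posterior(prior, likelihood, observation):
    """Exact Bayes update over a finite set of hypotheses.

    prior:       dict mapping hypothesis -> P(hypothesis)
    likelihood:  dict mapping hypothesis -> dict mapping observation -> P(observation | hypothesis)
    """
    # Joint probability P(hypothesis, observation) for each hypothesis.
    joint = {h: prior[h] * likelihood[h][observation] for h in prior}
    # Marginal probability of the observation.
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("observation has zero probability under the model")
    return {h: j / evidence for h, j in joint.items()}


# Hypothetical model: a coin is either fair or biased towards heads.
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
likelihood = {
    "fair": {"H": Fraction(1, 2), "T": Fraction(1, 2)},
    "biased": {"H": Fraction(3, 4), "T": Fraction(1, 4)},
}

updated = posterior(prior, likelihood, "H")
# Empirical sanity check only: a posterior must sum to one.
assert sum(updated.values()) == 1
```

Using `Fraction` rather than floating point keeps the sum-to-one check exact, which makes the gap between testing a single case and proving the property for all inputs easier to see.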

Project Summary: For society to enjoy many of the benefits of advanced artificial intelligence (AI) and robotics, it will be necessary to deal with situations in which autonomous artificial agents violate laws or cause harm. If we want to allow AIs and robots to roam the internet and the physical world and take actions that are unsupervised by humans (as may be necessary for, e.g., personal shopping assistants, self-driving cars, and a host of other applications), we must be able to manage the liability for the harms they might cause to individuals and property. Resolving this issue will require untangling a set of theoretical and philosophical issues surrounding causation, intention, agency, responsibility, culpability and compensation, and distinguishing different varieties of agency, such as causal, legal and moral. With a clearer understanding of the central concepts and issues, this project will provide a better foundation for developing policies that will enable society to utilize artificial agents as they become increasingly autonomous, and for ensuring that future artificial agents can be both robust and beneficial to society, without stifling innovation.

Technical Abstract: This project addresses a central issue, “the liability problem,” facing the regulation of artificial computational agents, including artificial intelligence (AI) and robotic systems, as they become increasingly autonomous and surpass current capabilities. In order for society to benefit from advances in AI technology, it will be necessary to develop regulatory policies which manage the risk and liability of deploying systems with increasingly autonomous capabilities. However, current approaches to liability have difficulties when it comes to dealing with autonomous artificial agents because their behavior may be unpredictable to those who create and deploy them, and they will not be proper legal agents. The project will explore the fundamental concepts of autonomy, agency and liability; clarify the different varieties of agency that artificial systems might realize, including causal, legal and moral; and illuminate the relationships between these. The project will take a systematic approach by integrating an analysis of fundamental concepts (including autonomy, agency, causation, intention, responsibility and culpability) and their applicability to autonomous artificial agents, surveying current legal approaches to liability, and exploring possible approaches for future regulatory policy. It will deliver a book-length publication containing the theoretical research results and recommendations for policy-making.

Project Summary: Some experts believe that computers could eventually become a lot smarter than humans are. They call it artificial superintelligence, or ASI. If people build ASI, it could be either very good or very bad for humanity. However, ASI is not well understood, which makes it difficult for people to act to enable good ASI and avoid bad ASI. Our project studies the ways that people could build ASI in order to help people act in better ways. We will model the different steps that need to occur for people to build ASI. We will estimate how likely it is that these steps will occur, and when they might occur. We will also model the actions people can take, and we will calculate how much the actions will help. For example, governments may be able to require that ASI researchers build in safety measures. Our models will include both the government action and the ASI safety measures, to learn about how well it all works. This project is an important step towards making sure that humanity avoids bad ASI and, if it wishes, creates good ASI.

Technical Abstract: Artificial superintelligence (ASI) has been proposed to be a major transformative future technology, potentially resulting in either massive improvement in the human condition or existential catastrophe. However, the opportunities and risks remain poorly characterized and quantified. This reduces the effectiveness of efforts to steer ASI development towards beneficial outcomes and away from harmful outcomes. While deep uncertainty inevitably surrounds such a breakthrough future technology, significant progress can be made now using available information and methods. We propose to model the human process of developing ASI. ASI would ultimately be a human creation; modeling this process indicates the probability of various ASI outcomes and illuminates a range of ways to improve outcomes. We will characterize the development pathways that can result in beneficial or dangerous ASI outcomes. We will apply risk analysis and decision analysis methods to quantify opportunities and risks, and to evaluate opportunities to make ASI less risky and more beneficial. Specifically, we will use fault trees and influence diagrams to map out ASI development pathways and the influence that various actions have on these pathways. Our proposed project will produce the first-ever analysis of ASI development using rigorous risk and decision analysis methodology.
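To make the fault-tree idea concrete, here is a minimal sketch of how a top-event probability is computed from basic events through AND and OR gates. It assumes all basic events are statistically independent, which real analyses often cannot assume. The event names and probabilities are hypothetical placeholders, not estimates from this project.

```python
import math
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Basic:
    """A basic event with a directly assigned probability."""
    name: str
    p: float


@dataclass
class Gate:
    """A logic gate combining child events; kind is "AND" or "OR"."""
    kind: str
    children: List["Node"]


Node = Union[Basic, Gate]


def probability(node: Node) -> float:
    """Probability of the event at `node`, assuming independent basic events."""
    if isinstance(node, Basic):
        return node.p
    probs = [probability(child) for child in node.children]
    if node.kind == "AND":
        # All children must occur.
        return math.prod(probs)
    if node.kind == "OR":
        # At least one child occurs: complement of "none occur".
        return 1.0 - math.prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical tree: the top event requires that either of two development
# pathways succeeds AND that safety measures fail. Numbers are placeholders.
top = Gate("AND", [
    Gate("OR", [Basic("pathway_a", 0.1), Basic("pathway_b", 0.2)]),
    Basic("safety_measures_fail", 0.5),
])

top_probability = probability(top)  # (1 - 0.9 * 0.8) * 0.5
```

An intervention can then be evaluated by changing one basic-event probability (for example, lowering `safety_measures_fail`) and recomputing the top event.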

Project Summary: Autonomous goal-directed systems may behave flexibly with minimal human involvement. Unfortunately, such systems could also be dangerous if pursuing an incorrect or incomplete goal.

Meaningful human control can ensure that each decision ultimately reflects the desires of a human operator, with AI systems merely providing capabilities and advice. Unfortunately, as AI becomes more capable such control becomes increasingly limiting and expensive.

I propose to study an intermediate approach, where a system’s behavior is shaped by what a human operator would have done if they had been involved, rather than either requiring actual involvement or pursuing a goal without any oversight. This approach may be able to combine the safety of human control with the efficiency of autonomous operation. But capturing either of these benefits requires confronting new challenges: to be safe, we must ensure that our AI systems do not cause harm by incorrectly predicting the human operator; to be efficient and flexible, we must enable the human operator to provide meaningful oversight in domains that are too complex for them to reason about unaided. This project will study both of these problems, with the goal of designing concrete mechanisms that can realize the promise of this approach.

Project Summary: Humans take great pride in being the only creatures who make moral judgments, even though their moral judgments often suffer from serious flaws. Some AI systems do generate decisions based on their consequences, but consequences are not all there is to morality. Moral judgments are also affected by rights (such as privacy), roles (such as in families), past actions (such as promises), motives and intentions, and other morally relevant features. These diverse factors have not yet been built into AI systems. Our goal is to do just that. Our team plans to combine methods from computer science, philosophy, and psychology in order to construct an AI system that is capable of making plausible moral judgments and decisions in realistic scenarios. We hope that this work will provide a basis that leads to future highly advanced AI systems acting ethically and thereby being more robust and beneficial. Humans, by comparing their own moral judgments to the output of the resulting system, will be able to understand their own moral judgments and avoid common mistakes (such as partiality and overlooking relevant factors). In these ways and more, moral AI might also make humans more moral.

Technical Abstract: Most contemporary AI systems base their decisions solely on consequences, whereas humans also consider other morally relevant factors, including rights (such as privacy), roles (such as in families), past actions (such as promises), motives and intentions, and so on. Our goal is to build these additional morally relevant features into an AI system. We will identify morally relevant features by reviewing theories in moral philosophy, conducting surveys in moral psychology, and using machine learning to locate factors that affect human moral judgments. We will use and extend game theory and social choice theory to determine how to make these features more precise, how to weigh conflicting features against each other, and how to build these features into an AI system. We hope that eventually this work will lead to highly advanced AI systems that are capable of making moral judgments and acting on them. Humans will then be able to compare these outputs to their own moral judgments in order to learn which of these judgments are distorted by biases, partiality, or lack of attention to relevant factors. In such ways, moral AI can also contribute to our own understanding of morality and our moral lives.

Project Summary: What are the most important projects for reducing the risk of harm from superintelligent artificial intelligence? We will probably not have to deal with such systems for many years – and we do not expect they will be developed with the same architectures we use today. That may make us want to focus on developing long-term capabilities in AI safety research. On the other hand, there are forces pushing us towards working on near-term problems. We suffer from ‘near-sightedness’ and are better at finding the answer to questions that are close at hand. Just as important, work on long-term problems can happen in the future and get extra people attending to it, while work on near-term problems has to happen now if it is to happen at all.

This project models the trade-offs we make when carrying out AI safety projects that aim at various horizons and focus on specific architectures. It estimates crucial parameters – like the time-horizon probability distribution and how near-sighted we tend to be. It uses that model to work out what the AI safety community should be funding, and what it should call on policymakers to do.

Project Summary: In the early days of AI research, scientists studied problems such as chess and theorem proving that involved “micro worlds” that were perfectly known and predictable. Since the 1980s, AI researchers have studied problems involving uncertainty. They apply probability theory to model uncertainty about the world and use decision theory to represent the utility of the possible outcomes of proposed actions. This allows computers to make decisions that maximize expected utility by taking into account the “known unknowns”. However, when such AI systems are deployed in the real world, they can easily be confused by “unknown unknowns” and make poor decisions. This project will develop theoretical principles and AI algorithms for learning and acting safely in the presence of unknown unknowns. The algorithms will be able to detect and respond to unexpected changes in the world. They will ensure that when the AI system plans a sequence of actions, it takes into account its ignorance of the unknown unknowns. This will lead it to behave cautiously and turn to humans for help. Instead of maximizing expected utility, it will first ensure that its actions avoid unsafe outcomes and only then maximize utility. This will make AI systems much safer.
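The contrast between plain expected-utility maximization and a "safety first, utility second" rule can be sketched as follows. This is only an illustration under simplifying assumptions: outcomes are enumerated explicitly, "unsafe" is a known label on each outcome, and `max_unsafe` is a hypothetical tolerance parameter. None of these names come from the project, and the sketch does not address how unknown unknowns would be detected.

```python
from typing import Dict, List, Optional, Tuple

# Each outcome is (probability, utility, is_unsafe).
Outcome = Tuple[float, float, bool]


def expected_utility(outcomes: List[Outcome]) -> float:
    return sum(p * u for p, u, _ in outcomes)


def unsafe_probability(outcomes: List[Outcome]) -> float:
    return sum(p for p, _, unsafe in outcomes if unsafe)


def choose_action(actions: Dict[str, List[Outcome]], max_unsafe: float) -> Optional[str]:
    """Filter out actions that are too risky, then maximize expected utility.

    Returns None when no action is acceptably safe, which a caller could
    treat as the signal to stop and ask a human for help.
    """
    safe = {name: outs for name, outs in actions.items()
            if unsafe_probability(outs) <= max_unsafe}
    if not safe:
        return None
    return max(safe, key=lambda name: expected_utility(safe[name]))


# Hypothetical example: "fast" has higher expected utility (8.5 vs 6.0)
# but carries a 10% chance of an unsafe outcome.
actions = {
    "fast": [(0.9, 10.0, False), (0.1, -5.0, True)],
    "slow": [(1.0, 6.0, False)],
}

cautious_choice = choose_action(actions, max_unsafe=0.01)   # safety filter applies
unconstrained_choice = choose_action(actions, max_unsafe=1.0)  # plain utility maximization
```

Setting `max_unsafe=1.0` disables the filter and recovers ordinary expected-utility maximization, which makes the effect of the safety constraint easy to compare.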

Project Summary: As we close the loop between sensing, reasoning, and acting, autonomous agents such as self-driving cars are required to act intelligently and adaptively in increasingly complex and uncertain real-world environments. To make sensible decisions under uncertainty, agents need to reason probabilistically about their environments, e.g., estimate the probability that a pedestrian will cross or that a car will change lanes. Over the past decades, AI research has made tremendous progress in automated reasoning. Existing technology achieves super-human performance in numerous domains, including chess-playing and crossword-solving. Unfortunately, current approaches do not provide worst-case guarantees on the quality of the results obtained. For example, it is not possible to rule out completely unexpected behaviors or catastrophic failures. Therefore, we propose to develop novel reasoning technology focusing on soundness and robustness. This research will greatly improve the reliability and safety of next-generation autonomous agents.

Technical Abstract: To cope with the uncertainty and ambiguity of real-world domains, modern AI systems rely heavily on statistical approaches and probabilistic modeling. Intelligent autonomous agents need to solve numerous probabilistic reasoning tasks, ranging from probabilistic inference to stochastic planning problems. Safety and reliability depend crucially on having both accurate models and sound reasoning techniques. To date, there are two main paradigms for probabilistic reasoning: exact decomposition-based techniques and approximate methods such as variational inference and MCMC sampling. Neither of them is suitable for supporting autonomous agents interacting with complex environments safely and reliably. Decomposition-based techniques are accurate but are not scalable. Approximate techniques are more scalable, but in most cases do not provide formal guarantees on the accuracy. We therefore propose to develop probabilistic reasoning technology which is both scalable and provides formal guarantees, i.e., “certificates” of accuracy, as in formal verification. This research will bridge probabilistic and deterministic reasoning, drawing from their respective strengths, and has the potential to greatly improve the reliability and safety of AI and cyber-physical systems.

Project Summary: Previous work in economics and AI has developed mathematical models of preferences or values, along with computer algorithms for inferring preferences from observed human choices. We would like to use such algorithms to enable AI systems to learn human preferences by observing humans make real-world choices. However, these algorithms rely on an assumption that humans make optimal plans and take optimal actions in all circumstances. This is typically false for humans. For example, people’s route planning is often worse than that of Google Maps, because we can’t number-crunch as many possible paths. Humans can also be inconsistent over time, as we see in procrastination and impulsive behavior. Our project seeks to develop algorithms that learn human preferences from data despite humans not being homo economicus and despite the influence of non-rational impulses. We will test our algorithms on real-world data and compare their inferences to people’s own judgments about their preferences. We will also investigate the theoretical question of whether this approach could enable an AI to learn the entirety of human values.

Technical Abstract: Previous work in economics and AI has developed mathematical models of preferences, along with algorithms for inferring preferences from observed actions. We would like to use such algorithms to enable AI systems to learn human preferences from observed actions. However, these algorithms typically assume that agents take actions that maximize expected utility given their preferences. This assumption of optimality is false for humans in real-world domains. Optimal sequential planning is intractable in complex environments and humans perform very rough approximations. Humans often don’t know the causal structure of their environment (in contrast to MDP models). Humans are also subject to dynamic inconsistencies, as observed in procrastination, addiction and in impulsive behavior. Our project seeks to develop algorithms that learn human preferences from data despite the suboptimality of humans and the behavioral biases that influence human choice. We will test our algorithms on real-world data and compare their inferences to people’s own judgments about their preferences. We will also investigate the theoretical question of whether this approach could enable an AI to learn the entirety of human values.
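One standard way to illustrate dynamic inconsistency is hyperbolic discounting, shown here next to exponential discounting for comparison. Hyperbolic discounting is a common textbook model and an assumption of this sketch; the abstract does not say which model the project uses. All amounts, delays, and parameter values below are hypothetical.

```python
def hyperbolic(amount: float, delay: float, k: float = 0.5) -> float:
    """Present value under hyperbolic discounting: amount / (1 + k * delay)."""
    return amount / (1.0 + k * delay)


def exponential(amount: float, delay: float, delta: float = 0.95) -> float:
    """Present value under exponential discounting: amount * delta ** delay."""
    return amount * delta ** delay


def prefers_larger_later(discount, wait: float,
                         small: float = 50.0, large: float = 100.0,
                         gap: float = 10.0) -> bool:
    """True if the agent prefers `large` at time wait + gap over `small` at time wait."""
    return discount(large, wait + gap) > discount(small, wait)


# Viewed 30 time units in advance, the hyperbolic agent plans to wait for the
# larger reward; when the smaller reward becomes immediately available, it
# switches. The exponential agent gives the same answer at both times.
hyperbolic_far = prefers_larger_later(hyperbolic, wait=30.0)
hyperbolic_near = prefers_larger_later(hyperbolic, wait=0.0)
exponential_far = prefers_larger_later(exponential, wait=30.0)
exponential_near = prefers_larger_later(exponential, wait=0.0)
```

An inference algorithm that assumes a single consistent utility function would read the hyperbolic agent's two choices as contradictory evidence about its preferences, which is the kind of problem the abstract describes.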

Project Summary: How can we ensure that powerful AI systems of the future behave in ways that are reliably aligned with human interests?

One productive way to begin study of this AI alignment problem in advance is to build toy models of the unique safety challenges raised by such powerful AI systems and see how they behave, much as Konstantin Tsiolkovsky wrote down (in 1903) a toy model of how a multistage rocket could be launched into space. This enabled Tsiolkovsky and others to begin exploring the specific challenges of spaceflight long before such rockets were built.

Another productive way to study the AI alignment problem in advance is to seek formal foundations for the study of well-behaved powerful AIs, much as Tsiolkovsky derived the rocket equation (also in 1903) which governs the motion of rockets under ideal environmental conditions. This was a useful stepping stone toward studying the motion of rockets in actual environments.

We plan to build toy models and seek formal foundations for many aspects of the AI alignment problem. One example is that we aim to improve our toy models of a corrigible agent which avoids default rational incentives to resist its programmers’ attempts to fix errors in the AI’s goals.

Technical Abstract: The Future of Life Institute’s research priorities document calls for research focused on ensuring beneficial behavior in [AI] systems that can learn from experience with human-like breadth and surpass human performance in most cognitive tasks. We aim to study several sub-problems of this ‘AI alignment problem’, by illuminating the key difficulties using toy models, and by seeking formal foundations for robustly beneficial intelligent agents. In particular, we hope to (a) improve our toy models of ‘corrigible agents’ which avoid default rational incentives to resist corrective interventions from the agents’ programmers, (b) continue our preliminary efforts to put formal foundations under the study of naturalistic, embedded agents which avoid the standard agent-environment split currently used as a simplifying assumption throughout the field of AI, and (c) continue our preliminary efforts to overcome obstacles to flexible cooperation in multi-agent settings. We also hope to take initial steps in formalizing several other informal problems related to AI alignment, for example the problem of ‘ontology identification’: Given goals specified with respect to some ontology and a world model, how can the ontology of the goals be identified inside the world model?

Project Summary: Many experts think that within a century, artificial intelligence will be able to do almost anything a human can do. This might mean humans are no longer in control of what happens, and very likely means they are no longer employable. The world might be very different, and the changes that take place could be dangerous.

Very little research has asked when this transition will happen, what will happen, and how we can make it go well. AI Impacts is a project to ask those questions, and to answer them rigorously. We look for research projects that can shed light on the future of AI; especially on questions that matter to people making decisions. We publish the results online, and explain our research to a broad audience.

We are currently working on comparing the power of the brain to that of supercomputers, to help calculate when people will have enough hardware to run something as complex as a brain. We are also checking whether AI progress is likely to see sudden jumps, by looking for jumps in other areas of technological progress.

Technical Abstract: ‘Human-level’ artificial intelligence will have far-reaching effects on society, and is generally anticipated within the coming century. Relatively little is known about the timelines or consequences of this arrival, though increasingly many decisions depend on guesses about it. AI Impacts identifies cost-effective research projects which might shed light on the future of AI, and especially on the parts of it that might guide policy and other decisions. We perform a selection of these research projects, and publish the results as accessible articles in the public domain.

We recently made a preliminary estimate of the computing performance of the brain in terms of traversed edges per second (TEPS), a supercomputing benchmark, to better judge when computing hardware will be capable of replicating what the brain does, given the right software. We are also collecting case studies of abrupt technological progress to aid in evaluating the probability of discontinuities in AI progress. In the coming year we will continue with both of these projects, publish articles about several projects in progress, and start several new projects.

Project Summary: We are investigating the safety of possible future advanced AI that uses the same basic approach to motivated behavior as that used by the human brain. Neuroscience has given us a rough blueprint of how the brain directs its behavior based on its innate motivations and its learned goals and values. This blueprint may be used to guide advances in artificial intelligence to produce AI that is as intelligent and capable as humans, and soon after, more intelligent. While it is impossible to predict how long this progress might take, it is also impossible to predict how quickly it might happen. Rapid progress in practical applications is producing rapid increases in funding from commercial and governmental sources. Thus, it seems critical to understand the potential risks of brain-style artificial intelligence before it is actually achieved. We are testing our model of brain-style motivational systems in a highly simplified environment, to investigate how its behavior may change as it learns and becomes more intelligent. While our system is not capable of performing useful tasks, it serves to investigate the stability of such systems when they are integrated with powerful learning systems currently being developed and deployed.

Technical Abstract: We apply a neural network model of human motivated decision-making to an investigation of the risks involved in creating artificial intelligence with a brain-style motivational system. This model uses relatively simple principles to produce complex, goal-directed behavior. Because of the potential utility of such a system, we believe that this approach may see common adoption, and has significant risks. Such a system could provide the motivational core of efforts to create artificial general intelligence (AGI). Such a system has the advantage of leveraging the wealth of knowledge already available and rapidly accumulating on the neuroscience of mammalian motivation and self-directed learning. We employ this model, and non-biological variations on it, to investigate the risks of employing such systems in combination with powerful learning mechanisms that are currently being developed. We investigate the issues of motivational and representational drift. Motivational drift captures how a system will change the motivations it is initially given and trained on. Representational drift refers to the possibility that sensory and conceptual representations will change over the course of training. We investigate whether learning in these systems can be used to produce a system that remains stable and safe for humans as it develops greater intelligence.

Project Summary: One path to significantly smarter-than-human artificial agents involves self-improvement, i.e., agents doing artificial intelligence research to make themselves even more capable. If such an agent is designed to be robust and beneficial, it should only execute self-modifying actions if it knows they are improvements, which, at a minimum, means being able to trust that the modified agent only takes safe actions. However, trusting the actions of a similar or smarter agent can lead to problems of self-reference, which can be seen as sophisticated versions of the liar paradox (which shows that the self-referential sentence “this sentence is false” cannot be consistently true or false). Several partial solutions to these problems have recently been proposed. However, current software for formal reasoning does not have sufficient support for self-referential reasoning to make these partial solutions easy to implement and study. In this project, we will implement a toy model of agents using these partial solutions to reason about self-modifications, in order to improve our understanding of the challenges of implementing self-referential reasoning, and to stimulate work on tools suitable for it.

Technical Abstract: Artificially intelligent agents designed to be highly reliable are likely to include a capacity for formal deductive reasoning to be applied in appropriate situations, such as when reasoning about computer programs including other agents and future versions of the same agent. However, it will not always be possible to model other agents precisely: when considering more capable agents, only abstract reasoning about their architecture is possible. Abstract reasoning about the behavior of agents that justify their actions with proofs leads to problems of self-reference and reflection: Gödel’s second incompleteness theorem shows that no sufficiently strong proof system can prove its own consistency, making it difficult for agents to show that actions their successors have proven to be safe are in fact safe (since an inconsistent proof system would be able to prove any action “safe”). Recently, some potential approaches to circumventing this obstacle have been proposed in the form of pen-and-paper proofs.

We propose building and studying implementations of agents using these approaches, to better understand the challenges of implementing tools that are able to support this type of reasoning, and to stimulate work in the interactive theorem proving community on tools of this kind.

Project Summary: Deep learning architectures have fundamentally changed the capabilities of machine learning and benefited many applications such as computer vision, speech recognition, and natural language processing, with influence on many other problems still to come. However, very little is understood about those networks. Months of manual tuning are required to obtain excellent performance, and the trained networks are often not robust: recent studies have shown that the error rate increases significantly with just slight pixel-level perturbations in images that are not even perceivable by human eyes.

In this proposal, the PI proposes to thoroughly study the optimization and robustness of deep convolutional networks in visual object recognition, in order to gain more understanding of deep learning. This includes training procedures that will make deep learning more automatic and lead to fewer failures in training, as well as confidence estimates when the deep network is used to predict on new data. The confidence estimates can be used to control the behavior of a robot employing deep learning so that it will not go on to perform maneuvers that could be dangerous because of erroneous predictions. Understanding these aspects would also be helpful in designing potentially more robust networks in the future.

Technical Abstract: This work will focus on predicting whether a deep convolutional neural network (CNN) has succeeded. This includes two aspects: first, to find an explanation of why and when the stochastic optimization in a deep CNN can succeed without overfitting and obtain high accuracy; second, to establish an estimate of confidence in the predictions of the deep learning architecture. Those estimates of confidence can be used as safeguards when utilizing those networks in real life. In order to establish those estimates, this work proposes to start from intuitions drawn from empirical analyses of the training procedure and model structures of deep learning. In-depth analyses will be completed for the mini-batch training procedure and model structures, by illustrating the differences each mini-batch size makes to the training, as well as the low-dimensional manifold structure in the classification. From those analyses, this work will result in approaches to design and control a proper training procedure with less human intervention, as well as confidence estimates obtained by estimating the distance of the testing data to the submanifold that the trained network is effective on.

Project Summary: In order for AI to be safely deployed, the desired behavior of the AI system needs to be based on well-understood, realistic, and empirically testable assumptions. From the perspective of modern machine learning, there are three main barriers to this goal. First, existing theory and algorithms mainly focus on fitting the observable outputs in the training data, which could lead, for instance, to an autonomous driving system that performs well on validation tests but does not understand the human values underlying the desired outputs. Second, existing methods are designed to handle a single specified set of testing conditions, and thus little can be said about how a system will behave in a fundamentally new setting; e.g., an autonomous driving system that performs well in most conditions may still perform arbitrarily poorly during natural disasters. Finally, most systems have no way of detecting whether their underlying assumptions have been violated: they will happily continue to predict and act even on inputs that are completely outside the scope of the system.

In this proposal, we detail a research program for addressing all three of the problems above. Just as statistical learning theory (e.g., the work of Vapnik) laid down the foundations of existing machine learning and AI techniques, allowing the field to flourish over the last 25 years, we aim to lay the groundwork for a new generation of safe-by-design AI systems, which can sustain the continued deployment of AI in society.

Project Summary: One goal of artificial intelligence is valid behavior: computers should perform tasks that people actually want them to do. The current model of programming hinders validity, largely because it focuses on the minutiae of how to compute rather than the goal of what to compute. An alternative model offers hope for validity: program synthesis. Here, the user specifies what to compute by giving a small description of their goal (e.g., input-output examples). The synthesizer then infers candidate programs matching that description, which the user selects from.

One shortcoming of synthesizers is that they are truthful rather than helpful: they return answers that are literally consistent with user requirements but no more (e.g., a requirement of “word that starts with the letter a” might return just “a”). By contrast, humans read more deeply into requirements, divining the underlying intentions. Helpfulness of this kind has been intensely studied in the linguistic field called pragmatics. This project will investigate how recent developments in the computational modeling of pragmatics can be leveraged to improve program synthesis, making it easier to write programs that do what we want with little to no special knowledge.

Technical Abstract: here

Project Summary: Economics models the behavior of people, firms, and other decision makers, as a means to understand how these decisions shape the pattern of activities that produce value and ultimately satisfy (or fail to satisfy) human needs and desires. The field adopts rational models of behavior, either of individuals or of behavior in the aggregate.

Artificial Intelligence (AI) research is also drawn to rationality concepts, which provide an ideal for the computational agents that it seeks to create. Although perfect rationality is not achievable, the capabilities of AI are rapidly advancing, and AI can already surpass human-level capabilities in narrow domains.

We envision a future with a massive number of AIs that are owned, operated, designed, and deployed by a diverse array of entities. This multiplicity of interacting AIs, apart from or together with people, will constitute a social system, and as such economics can provide a useful framework for understanding and influencing the aggregate. In turn, systems populated by AIs can benefit from explicit design of the frameworks within which AIs exist. The proposed research looks to apply the economic theory of mechanism design to the coordination of behavior in systems of multiple AIs, looking to promote beneficial outcomes.

Technical Abstract: When a massive number of AIs are owned, operated, designed, and deployed by a diverse array of firms, individuals, and governments, this multi-agent AI constitutes a social system, and economics provides a useful framework for understanding and influencing the aggregate. In particular, we need to understand how to design multi-agent systems that promote beneficial outcomes when AIs interact with each other. A successful theory must consider both incentives and privacy considerations.

Mechanism design theory from economics provides a framework for the coordination of behavior, such that desirable outcomes are promoted and less desirable outcomes made less likely because they are not in the self-interest of individual actors. We propose a program of fundamental research to understand the role of mechanism design, multi-agent dynamical models, and privacy-preserving algorithms, especially in the context of multi-agent systems in which the AIs are built through reinforcement learning (RL). The proposed research considers two concrete AI problems: the first is experiment design, typically formalized as a multi-armed bandit process, which we study in a multi-agent, privacy-preserving setting. The second is the more general problem of learning to act in Markovian dynamical systems, including both planning and RL agents.
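
As a rough illustration of the multi-armed bandit formalization of experiment design mentioned above, the following minimal single-agent sketch uses the standard UCB1 rule on Bernoulli arms. It is not the project's method: the multi-agent and privacy-preserving aspects are left out entirely, and the function name `ucb1`, the arm means, and the seed are assumptions chosen only for the example.

```python
import math
import random


def ucb1(arm_means, rounds, seed=0):
    """Run the UCB1 rule on Bernoulli arms and return the pull count per arm.

    arm_means are hypothetical success probabilities, unknown to the learner.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of times each arm was pulled
    totals = [0.0] * n_arms  # sum of rewards observed for each arm

    for t in range(1, rounds + 1):
        if t <= n_arms:
            # Pull every arm once so each has an empirical mean.
            arm = t - 1
        else:
            # Choose the arm with the highest upper confidence bound:
            # empirical mean plus an exploration bonus that shrinks with pulls.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


if __name__ == "__main__":
    # Two hypothetical treatments with success rates 0.1 and 0.9.
    print(ucb1([0.1, 0.9], rounds=2000))
```

The exploration bonus is what separates this from simply picking the best arm seen so far: arms that have been tried rarely keep a large bonus, so they are revisited until the evidence against them is strong.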

Project Summary: The most exciting and impactful uses of artificial intelligence (AI) that will affect everybody in crucial ways involve smart decision making to help people in the real world, such as when driving cars or flying aircraft, or when robots engage with humans. All of these have a huge potential for making the world a better place but also impose considerable responsibility on the system designer to ensure that the system will not do more harm than good because its decisions are sometimes systematically unsafe. Responsibly allowing mankind to rely on such technology requires stringent assurance of its safety, which is a nontrivial enterprise given the decision flexibility in AI. The goal of this research project is to develop formal verification and validation technology that helps ensure safety, robustness, and reliability of AI-based system designs. The world is an uncertain place, so perfect performance cannot always be guaranteed in all respects. But good system designs do not compromise safety when their performance degrades. The PI is proposing to advance his verification tool KeYmaera, which has been used successfully for systems known as cyber-physical systems (combining computer decisions with physics or motion), toward the additional challenges that a deep integration of AI into those systems provides.

Technical Abstract: The most important and most impactful AI-based systems are those that directly interface with the physical world. Cyber-physical systems (CPS) combine computers (for decisions) and physics (motion) and play a prominent role, e.g., in cars, aircraft, robots. Due to their impact on people, they come with stringent safety requirements in order to make sure they make the world a better place. In order to enable sophisticated automation for these systems, AI-based systems become more prominent but their impact on the safety of the system is not understood well so far.

This project studies ways of extending safety analysis and verification technology for cyber-physical systems with ways of addressing the additional challenges that AI-based CPS provide.

The PI developed a verification tool, KeYmaera, for CPS, which has had considerable success in verifying CPS. KeYmaera has been used successfully for a safety analysis of an AI-intensive CPS, the Airborne Collision Avoidance System ACAS X, albeit with non-negligible effort. For a system with such worldwide impact as ACAS X, the effort is amortized. This project proposes to develop verification technology that reduces the effort needed to verify AI-based systems in order to achieve more widespread adoption of safety analysis for AI-based CPS.

Project Summary: There is a growing concern over the deployment of autonomous weapons systems, and how the partnering of artificial intelligence (AI) and weapons will change the future of conflict. The United Nations recently took up the subject of autonomous weapons, and many governments and key international organizations are arguing that such systems require meaningful human control to be acceptable. However, what is human control, and how do we ensure that it is meaningful? This project helps the international community, scholars and practitioners by providing answers to those questions and helping to protect the essential elements of human control over the application of force. Bringing together computer scientists, roboticists, ethicists, lawyers and diplomats, the project will produce a conceptual framework that can shape new research and international policy for the future. Moreover, it will create a freely downloadable dataset on existing and emerging semi-autonomous weapons. Through this data, we can gain clarity on how and where autonomous functions are already deployed and on how such functions are kept under human control. A focus on current and emerging technologies makes it clear that the relationship between AI and weapons is not a problem for the distant future, but is a pressing issue now.

Technical Abstract: The project addresses the relationships between artificial intelligence (AI), weapons systems and society. In particular, the project provides a framework for meaningful human control (MHC) of autonomous weapons systems. In international discussions, a number of governments and organizations adopted MHC as a tool for approaching problems and potential solutions raised by autonomous weapons. However, the content of MHC was left open. While this openness is useful for policy reasons, the international community, academics and practitioners are calling for further work on this issue. This project responds to that call by bringing together a multidisciplinary and multi-stakeholder team to address key questions. For example, we question the values associated with MHC, what rules should inform the design of the systems (both in software and hardware), and how existing and currently developing weapons systems advance possible relationships between human control, autonomy and AI. To achieve impact across academic, industry and policy arenas, we will produce academic publications, policy briefs, an open access database on ‘semi-autonomous’ weapons, and will sponsor multi-sector stakeholder discussions on how human values can be maintained as systems develop. Furthermore, the organization Article 36 will channel outputs directly into the international diplomatic community to achieve impact in international legal and policy forums.

Project Summary: The future will see autonomous machines acting in the same environment as humans, in areas as diverse as driving, assistive technology, and health care. Think of self-driving cars, companion robots, and medical diagnosis support systems. We also believe that humans and machines will often need to work together and agree on common decisions. Thus hybrid collective decision making systems will be in great need.

In this scenario, both machines and collective decision making systems should follow some form of moral values and ethical principles (appropriate to where they will act but always aligned to humans’), as well as safety constraints. In fact, humans would more readily accept and trust machines that behave as ethically as other humans in the same environment. Also, these principles would make it easier for machines to determine their actions and explain their behavior in terms understandable by humans. Moreover, machines and humans will often need to make decisions together, either through consensus or by reaching a compromise. This would be facilitated by shared moral values and ethical principles.

We will study the embedding and learning of safety constraints, moral values, and ethical principles in collective decision making systems for societies of machines and humans.

Technical Abstract: The future will see autonomous agents acting in the same environment as humans, in areas as diverse as driving, assistive technology, and health care. In this scenario, collective decision making will be the norm. We will study the embedding of safety constraints, moral values, and ethical principles in agents, within the context of hybrid human/agent collective decision making. We will do that by adapting current logic-based modelling and reasoning frameworks, such as soft constraints, CP-nets, and constraint-based scheduling under uncertainty. For ethical principles, we will use constraints specifying the basic ethical “laws”, plus sophisticated prioritised and possibly context-dependent constraints over possible actions, equipped with a conflict resolution engine. To avoid reckless behavior in the face of uncertainty, we will bound the risk of violating these ethical laws. We will also replace preference aggregation with an appropriately developed constraint/value/ethics/preference fusion, an operation designed to ensure that agents’ preferences are consistent with the system’s safety constraints, the agents’ moral values, and the ethical principles of both individual agents and the collective decision making system. We will also develop approaches to learn ethical principles for artificially intelligent agents, as well as to predict possible ethical violations.

Project Summary: Machine Learning and Artificial Intelligence underpin technologies that we rely on daily, from consumer electronics (smart phones), medical implants (continuous blood glucose monitors), websites (Facebook, Google), to the systems that defend critical infrastructure. The very characteristic that makes these systems so beneficial — adaptability — can also be exploited by sophisticated adversaries wishing to breach system security or gain an economic advantage. This project will develop usable software tools for evaluating vulnerabilities in learning systems, a first step towards general-purpose, secure machine learning.

Technical Abstract: This project aims to develop systems for the analysis of machine learning algorithms in adversarial environments. Today Machine Learning and Statistics are employed in many technologies where participants have an incentive to game the system, for example internet ad placement, cybersecurity, credit risk in finance, health analytics, and smart utility grids. However, little is known about how well state-of-the-art inference techniques fare when data is manipulated by a malicious adversary. By formulating the process of evading a learned model, or manipulating training data to poison learning, as an optimization program, our approach to evaluating security reduces to one of projected subgradient descent. Our main method for solving such iterative optimizations generically will be to employ the dynamic code analysis represented by automatic differentiation. A key output of this project will be usable software tools for evaluating the security of learning systems in general.
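
To make the idea of evasion as projected subgradient descent more concrete, here is a minimal sketch under strong simplifying assumptions: the learned model is a fixed linear classifier that flags an input when w·x + b > 0, the attacker may move the input only within an L2 ball around the original point, and the subgradient is written by hand rather than obtained by automatic differentiation. The function names, the margin, and the step size are illustrative assumptions, not part of the project's tools.

```python
import math


def project_l2_ball(x, center, radius):
    """Project x onto the L2 ball of the given radius around center."""
    diff = [xi - ci for xi, ci in zip(x, center)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= radius:
        return list(x)
    scale = radius / norm
    return [ci + d * scale for ci, d in zip(center, diff)]


def linear_score(w, b, x):
    """Score of the (assumed) linear classifier; positive means flagged."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def evade_linear(w, b, x0, radius, step=0.1, iters=100, margin=0.1):
    """Minimize max(0, w.x + b + margin) over the L2 ball around x0."""
    x = list(x0)
    for _ in range(iters):
        if linear_score(w, b, x) + margin <= 0:
            break  # loss is zero here, so zero is a valid subgradient
        # In the active region a subgradient of the loss is w itself.
        x = [xi - step * wi for xi, wi in zip(x, w)]
        # Keep the perturbed input within the attacker's budget.
        x = project_l2_ball(x, x0, radius)
    return x


if __name__ == "__main__":
    w, b, x0 = [1.0, 2.0], -1.0, [1.0, 1.0]
    x_adv = evade_linear(w, b, x0, radius=2.0)
    print(x_adv, linear_score(w, b, x_adv))
```

With a generous budget the perturbed point crosses the decision boundary; with a small budget the projection step holds it on the boundary of the ball and the classifier still flags it, which is the kind of outcome a security evaluation would report.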

Project Summary: Developing AI systems that are benevolent towards humanity requires making sure that those systems know what humans want. People routinely make inferences about the preferences of others and use those inferences as the basis for helping one another. This project aims to provide AI systems a similar ability to learn from observations, in order to better align the values of those systems with those of humans. Doing so requires dealing with some significant challenges: If we ultimately develop AI systems that can reason better than humans, how do we make sure that those AI systems are able to take human limitations into account? The fact that we haven’t yet cured cancer shouldn’t be taken as evidence that we don’t really care about it. Furthermore, once we have made an AI system that can reason about human preferences, that system then has to trade off time spent in deliberating about the right course of action with the need to act as quickly as possible – it needs to deal with its own computational limitations as it makes decisions. We aim to address both these challenges by examining how intelligent agents (be they humans or computers) should make these tradeoffs.

Technical Abstract: here

Project Summary: There is general consensus within the AI research community that progress in the field is accelerating: it is believed that human-level AI will be reached within the next one or two decades. A key question is whether these advances will accelerate further after general human level AI is achieved, and, if so, how rapidly the next level of AI systems (‘super-human’) will be achieved.

Since the mid-1970s, computer scientists have developed a rich theory about the computational resources that are needed to solve a wide range of problems. We will use these methods to make predictions about the feasibility of super-human level cognition.

Technical Abstract: There is general consensus within the AI research community that progress in the field is accelerating: it is believed that human-level AI will be reached within the next one or two decades on a range of cognitive tasks. A key question is whether these advances will accelerate further after general human level AI is achieved, and, if so, how rapidly the next level of AI systems (‘super-human’) will be achieved. Having a better understanding of how rapidly we may reach this next phase will be useful in preparing for the advent of such systems.

Computational complexity theory provides key insights into the scalability of computational systems. We will use methods from complexity theory to analyze the possibility of the scale-up to super-human intelligence and the speed of such scale-up for different categories of cognition.

Project Summary: AI systems will need to understand human values in order to respect them. This requires having concepts similar to those of humans. We will research whether AI systems can be made to learn their concepts in the same way as humans learn theirs. This will involve a literature review of the relevant fields, as well as experimental work.

We are particularly interested in a branch of machine learning called deep learning. The concepts learned by deep learning agents seem to be similar to the ones that have been documented in psychology. We will attempt to apply existing deep learning methodologies for learning what we call moral concepts, concepts through which moral values are defined. In addition, we will investigate a particular hypothesis of how we develop our concepts and values in the first place.

Technical Abstract: Autonomous AI systems will need to understand human values in order to respect them. This requires having concepts similar to those of humans. We will research whether AI systems can be made to learn their concepts in the same way as humans learn theirs. This will involve a literature review of the relevant fields, as well as experimental work.

Both human concepts and the representations of deep learning models seem to involve a hierarchical structure, among other similarities. For this reason, we will attempt to apply existing deep learning methodologies for learning what we call moral concepts, concepts through which moral values are defined. In addition, we will investigate the extent to which reinforcement learning affects the development of our concepts and values.

Project Summary: As it becomes ever clearer how machines with a human level of intelligence can be built — and indeed that they will be built — there is a pressing need to discover ways to ensure that such machines will robustly remain benevolent, especially as their intellectual and practical capabilities come to surpass ours. Through self-modification, highly intelligent machines may be capable of breaking important constraints imposed initially by their human designers. The currently prevailing technique for studying the conditions for preventing this danger is based on forming mathematical proofs about the behavior of machines under various constraints. However, this technique suffers from inherent paradoxes and requires unrealistic assumptions about our world, thus not proving much at all.

Recently a class of machines that we call experience-based artificial intelligence (EXPAI) has emerged, enabling us to approach the challenge of ensuring robust benevolence from a promising new angle. This approach is based on studying how a machine’s intellectual growth can be molded over time, as the machine accumulates real-world experience, and putting the machine under pressure to test how it handles the struggle to adhere to imposed constraints.

The Swiss AI lab IDSIA will deliver a widely applicable EXPAI growth control methodology.

Technical Abstract: Whenever one wants to verify that a recursively self-improving system will robustly remain benevolent, the prevailing tendency is to look towards formal proof techniques, which however have several issues: (1) Proofs rely on idealized assumptions that inaccurately and incompletely describe the real world and the constraints we mean to impose. (2) Proof-based self-modifying systems run into logical obstacles due to Löb’s theorem, causing them to progressively lose trust in future selves or offspring. (3) Finding nontrivial candidates for provably beneficial self-modifications requires either tremendous foresight or intractable search.

Recently a class of AGI-aspiring systems that we call experience-based AI (EXPAI) has emerged, which fix/circumvent/trivialize these issues. They are self-improving systems that make tentative, additive, reversible, very fine-grained modifications, without prior self-reasoning; instead, self-modifications are tested over time against experiential evidence and slowly phased in when vindicated or dismissed when falsified. We expect EXPAI to have high impact due to its practicality and tractability. Therefore we must now study how EXPAI implementations can be molded and tested during their early growth period to ensure their robust adherence to benevolence constraints.

In this project, the Swiss AI lab IDSIA will deliver an EXPAI growth control methodology that shall be widely applicable.

Project Summary: We focus on current and future complex AI autonomous systems that integrate sensors, computation, and actuation to perform tasks of benefit to humans. Examples of such systems are auto-pilots, medical assistants, internet-of-things components, and mobile service robots. One of the key aspects to bring such complex AI systems to safe and acceptable existence is the ability for such systems to provide transparency on their representations, interpretations, choices, and decisions, in summary, their internal state.

We believe that, to build AI systems that are safe, as well as accepted and trusted by humans, we need to equip them with the capability to explain their actions, recommendations, and inferences. Our proposed project aims at researching the specification, formalization, and generation of explanations, with a concrete focus on seamlessly integrated AI systems that sense and reason about multi-modal information in symbiosis with humans. As a result, humans will be able to query robots for explanations about their recommendations or actions, and carry out any needed corrections.

Technical Abstract: AI systems have long been challenged with providing explanations about their reasoning. Automated theorem provers, explanation-based learning systems, and conflict-based constraint solvers are examples where inference is supplemented by the underlying processed knowledge and rules.

We focus on current and future complex AI autonomous systems that integrate perception, cognition, and action, in tasks to service humans. These systems can be viewed as cyber-physical-social systems, such as auto-pilots, medical assistants, internet-of-things components, and mobile service robots.

We propose to research how to bring such complex AI systems to safe and acceptable existence by providing transparency on their representations, interpretations, choices, and decisions. We will develop mining techniques to enable the analysis and explanation of temporally-logged sensory and execution data, constrained by the underlying behavior architecture, as well as the uncertainty of the sensed environment. We will address the need for probabilistic and knowledge-based inference; the variety of input data modalities; and the coordination of multiple reasoning agents.

We will concretely study autonomous mobile service robots, such as CoBots, as well as quadrotors. We envision humans posing queries about the robots’ performance and the choice of their actions. Our generated explanations will increase both human understanding and robot safety.

Project Summary: Progress towards a fully-automated economy suffers from a profound tension. On the one hand, technological progress depends on human effort. Human effort is, in general, decreasing in the amount that effort is taxed. On the other hand, the more the economy is automated, the more redistribution could be required to support the living standards of the less skilled. The less skilled could even become unemployed, and the unemployed could eventually comprise the majority of the population. The higher the fraction unemployed, the higher must be the tax burden on those who are productive in this new economy.

At first glance, then, the more technological progress we make, the more we will be forced to disincentivize further progress. Yet, it is possible that some paths of tax and subsidy policy could lead to vastly improved social welfare a few decades hence compared to others. Some paths might avoid altogether the scenario sketched above. This project seeks to characterize the path of optimal policy in the transition to a fully-automated economy. In doing so, it would answer directly the question of how we maximize the societal benefit of AI.

Technical Abstract: Progress towards a fully-automated economy suffers from a profound tension. On the one hand, technological progress depends on human effort. Human effort is, in general, decreasing in the amount that effort is taxed. On the other hand, the more the economy is automated, the more redistribution could be required to support the living standards of the less skilled. The less skilled could even become unemployed, and the unemployed could eventually comprise the majority of the population. The higher the fraction unemployed, the higher must be the tax burden on those who are productive in this new economy.

At first glance, then, the more technological progress we make, the more we will be forced to disincentivize further progress. Yet, it is possible that some paths of tax and subsidy policy could lead to vastly improved social welfare a few decades hence compared to others. Some paths might avoid altogether the scenario sketched above. This project seeks to characterize the path of optimal policy in the transition to a fully-automated economy. In doing so, it would answer directly the question of how we maximize the societal benefit of AI.

Project Summary: AI systems, whether robotic or conversational software agents, use planning algorithms to achieve high-level goals by exhaustively considering all possible sequences of actions. While these methods are increasingly powerful and can even generate seemingly creative solutions, they have no understanding of ethics: they don’t understand harm, nor can they distinguish between good and bad side effects of their actions. We propose to develop representations and algorithms to fill this gap.

Technical Abstract: Recent advances in probabilistic planning and reinforcement learning have resulted in impressive performance at tasks as varied as mobile robotics, self-driving cars, and playing Atari video games. As these algorithms get deployed in real-world environments, it becomes critical to ensure that their utility-seeking behavior does not result in unintended, harmful side-effects. We need a way to specify a set of agent ethics: social norms that we can trust the agent will not knowingly violate. Developing mechanisms for defining and enforcing such ethical constraints requires innovations ranging from improved vocabulary grounding to more robust planning and reinforcement learning algorithms.

Project Summary: We are unsure about what moral system is best for humans, let alone for potentially super-intelligent machines. It is likely that we shall need to create artificially intelligent agents to provide moral guidance and police issues of appropriate ethical values and best practice, yet this poses significant challenges. Here we propose an initial evaluation of the strengths and weaknesses of one avenue by investigating self-policing intelligent agents. We shall explore two themes: (i) adding a layer of AI agents whose express purpose is to police other AI agents and report unusual or undesirable activity (potentially this might involve setting traps to catch misbehaving agents, and may consider if it is wise to allow policing agents to take corrective action against offending agents); and (ii) analyzing simple models of evolving adaptive agents to see if robust conclusions can be learned. We aim to survey related literature, identify key areas of hope and concern for future investigation, and obtain preliminary results for possible guarantees. The proposal is for a one year term to explore the ideas and build initial models, which will be made publicly available, ideally in journals or at conferences or workshops, with extensions likely if progress is promising.

Technical Abstract: We are unsure about what moral system is best for humans, let alone for potentially super-intelligent machines. It is likely that we shall need to create artificially intelligent agents to provide moral guidance and police issues of appropriate ethical values and best practice, yet this poses significant challenges. Here we propose an initial evaluation of the strengths and weaknesses of one avenue by investigating self-policing intelligent agents. We shall explore two themes: (i) adding a layer of AI agents whose express purpose is to police other AI agents and report unusual or undesirable activity (potentially this might involve setting traps to catch misbehaving agents, and may consider if it is wise to allow policing agents to take corrective action against offending agents); and (ii) analyzing simple models of evolving adaptive agents to see if robust conclusions can be learned. We aim to survey related literature, identify key areas of hope and concern for future investigation, and obtain preliminary results for possible guarantees. The proposal is for a one year term to explore the ideas and build initial models, which will be made publicly available, ideally in journals or at conferences or workshops, with extensions likely if progress is promising.

Project Summary: The devastation of the 2008 financial crisis remains a fresh memory seven years later, and its effects still reverberate in the global economy. The loss of trillions of dollars in output, and associated tragedy of displacement for millions of people demonstrate in the most vivid way the crucial role of a functional financial system for modern civilization. Unlike physical disasters, financial crises are essentially information events: shocks in the beliefs and expectations of individuals and organizations–about asset values, ability of counterparties to meet obligations, etc.–that nevertheless have real consequences for everyone.

This pivotal and fragile sector also happens to be at the leading edge of autonomous computational (AI) decision making. For large classes of financial assets, trading is dominated by algorithms, or “bots”, operating at speeds well beyond the scale of human reaction times. This regime change is a fait accompli, despite our unresolved debates and generally poor understanding of its implications for fundamental market stability as well as performance and efficiency.

We propose a systematic in-depth study of AI risks to the financial system. Our goals are to identify the main pathways of concern and generate constructive solutions for making financial infrastructure more robust to interaction with AI participants.

Technical Abstract: The financial system presents a critical sector of our society, at the leading-edge of AI engagement and especially vulnerable to impact from near-term AI advances. Algorithmic and high-frequency trading now dominate financial markets, yet their implications for market stability are poorly understood. In this project we undertake a systematic investigation of how AI traders can impact market stability, and how extreme movements in securities markets in turn can impact the real economy. We develop a general framework for automated trading based on a flexible architecture for arbitrage reasoning. Through agent-based simulation combined with game-theoretic strategy selection, we search for vulnerabilities in financial markets, and characterize the conditions that enable or prevent their exploitation. A new approach to modeling complex networks of financial obligations is applied to the study of contagion between asset-pricing anomalies and panics in the broader financial system. Results from this study will be employed to design market rules, monitoring technologies, and regulation techniques that promote stability in a world of algorithmic traders.

Project Summary: Codes of ethics play an important role in many sciences. Such codes aim to provide a framework within which researchers can understand and anticipate the possible ethical issues that their research might raise, and to provide guidelines about what is, and is not, regarded as ethical behaviour. In the medical sciences, for example, codes of ethics are fundamentally embedded within the research culture of the discipline, and explicit consideration of ethical issues is a standard expectation when research projects are planned and undertaken. In this project, we aim to start developing a code of ethics for AI research by learning from this interdisciplinary experience and extending its lessons into new areas. The project will bring together three Oxford researchers with expertise in artificial intelligence, philosophy, and applied ethics.

Technical Abstract: Codes of ethics play an important role in many sciences. Such codes aim to provide a framework within which researchers can understand and anticipate the possible ethical issues that their research might raise, and to provide guidelines about what is, and is not, regarded as ethical behaviour. In the medical sciences, especially, codes of ethics are fundamentally embedded within the research culture, and explicit consideration of ethical issues is a standard expectation when research projects are planned and undertaken. The aim of this project is to develop a solid basis for a code of artificial intelligence (AI) research ethics, learning from the scientific and medical community’s experience with existing ethical codes, and extending its lessons into three important and representative areas where artificial intelligence comes into contact with ethical concerns: AI in medicine and biomedical technology, autonomous vehicles, and automated trading agents. We will also explore whether the design of ethical research codes might usefully anticipate, and potentially ameliorate, the risks of future research into superintelligence. The project brings together three Oxford researchers with highly relevant expertise in artificial intelligence, philosophy, and applied ethics, and will also draw strongly on other research activity within the University of Oxford.

Project Summary: “I don’t know” is a safe and appropriate answer that people provide to many posed questions. To appropriately act in a variety of complex tasks, our artificial intelligence systems should incorporate similar levels of uncertainty. Instead, state-of-the-art statistical models and algorithms that enable computer systems to answer such questions based on previous experience often produce overly confident answers. Due to widely used modeling assumptions, this is particularly true when new questions come from situations that differ substantially from previous experience. In other words, exactly when human-level intelligence provides less certainty when generalizing from the known to the unknown, artificial intelligence tends to provide more. Rather than trying to engineer fixes to this phenomenon into existing methods, we propose a more pessimistic approach based on the question: “What is the worst case possible for predictive data that still matches with previous experiences (observations)?” We propose to analyze the theoretical benefits of this approach and demonstrate its applied benefits on prediction tasks.

Technical Abstract: Reliable inductive reasoning that uses previous experiences to make predictions of unseen information in new situations is a key requirement for enabling useful artificial intelligence systems.

Tasks ranging from recognizing objects in camera images, to predicting the outcomes of possible autonomous system controls, to understanding the intentions of other intelligent entities each depend on this type of reasoning. Unfortunately, existing techniques produce significant unforeseen errors when the underlying statistical assumptions they are based upon do not hold in reality. The nearly ubiquitous assumption that estimated relationships in future situations will be similar to previous experiences (i.e., past and future data are assumed to be exchangeable, or independent and identically distributed (IID) according to a common distribution) is particularly brittle when employed within artificial intelligence systems that autonomously interact with the physical world. We propose an adversarial formulation for cost-sensitive prediction under covariate shift, a relaxation of this statistical assumption. This approach provides robustness to data shifts between predictive model estimation and deployment while incorporating mistake-specific costs for different errors that can be tied to application outcomes. We propose theoretical analysis and experimental investigation of this approach for standard and active learning tasks.
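
As a rough illustration of why mistake-specific costs and calibrated uncertainty matter together, the following minimal sketch picks the prediction with the lowest expected cost under a given probability estimate. It is not the adversarial formulation proposed in this project; the cost values, the probability estimates, and the function names are all hypothetical and chosen only for illustration.

```python
def expected_costs(probs, cost):
    """Expected cost of each candidate prediction.

    probs[j]   : estimated probability that the true label is j
    cost[i][j] : cost of predicting i when the true label is j
                 (hypothetical values, not taken from the proposal)
    """
    return [sum(c * p for c, p in zip(row, probs)) for row in cost]


def best_prediction(probs, cost):
    """Index of the prediction that minimizes expected cost."""
    costs = expected_costs(probs, cost)
    return min(range(len(costs)), key=costs.__getitem__)


# Hypothetical cost matrix: predicting 0 when the truth is 1 costs ten
# times as much as predicting 1 when the truth is 0.
COST = [
    [0.0, 10.0],
    [1.0, 0.0],
]

# An overconfident estimate and a more hedged one for the same input,
# for example an input that lies far from the training data.
overconfident = [0.95, 0.05]
hedged = [0.60, 0.40]

print(best_prediction(overconfident, COST))
print(best_prediction(hedged, COST))
```

With these assumed numbers, the overconfident estimate selects the prediction whose errors are expensive, while the hedged estimate selects the cheaper-to-get-wrong prediction. Overly confident probabilities under data shift can therefore change which costly mistakes a system makes.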

Project Summary: We propose the creation of a joint Oxford-Cambridge research center, which will develop policies to be enacted by governments, industry leaders, and others in order to minimize risks and maximize benefit from artificial intelligence (AI) development in the longer term. The center will focus explicitly on the long-term impacts of AI, the strategic implications of powerful AI systems as they come to exceed human capabilities in most domains of interest, and the policy responses that could best be used to mitigate the potential risks of this technology.

There are reasons to believe that unregulated and unconstrained development could incur significant dangers, both from “bad actors” like irresponsible governments, and from the unprecedented capability of the technology itself. For past high-impact technologies (e.g. nuclear fission), policy has often followed implementation, giving rise to catastrophic risks. It is important to avoid this with superintelligence: safety strategies, which may require decades to implement, must be developed before broadly superhuman, general-purpose AI becomes feasible.

This center represents a step change in technology policy: a comprehensive initiative to formulate, analyze, and test policy and regulatory approaches for a transformative technology in advance of its creation.

Technical Abstract: here

Project Summary: It is crucial for AI researchers to be able to reason carefully about the potential risks of AI, and about how to maximize the odds that any superintelligence that develops remains aligned with human values (what the Future of Life Institute refers to as the “AI alignment problem”).

Unfortunately, cognitive science research has demonstrated that even very high-IQ humans are subject to many biases that are especially likely to impact their judgment on AI alignment. Leaders in the nascent field of AI alignment have found that a deep familiarity with cognitive bias research, and practice overcoming those biases, has been crucial to progress in the field.

We therefore propose to help spread key reasoning skills and community norms throughout the AI community, via the following:

In 2016, we will hold a workshop for 45 of the most promising AI students (undergraduates, graduate students, and postdocs), in which we train them in the thinking skills most relevant to AI alignment. We will maintain contact with these students after the workshop, helping them to stay engaged with the alignment issue and to collaborate with each other to spread useful skills throughout the community and discover new ones themselves.

Project Summary: The impact of AI on society depends not only on the technical state of AI research, but also on its sociological state. Thus, in addition to current AI safety research, we must also ensure that the next generation of AI researchers is composed of thoughtful, intelligent, safety-conscious individuals. The more the AI community as a whole consists of such skilled, broad-minded reasoners, the more likely AI is to be developed in a safe and beneficial manner.

Therefore, we propose running a summer program for extraordinarily gifted high school students (such as competitors from the International Mathematical Olympiad), with an emphasis on artificial intelligence, cognitive debiasing, and choosing a high-positive-impact career path, including AI safety research as a primary consideration. Many of our classes will be about AI and related technical areas, with two classes specifically about the impacts of AI on society.

Project Summary: We propose to hold a one-day summit (in spring 2017) in Washington, DC, on the subject of artificial intelligence (broadly conceived) and the future of work. The goal is to put this issue on the national agenda in an informed and deliberate manner, rather than through the typically alarmist and over-the-top accounts disseminated by the mainstream media. The location is important to ensure attendance by policy makers and leaders of funding agencies. The summit will bring together leading technologists, economists, sociologists, and humanists, who will offer their views on where technology is going, what its impact may be, and what research issues are raised by these projections.

The summit will be sponsored by the Computing Research Association (CRA), whose Government Affairs Committee has extensive experience in reaching out to policy makers. We will also reach out to other relevant societies, such as US-ACM and AAAS.

Project Summary: Driverless cars, service robots, surveillance drones, computer networks collecting data, and autonomous weapons are just a few examples of increasingly intelligent technologies scientists are developing. As they progress, researchers face a series of questions about whether these machines can be designed and engineered to take morally significant actions previously reserved for human actors. Can they ensure that artificially intelligent systems will always be demonstrably beneficial, safe, controllable, and sensitive to human values? Many individuals and groups have begun tackling the various subprojects entailed in this challenge. They are, however, often unaware of efforts in complementary fields. Thus they lose opportunities for creative collaboration, miss gaps in their own research, and reproduce work being performed by potential colleagues. The Hastings Center proposes to convene a series of three solution-directed workshops with national and international experts in the various pertinent fields. Together they will develop collaborative strategies and research projects, and forge an outline for a comprehensive plan to ensure autonomous systems will be demonstrably beneficial, and that this innovative research progresses in a responsible manner. The results of the workshops will be conveyed through a special report, a dedicated edition of a scholarly journal, and two public symposia.

Technical Abstract: The vast array of challenges entailed in designing, engineering, and implementing demonstrably beneficial, safe and controllable AI systems are slowly being addressed by scholars working on distinct research trajectories across many disciplines. They are often unaware of efforts in complementary fields, thus losing opportunities for creative synergies, missing gaps in their own research, and reproducing the work of potential colleagues. The Hastings Center proposes to convene a series of three solution-directed workshops with national and international experts in the varied fields. Together they will address trans-disciplinary questions, develop collaborative strategies and research projects, and forge an outline for a comprehensive plan encompassing the many elements of ensuring autonomous systems will be demonstrably beneficial, and that this innovative research progresses in a responsible manner. The workshops’ research and policy agenda will be published as a Special Report of the journal Hastings Center Report and in short form in a science or engineering journal. Findings will also be presented through two public symposia, one of which will be webcast and available on demand. We anticipate significant progress given the high caliber of the people who are excited by this project and have already committed to join our workshops.