Transcript

Robert Wiblin: Hi listeners, this is the 80,000 Hours Podcast, where each week we have an unusually in-depth conversation about the world’s most pressing problems and how you can use your career to solve them. I’m Rob Wiblin, Director of Research at 80,000 Hours.

Today’s episode is long for a reason. My producer – Keiran Harris – listened to our first recording session and said it was his favourite episode so far, so we decided to go back and add another 90 minutes to cover issues we didn’t make it to first time around.

As a result, the summary can only touch on a fraction of the topics that come up. It really is pretty exciting.

I hope you enjoy the episode as much as we did and if you know someone working near AI or machine learning, please do pass this conversation on to them.

Just quickly before that I want to let you know that last week we released probably our most important article of the year. It’s called “These are the world’s highest impact career paths according to our research”, and it summarises many years of work into a single article that brings you up to date on what 80,000 Hours recommends today.

It outlines our new suggested process which any of you could potentially use to generate a short-list of high-impact career options given your personal situation. It then describes the five key categories of career that we most often recommend, which should be able to produce at least one good option for almost all graduates.

Finally, it lists and explains the top 10 ‘priority paths’ we want to draw attention to, because we think they can enable the right person to do an especially large amount of good for the world.

I definitely recommend checking it out – we’ll link to it from the show-notes and blog post.

Here’s Paul.

Robert Wiblin: Today, I’m speaking with Dr. Paul Christiano. Paul recently completed a PhD in theoretical computer science at UC Berkeley and is now a researcher at OpenAI, working on aligning artificial intelligence with human values. He blogs at ai-alignment.com. Thanks for coming on the podcast, Paul.

Paul Christiano: Thanks for having me.

Robert Wiblin: We plan to talk about Paul’s views on how the transition to an AI economy will actually occur and how listeners can contribute to making that transition go better, but first, I’d like to give you a chance to frame the issue of AI alignment in your own words. What is the problem of AI safety and why did you decide to work on it yourself?

The problem of AI safety

Paul Christiano: AI alignment, I see as the problem of building AI systems that are trying to do the thing that we want them to do. So in some sense, that might sound like it should be very easy because we build an AI system, we get to choose all … we get to write the code, we get to choose how the AI system is trained. There are some reasons that it seems kind of hard to train an AI system to do exactly … So we have something we want in the world, for example we want to build an AI, we want it to help us govern better, we want it to help us enforce the law, we want it to help us run a company. We have something we want that AI to do, but for technical reasons, it’s not trivial to build the AI that’s actually trying to do the thing we want it to do. That’s the alignment problem.

I care about that problem a lot because I think we’re moving towards a world where most of the decisions are made by intelligent machines and so if those machines aren’t trying to do the things humans want them to do, then the world is going to go off in a bad direction. If the AI systems we can build are really good at … it’s easy to train them to maximize profits, or to get users to visit the websites, or to get users to press the button saying that the AI did well, then you have a world that’s increasingly optimized for things like making profits or getting users to click on buttons, or getting users to spend time on websites without being increasingly optimized for having good policies, heading in a trajectory that we’re happy with, helping us figure out what we want and how to get it.

So that’s the alignment problem. The safety problem is, somewhat more broadly, understanding the things that might go poorly with AI and what technical work and political work we can do to improve the probability that things go well.

Robert Wiblin: Right, so what concretely do you do at OpenAI?

Paul Christiano: So I do machine learning research, which is a combination of writing code and running experiments and thinking about how machine learning systems should work, trying to understand what are the important problems, how could we fix them, plan out what experiments give us interesting information, what capabilities do we need if we want to build aligned AI five years, 10 years, 20 years down the road? What are the capabilities we need, what should we do today to work towards those capabilities? What are the hardest parts? So trying to understand what we need to do and then actually trying to do it.

AI alignment

Robert Wiblin: Makes sense. So the first big topic that I wanted to get into was kind of the strategic landscape of artificial intelligence, safety research, both technical and, I guess, political and strategic. Partly I wanted to do that first because I understand it better than the technical stuff, so I didn’t want to be floundering right off the bat. What basically caused you to form your current views about AI alignment and to regard it as a really important problem? Maybe also, how have your views on this changed over time?

Paul Christiano: So there are a lot of parts on my views on this, it’s like a complicated pipeline from do the most good for the most people, to write this particular machine learning code. I think very broadly speaking, I come in with this utilitarian perspective, I do care about more people more, then you start thinking, you take that perspective and you think that future populations will be very large, you start asking, what are the features of the world today that affect the long run trajectory of civilization? I think if you come in with that question, there’s two very natural categories of things, there’s if we all die then we’re all dead forever, and second, there’s sort of a distribution of values, or optimization in the world, and that can be sticky in the sense that if you create entities that are optimizing something, those entities can entrench themselves and be hard to move. In the same way that humans are kind of hard to remove at this point. You try and kill humans, humans bounce back.

There are a few ways you can change the distribution of values in the world. I think the most natural, or the most likely one, is as we build AI systems, we’re going to sort of pass the torch from humans, who want one set of things, to AI systems, that potentially want a different set of things. So in addition to going extinct, I think bungling that transition is the easiest way to head in a bad direction, or to permanently alter the structure of civilization.

So at a very high level, that’s kind of how I got to thinking about AI many years ago, and then once you have that perspective, one then has to look at the actual character of AI and say how likely is this failure mode? That is what actually determines what AI is trying to optimize, and start thinking in detail about the kinds of techniques people are using to produce AI. I think that after doing that, I became pretty convinced that there are significant problems. So there’s some actual difficulty there of building an AI that’s trying to do the thing that the human who built it wants it to do. If we could resolve that technical problem, that’d be great. Then we dodge this difficulty of humans maybe passing off control to some systems that don’t want the same things we want.

Then, zooming in a little bit more, if the whole world … Right, so this is a problem which some people care about, we also care about a lot of other things though, and we’re also all competing with one another which introduces a lot of pressure for us to build whatever kind of AI works best. So there’s some sort of fundamental tension between building AI that works best for the tasks that we want our AI to achieve, and building AI which robustly shares our values, or is trying to do the same things that we want it to do.

So it seems like the current situation is we don’t know how to build AI that is maximally effective but still robustly beneficial. If we don’t understand that, then people deploying AI will face some trade-off between those two goals. I think by default, competitive pressures would cause people to push far towards the AI that’s really effective at doing what we want it … Like, really effective at acquiring influence or navigating conflict, or so on, but not necessarily robustly beneficial. So then we would need to either somehow coordinate to overcome that pressure. So we’d have to all agree we’re going to build AI that actually does what we want it to do, rather than building AI which is effective in conflict, say. Or, we need to make technical progress so there’s not that trade-off.

Arms race dynamic

Robert Wiblin: So to what extent do you view the arms race dynamic, the fact that people might try to develop AI prematurely because they’re in a competitive situation, as the key problem that’s driving the lack of safety?

Paul Christiano: So I think the competitive pressure to develop AI, in some sense, is the only reason there’s a problem. I think describing it as an arms race feels somewhat narrow, potentially. That is, the problem’s not restricted to conflict among states, say. It’s not restricted even to conflict, per se. If we have really secure property, so if everyone owns some stuff and the stuff they owned was just theirs, then it would be very easy to ignore … if individuals could just opt out of AI risk being a thing because they’d just say, “Great, I have some land and some resources and space, I’m just going to chill. I’m going to take things really slow and careful and understand.” Given that’s not the case, then in addition to violent conflict, there’s … just faster technological progress tends to give you a larger share of the stuff.

Most resources are just sitting around unclaimed, so if you go faster you get more of them, where if there’s two countries and one of them is 10 years ahead in technology, that country will, everyone expects, expand first to space and over the very long run, claim more resources in space. In addition to violent conflict, de facto, they’ll claim more resources on earth, et cetera.

I think the problem comes from the fact that you can’t take it slow because other people aren’t taking it slow. That is, we’re all forced to develop technology as fast as we can. I don’t think of it as restricted to arms races or conflict among states, I think there would probably still be some problem, just because people … Even if people weren’t forced to go quickly, I think everyone wants to go quickly in the current world. That is, most people care a lot about having nicer things next year and so even if there were no competitive dynamic, I think that many people would be deploying AI the first time it was practical, to become much richer, or advance technology more rapidly. So I think we would still have some problem. Maybe it would be a third as large or something like that.

Attention

Robert Wiblin: How much attention are people paying to these kinds of problems now? My perception is that the amount of interest has ramped up a huge amount, but of course, I guess the amount of resources going into just increasing the capabilities of AI has also been increasing a lot, so it’s unclear whether safety has become a larger fraction of the whole.

Paul Christiano: So I think in terms of the profile of the issue, how much discussion there is of the problem, safety has scaled up faster than AI, broadly. So it’s a larger fraction of discussion now. I think that more discussion of the issue doesn’t necessarily translate to anything super productive. It definitely translates to people in machine learning maybe being a little bit annoyed about it. So it’s a lot of discussion, discussion’s scaled up a lot. The number of people doing research has also scaled up significantly, but I think that’s maybe more in line with the rate of progress in the field. I’m not sure if the fraction of people working on it full time … Actually, no, I think that’s also scaled up, maybe by a factor of two relatively, or something.

So if one were to look at publications at top machine learning conferences, there’s an increasing number, maybe a few in the last NIPS, that are very specifically directed at the problem, “We want our AI to be doing the thing that we want it to be doing and we don’t have a way to do that right now. Let’s try and push technology in that direction. To build AI to understand what we want and help us get it.” So now we’re at the point where there’s a few papers in each conference that are very explicitly targeted at that goal, up from zero to one.

At the same time, there’s aspects of the alignment problem that are more clear, so things like building AI that’s able to reason about what humans want, and there’s aspects that are maybe a little bit less clear, like more arcane seeming. So for example, thinking about issues distinctive to AI which exceeds human capabilities in some respect. I think the more arcane issues are also starting to go from basically nothing to discussed a little bit.

Robert Wiblin: What kind of arcane issues are you thinking of?

Paul Christiano: So there’s some problem with building weak AIs, say, that want to do what humans want them to do. There’s then a bunch of additional difficulties that appear when you imagine the AI that you’re training is a lot smarter than you are in some respect. So then you need some other strategy. So in that regime, it becomes … When you have a weak AI, it’s very easy to say what the goal is, what you want the AI to do. You want it to do something that looks good to you. If you have a very strong AI, then you actually have a philosophical difficulty of what is the right behavior for such a system. It means that all the answers … there can be no very straightforward technical answer where we prove a theorem and say this is the right … or you can’t merely prove a theorem. You have to do some work to say, we’re happy with what this AI is doing, even though no human understands, say, what this AI’s doing.

In parallel with the value specification stuff, another big part of alignment is understanding training models that continue to do … you train your model to do something. On the training distribution, you’ve trained your AI, on the training distribution, it does what you want. There’s a further problem of maybe when you deploy it, or on the test distribution, it does something catastrophically different from what you want, and that’s also … on that problem, I think interest has probably scaled up even more rapidly. So the number of people thinking about adversarial machine learning, can an adversary find some situation in which your AI does something very bad, the number of people working on that problem has scaled up. I think it’s more than doubled as a fraction of the field, although it’s still, in absolute terms, kind of small.

Robert Wiblin: What do you think would cause people to seriously scale up their work on this topic and do you think it’s likely to come in time to solve the problem, if you’re right that there are serious risks here?

Paul Christiano: Yeah, so I think that where we’re currently at, it seems clear that there is a real problem. There is this technical difficulty of building AI that does what we want it to do. It’s not yet clear if that problem is super hard, so I think we’re really uncertain about that. I’m working on it, not because I’m confident it’s super hard, but because it seems pretty plausible that it’s hard. I think that the machine learning community would be much, much more motivated to work on the problem if it became clear that this was going to be a serious problem. People aren’t super good at coping with, “Well, there’s a 30% chance this is going to be a huge problem,” or something like that. I think one big thing is as it becomes more clear, then I think many more people will work on the problem.

So when I talk about these issues of training weaker AI systems to do what humans want them to do, I think it is becoming more clear that that’s a big problem. So for example, we’re getting to the point where robotics is getting good enough that it’s going to be limited by, or starting to be limited by, how you communicate to the robot what it actually ought to be doing. Or people are becoming very familiar with … YouTube has an algorithm that decides what video it will show you. People have some intuitive understanding, they’re like, “That algorithm has a goal and if that goal is not the goal that we collectively, as a society and the users of YouTube, would want, that’s going to push the world in this annoying direction.” It’s going to push the world towards people spending a bunch of time on YouTube rather than their lives being better.

So I think we are currently at the stage where some aspects of these problems are becoming more obvious, and that makes it a lot easier for people to work on those aspects. As we get closer to AI, assuming that these problems are serious, it’s going to become more and more obvious that the problems are serious. That is, we’ll be building AI systems where humans don’t understand what they do, and the fact that their values are not quite right is causing serious problems.

I think that’s one axis and then the other axis is … So, I’m particularly interested in the possibility of transformative AI that has a very large effect on the world. So the AI that starts replacing humans in the great majority of economically useful work. I think that right now, we’re very uncertain about what the timelines are for that. I think there’s a reasonable chance within 20 years, say, but certainly there’s not compelling evidence that it’s going to be within 20 years. I think as that becomes more obvious, then many more people will start thinking about catastrophic risks in particular, because those will become more plausible.

Robert Wiblin: So your concerns about how transformative AI could go badly have become pretty mainstream but not everyone is convinced. How compelling do you think the arguments are that people should be worried about this and is there anything that you think that you’d like to say to try to persuade skeptics who might be listening?

Paul Christiano: I think almost everyone is convinced that there is … or almost everyone in machine learning, is convinced that there’s a problem. That there’s an alignment problem. There’s the problem of trying to build AI to do what you want it to do and that that requires some amount of work. I think the point of disagreement … there’s a few points of disagreement within the machine learning community. So one is, is that problem hard enough that it’s a problem that’s worth trying to focus on and trying to push differentially? Or is that the kind of problem that should get solved in the normal business of doing AI research? So that’s one point of disagreement. I think on that point, I think in order to be really excited about working on that problem, you have to be thinking, what can we do to affect how AI goes, to make it go better?

If you’re just asking how can we have really powerful AI that does good things as soon as possible, then I think it’s actually not that compelling an argument to work on alignment. But I think if you’re asking the question how do we actually maximize the probability that this goes well, then it doesn’t really matter whether that ought to be part of the job of AI researchers, we should be really excited about putting more resources into that to make it go faster. And I think if someone really takes seriously the goal of trying to make AI go well instead of just trying to push on AI and trying to make cool stuff happen sooner, or trying to realize benefits over the next five years, then I think that case is pretty strong right now.

Another place there’s a lot of disagreement in the ML community is, maybe it’s more an issue of framing than an issue of substance, which is the kind of thing I find pretty annoying. There’s one frame where you’re like, “AI’s very likely to kill everyone, there’s going to be some robot uprising. It’s going to be a huge mess, this should be on top of our list of problems.” And there’s another framing where it’s like, “Well, if we, as the AI community, fail to do our jobs, then yes, something bad would happen. But it’s kind of offensive for you to say that we as the AI community are going to fail to do our jobs.” I don’t know if I would really need to … it doesn’t seem like you should really have to convince anyone on the second issue.

You should be able to be like, “Yes, it’d be really bad if we failed to do our jobs.” Now, this discussion we’re currently having is not part of us trying to argue that everyone should be freaking out, this is us trying to argue like … this is us doing our jobs. This discussion we’re having right now. You can’t have a discussion about us trying to do our jobs and be like, “Yes, it’s going to be fine because we’re going to do our jobs.” That is an appropriate response in some kinds of discussion, maybe …

Robert Wiblin: But when you’re having the conversation about are we going to spend some money on this now, then …

Paul Christiano: Yeah, then I think it’s not such a great response. I think safety’s a really unfortunate word. Lots of people don’t like safety, it’s kind of hard to move away from. If you describe the problem to people, like training AI to do what we want it to do, they’re like, “Why do you call that safety? That’s just the problem of building good AI.” And that’s fine, I’m happy with that. I’m happy saying, “Yep, this is just doing AI reasonably well.” But then, yeah, it’s not really an argument about why one shouldn’t push more money into that area, or shouldn’t push more effort into that area. It’s a part of AI that’s particularly important to whether AI has a positive or negative effect.

Yeah, I think in my experience, those are the two biggest disagreements. The biggest substantive disagreement is on the, “Is this a thing that’s going to get done easily anyway?” I think there people tend to have … maybe it’s just a normal level of over-confidence about how easy problems will end up being, together with not having a real … I think there aren’t that many people who are really prioritizing the question, “How do you make AI go well?” Instead of just, “How do I make …” Like, choose some cool thing they want to happen. “How do I make that cool thing happen as soon as possible in calendar time?” I think that’s unfortunate, it’s a hard thing to convince people on, in part because values discussions are always a little bit hard.

Best arguments against being concerned

Robert Wiblin: So what do you think are the best arguments against being concerned about this issue, or at least, wanting to prioritize directing resources towards it, and why doesn’t it persuade you?

Paul Christiano: So I think there’s a few classes of arguments. Probably the ones I find most compelling are opportunity cost arguments where someone says, “Here’s a concrete alternative. Yeah, you’re concerned about x, have you considered that y’s even more concerning?” I can imagine someone saying, “Look, the risk of bioterrorism killing everyone is high enough that you should … on the margin, returns to that are higher than returns to AI safety.” At least, I’m not compelled by those arguments. Part of that is a comparative advantage thing where, like, “I don’t really have to evaluate those arguments because it’s clear what my comparative advantage is.” In part, I have a different reason, I’m not compelled by every argument of that form. So that’s one class of arguments against.

In terms of the actual value of working on AI safety, I think the biggest concern is this, “Is this an easy problem that will get solved anyway?” Maybe the second biggest concern is, “Is this a problem that’s so difficult that one shouldn’t bother working on it or one should be assuming that we need some other approach?” You could imagine, the technical problem is hard enough that almost all the bang is going to come from policy solutions rather than from technical solutions.

And you could imagine, those two concerns maybe sound contradictory, but aren’t necessarily contradictory, because you could say, “We have some uncertainty about this parameter of how hard this problem is.” Either it’s going to be easy enough that it’s solved anyway, or it’s going to be hard enough that working on it now isn’t going to help that much and so what mostly matters is getting our policy response in order. I think I don’t find that compelling, in part because one, I put significant probability on the range … like the place in between those, and two, I just think working on this problem earlier will tell us what’s going on. If we’re in the world where you need a really drastic policy response to cope with this problem, then you want to know that as soon as possible.

It’s not a good move to be like, “We’re not going to work on this problem because if it’s serious, we’re going to have a dramatic policy response.” Because you want to work on it earlier, discover that it seems really hard and then have significantly more motivation for trying the kind of coordination you’d need to get around it.

Robert Wiblin: It seems to me like it’s just too soon to say whether it’s very easy, moderately difficult or very difficult, does that seem right?

Paul Christiano: That’s definitely my take. So I think people make some arguments in both directions and we could talk about particular arguments people make. Overall, I find them all just pretty unconvincing. I think a lot of the like, “It seems easy,” comes from just the intuitive, “Look, we get to build the AI, we get to choose the training process. We get to look at all the computation the AI is doing as it thinks. How hard can it be to get the AI to be trying to do …” or maybe not, maybe it’s hard to get it to do exactly what you want but how hard can it be to get it to not try and kill everyone?

That sounds like a pretty … there’s a pretty big gap between the behavior we want and the behavior of reasoning about what output is going to most lead to humans being crushed. That’s a pretty big gap. Feels like you ought to be able to distinguish those, but I think that’s not … There’s something to that kind of intuition. It is relevant to reasoning about how hard a problem is but it doesn’t carry that much weight on its own. You really have to get into the actual details of how we’re producing AI systems, how is that likely to work? What is the distribution of possible outcomes, in order to actually say anything with confidence? I think once you do that, the picture doesn’t look quite as rosy.

Robert Wiblin: You mentioned that one of the most potentially compelling counter arguments was that there’s just other really important things for people to be doing that might be even more pressing. Yeah, what things other than AI safety do you think are among the most important things for people to be working on?

Paul Christiano: So I guess I have two kinds of answers to this question. One kind of answer is what’s the standard list of things people would give? Which I think are the most likely things to be good alternatives. So for example, amongst the utilitarian crowd, I think talking about existential risk from engineered pandemics is a very salient option, there’s a somewhat broader bioterror category. I think of other things in this genre, one could also look at the world more broadly, so intervening on political process, improve political institutions, or just push governance in a particular direction that we think is conducive to a good world, or a world on a good long-term trajectory.

Those are examples of problems that lots of people would advocate for and therefore, I think if lots of people think x is important, that’s good evidence that x is important. The second kind of answer, which is the problems that I find most tempting to work on, which is going to be related to … it’s going to tend to systematically be things that other people don’t care about, I also think there’s a lot of value. Yeah, one can add a lot of value if there’s a thing that’s important, if you care about the ratio of how important it actually is. Or how important other people think it is and how important it actually is.

So at that level, things that I’m like … I’m particularly excited about very weird utilitarian arguments. So I’m particularly excited about people doing more thinking about what actual features of the world affect whether we’re on a positive or negative trajectory. So thinking about things … There’s a lot of considerations that are extremely important, from the long run utilitarian perspective, that are just not very important according to people’s normal view of the world, or normal values. So I find one big area is just thinking about and acting on, sort of that space of considerations.

So an example, which is a kind of weird example, but hopefully illustrates the point, is normal people care a ton about whether humanity … they care a ton about catastrophic risks. They would really care if everyone died. I think to a weird utilitarian, you’re like, “Well, it’d be bad if everyone died, but even in that scenario, there’s a bunch of weird stuff you could do to try and improve the probability that things turn out okay in the end.” So these include things like working on extremely robust bunkers that are capable of repopulating the world, or trying to … in the extreme case where all humans die, you’re like, “Well we’d like some other animal later to come along and evolve intelligent life and colonize the stars.” Those are weird scenarios, the scenarios that basically no one tries to push on … No one is asking, “What could we do as a civilization to make it better for the people who will come after us if we manage to blow ourselves up?”

So because no one is working on them, even though they’re not that important in absolute terms, I think it’s reasonably likely that they’re good things to work on. Those are examples of kind of weird things. There’s a bunch of not as weird things that also seem pretty exciting to me. Especially things about improving how well people are able to think, or improving how well institutions function, which I’d be happy to get into more detail on, but are not things I’m expert in.

Robert Wiblin: Yeah, maybe just want to list off a couple of those?

Paul Christiano: So just all the areas that seem … are high level areas that seem good to me, so a list of … Thinking about the utilitarian picture and what’s important to a future-focused utilitarian, there’s thinking about extinction risks. Maybe extinction risks that are especially interesting to people who care about extinction. So things like bunkers, things like repopulation of the future, things like understanding the tails of normal risks. So understanding the tails of climate change, understanding the tails of nuclear war.

More normal interventions like pushing on peace, but especially with an eye to avoiding the most extreme forms of war, or mitigating the severity of all out war. Pushing on institutional quality, so experimenting with institutions like prediction markets, different ways of aggregating information, or making decisions across people. Just running tons of experiments and understanding what factors influence individual cognitive performance, or individual performance within organizations, or for decision making.

An example of a thing that I’m kind of shocked by is how little study there is of nootropics and cognitive enhancement broadly. I think that’s a kind of thing that’s relatively cheap and seems such good bang for your buck in expectation, that it’s pretty damning for civilization that we haven’t invested in it. Yeah, those are a few examples.

Importance of location of the best AI safety team

Robert Wiblin: Okay, great. Coming back to AI, how important is it to make sure that the best AI safety team ends up existing within the organization that has the best general machine learning firepower behind it?

Paul Christiano: So you could imagine splitting up the functions of people who work on AI safety into two categories. One category is developing technical understanding, which is sufficient to build aligned AI. So this is doing research saying, “Here are some algorithms, here’s some analysis that seems important.” Then a second function is actually affecting the way that an AI project is carried out, to make sure it reflects our understanding of how to build an aligned AI. So for the first function, it’s not super important. For the first function, if you want to be doing research on alignment, you want to have access to machine learning expertise, so you need to be somewhere that’s doing reasonably good machine learning research but it’s not that important that you be at the place that’s actually at the literal cutting edge.

From the perspective of the second function, it’s quite important. So if you imagine someone actually building very, very powerful AI systems, I think the only way in practice that society’s expertise about how to build aligned AI is going to affect the way that we build AGI, is by having a bunch of people who have made it their career to understand those considerations and work on those considerations, who are involved in the process of creating AGI. So for that second function it’s quite important that if you want an AI to be safe, you want people involved in development of that AI to basically be alignment researchers.

Robert Wiblin: Do you think we’re heading towards a world where we have the right distribution of people?

Paul Christiano: Yeah so I think things are currently okay on that front. I think as we get closer … so we’re currently in a mode where we can imagine … we’re somewhat confident there won’t be powerful AI systems within two or three years and so for the short term, there’s not as much pressure as there will be closer to the day to consolidate behind projects that are posing a catastrophic risk. I would be optimistic that if we were in that situation where we actually faced significant prospect of existential risk from AI over the next two years, then there would be significantly more pressure for … both pressure for safety researchers to really follow wherever that AI was being built or be allocated across the organizations that are working on AI that poses an existential risk, and also a lot of pressure within such organizations to be actively seeking safety researchers.

My hope would be that you don’t have to really pick. Like the safety researchers don’t have to pick a long time in advance what organizations you think will be doing that development, you can say, “We’re going to try and develop the understanding that is needed to make this AI safe. We’re going to work in an organization that is amongst those that might be doing development of dangerous AI and then we’re going to try and live in the kind of world where as we get very close, there’s a lot of … people understand the need for and are motivated to concentrate more expertise on alignment and safety,” and that that occurs at that time.

Robert Wiblin: It seems like there’s some risks to creating new organizations because you get a splintering of the effort and also potential coordination problems between the different groups. How do you feel we should split additional resources between just expanding existing research organizations versus creating new projects?

Paul Christiano: So I agree that to the extent that we have a coordination problem amongst developers of AI, to the extent that the field is harder to reach agreements in or regulate as there are more and more actors, then, all else equal, I’d prefer not to have a bunch of new actors. I think that’s mostly the case for people doing AI development, so for example, for projects that are doing alignment per se, I don’t think it’s a huge deal and should mostly be determined by other considerations, whether to contribute to existing efforts or create new efforts.

I think in the context of AI projects, I think, all else equal, one should only be creating new AI … if you’re interested in alignment, you should only be creating new AI projects where you have some very significant interest in doing so. It’s not a huge deal, but it’s nicer to have a smaller number of more pro-social actors than to have a larger number of actors with uncertain … or even a similar distribution of motivations.

Variance in outcomes

Robert Wiblin: So how much of the variance in outcomes from artificial general intelligence, in your estimates, comes from uncertainty about how good we’ll be at actually working on the technical AI alignment problem, versus uncertainty about how firms that are working to develop AGI will behave potentially, the governments in the countries where they’re operating, how they’re going to behave?

Paul Christiano: Yeah, I think the largest source of variance isn’t either of those but is instead just how hard is the problem? What is the character of the problem? So after that, I think the biggest uncertainty, though not necessarily the highest place to push, is about how people behave. It’s how much investment do they make? How well are they able to reach agreements? How motivated are they in general to change what they’re doing in order to make things go well? So I think that’s a larger source of variance than technical research that we do in advance. I think it’s potentially a harder thing to push on in advance. Pushing on how much technical research we do in advance is very easy. If we want to increase that amount by 10%, that’s incredibly cheap, whereas having a similarly big change on how people behave would be a kind of epic project. But I think that more of the variance comes from how people behave.

I’m very, very uncertain about the institutional context in which that will be developed. Very uncertain about how much each particular actor really cares about these issues, or when push came to shove, how far out of their way they would go to avoid catastrophic risk. I’m very uncertain about how feasible it will be to make agreements to avoid a race to the bottom on safety.

Robert Wiblin: Another question that came in from a listener was, I guess a bit of a hypothetical, but it’s interesting to prod your intuitions here. What do you think would happen if several different firms or countries simultaneously made a very powerful general AI? Some of which were aligned but some of which weren’t and potentially went rogue with their own agenda. Do you think that would be a very bad situation in expectation?

Paul Christiano: My normal model does not involve a moment where you’re building powerful AI. So that is, instead of having a transition from nothing to very powerful AI, you have a bunch of actors gradually ratcheting up the capability of the systems they’re able to build. But even if that’s false, I expect developers to generally be really well financed groups that are quite large. So if they’re smaller groups, I do generally expect them to divide up the task and effectively pool resources in one way or another. Either by explicitly resource sharing or by merging or by normal trading with each other. But we can still imagine … say, in general, if this was distributed across the world, there would be a bunch of powerful AI systems, some of which are aligned, some of which aren’t aligned. I think my default guess about what happens in that world is similar to saying if 10% of the AIs are aligned, then we capture 10% as much value as if 100% of them are aligned. It’s roughly in that ballpark.

Robert Wiblin: Does that come from the fact that there’s a 10% chance that one out of 10 AGIs would, in general, take over? Or do you have more of a view where there’s going to be a power sharing, or each group gets a fraction of the influence, as in the world today?

Paul Christiano: Yeah. I don’t have a super strong view on this, and in part, I don’t have a strong view because I end up at the same place, regardless of how much stochasticity there is. Like whether you get 10% of the stuff all the time, or all the stuff 10% of the time, I don’t have an incredibly strong preference between those, for kind of complicated reasons. I think I would guess … so, in general, if there’s two actors who are equally powerful, they could fight it out and then just see what happened and then behind a veil of ignorance, each of them wins half the time and crushes the other.

I think normally, people would prefer to reach compromises short of that. So that is, imagine how that conflict would go and say, “Well if you’re someone who would be more likely to win, then you’ll extract a bunch of concessions from the weaker party.” But everyone is incentivized to reach an agreement where they don’t have an all out war. In general, that’s how things normally go amongst humans. We’re able to avoid all out war most of the time, though not all the time.

I would, in general, guess that AI systems will be better at that. Certainly in the long run, I think it’s pretty clear AI systems will be better at negotiating to reach positive sum trades, where avoiding war is often an example of a positive sum trade. It’s conceivable in the short term that you have AI systems that are very good at some kinds of tasks and not very good at diplomacy, or not very good at reaching agreement or these kinds of tasks. But I don’t have a super strong view about that.

I think that’s the kind of thing that would determine to what extent you should predict there to be war. If people have transferred most of the decision making authority to machines, or a lot of decision making authority to machines, then you care a lot about things like, are machines really good at waging war but not really changing the process of diplomacy? If they have differential responsibility in that kind of respect, then you get an outcome that’s more random and someone will crush everyone else, and if you’re better at striking agreements, then you’re more likely to say like, “Well, look, here’s the allocation of resources … we’ll allocate influence according to the results of what would happen if we fought. Then let’s all not fight.”
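As a rough numerical illustration of the earlier comparison between getting 10% of the stuff all the time and getting all of the stuff 10% of the time, here is a minimal Python sketch. The function names, and the idea of summarising an outcome as a single "share" between 0 and 1, are assumptions made for illustration and do not come from the conversation. It only shows that the two cases have the same expected share and differ in how random the outcome is.

```python
def proportional_split(aligned_fraction):
    """Negotiated outcome (hypothetical model): the aligned systems
    keep a fixed share equal to their fraction, with certainty."""
    mean = aligned_fraction
    variance = 0.0
    return mean, variance


def winner_take_all(aligned_fraction):
    """Conflict outcome (hypothetical model): the aligned systems get
    everything with probability equal to their fraction, else nothing.
    This is a single Bernoulli draw with success probability p."""
    p = aligned_fraction
    mean = p * 1.0 + (1 - p) * 0.0
    variance = p * (1 - p)
    return mean, variance


split_mean, split_var = proportional_split(0.10)
war_mean, war_var = winner_take_all(0.10)

# Same expected share in both cases; only the spread differs.
print("expected share:", split_mean, war_mean)
print("variance:", split_var, war_var)
```

Under these assumptions, someone who only cares about the expected share would be indifferent between the two cases, which is one way to read the "I end up at the same place" remark.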

Honesty

Robert Wiblin: One topic that you’ve written quite a lot about is credible commitments and the need for organizations to be honest. I guess part of that is because it seems like it’s going to be very important in the future for organizations that are involved in the development of AGI to be able to coordinate around safety and alignment and to avoid getting into races with one another. Or to have just a general environment of mistrust, where they have reasons to go faster in order to outcompete other groups. Has anyone ever attempted to have organizations that are as credible in their commitments as this? Do you have much hope that we’ll be able to do that?

Paul Christiano: So certainly I think in the context of arms control agreements and monitoring, some efforts are made for one organization to be able to credibly commit that they are … credibly demonstrate that they’re abiding by some agreement. I think that the kind of thing I talked about … So I wrote this blog post on honest organizations, I think the kind of measure I’m discussing there is both somewhat more extreme than things that would … like a government would normally be open to and also more tailored for this setting, where you have an organization which is currently not under the spotlight, which is trying to set itself up in such a way that it’s prepared to be trustworthy in the future, if it is under the spotlight.

I’m not aware of any organizations having tried that kind of thing. So a private organization saying, “Well, we expect some day in the future, we might want to coordinate in this way and be regulated in this way so we’re going to try and constitute ourselves such that it’s very easy for someone to verify that we’re complying with an agreement or a law.” I’m not aware of people really having tried that much. I think there’s some things that are implicitly this way and companies can change who they hire, they can try and be more trustworthy by having executives, or having people on the board, or having monitors embedded within the organization that they think stakeholders will trust. Certainly a lot of precedent for that. Yeah, I think the reason you gave for why this seems important to me in this context is basically right.

I’m concerned about the setting where there’s some trade-off between the capability of the AI systems you build and safety. In the context of such a trade-off, you’re reasonably likely to want some agreement that says, “Everyone is going to meet this bar on safety.” Given that everyone has committed to meet that bar, there’s not really an incentive then to cut … or they’re not able to follow the incentive to cut corners on safety, say. So you might want to make that … That agreement might take place as an informal agreement amongst AI developers, or it might take place as domestic regulation, where law enforcement would like to allow AI companies to continue operating, but would like to verify they’re not going to take over the world.

It might take the form of agreements among states, which would themselves be largely … An agreement among states about AI would involve the US or China having some unusually high degree of trust or insight into what firms in the other country are doing. So I’m thinking forward to that kind of agreement and it seems like you would need machinery in place that’s not currently in place. Or it would be very, very hard at the moment. So anything you could do to make it easier seems like it would be … potentially you could make it quite a lot easier. There’s a lot of room there.

Robert Wiblin: Is this in itself a good reason for anyone who’s involved in AI research to maintain an extremely high level of integrity so that they will be trusted in future?

Paul Christiano: I think having a very high level of integrity sounds good in general. As a utilitarian, I do like it if the people engaged in important projects are mostly in it for their stated goals and want to make the world better. It seems like there’s a somewhat different thing which is how trustworthy are you to the external stakeholders who wouldn’t otherwise have trusted your organization. Which I think is different from the normal … if we were to rate people by integrity, that would be a quite different ranking than ranking them by demonstrable integrity to people very far away who don’t necessarily trust the rest of the organization they’re involved in.

Robert Wiblin: I didn’t quite get that. Can you explain that?

Paul Christiano: So I could say there’s both … If I’m interacting with someone in the context … like I’m interacting with a colleague. I have some sense of how much they conduct themselves with integrity. It’s like, one, I could rank people by that. I’d love it if the people who were actually involved in making AI were people who I’d rank as super high integrity.

But then there’s a different question, which is suppose you have some firm, and then there’s someone in the Chinese defense establishment reasoning about the conduct of that firm. They don’t really care that much probably, if there’s someone I would judge as high integrity involved in the process because they don’t have the information that I’m using to make that judgment. From their perspective, they care a lot about the firm being structured such that they feel that they understand what the firm is doing. They don’t want to feel uncertainty about it; in particular, they want to have minimal suspicion that a formal agreement is just cover for US firms to be cutting corners and delaying their competitors. They really want to have a lot of insight into what is happening at the firm. They want to have some confidence that there’s not some unobserved collusion between the US defense establishment and this firm that nominally is complying with some international agreement, to undermine that agreement. That’s the example of states looking into firms.

But also in the example of firms looking into firms, similarly, there’s some notion of integrity that would be relevant for two researchers at Baidu interacting with each other and thinking about how much integrity they have. There’s something quite different that would be helpful for me looking into AI research at Baidu and actually believing that, when they make public statements, those statements are an accurate reflection of what they’re doing. They aren’t collaborating. There isn’t behind the scenes a bunch of work to undermine nominal agreements.

Robert Wiblin: Yeah, I think that it is very valuable for people in this industry to be trustworthy for all of these reasons, but I guess I am a bit skeptical that trust alone is going to be enough, in part for the reasons you just gave. There’s that famous Russian proverb, trust but verify. It seems like there’s been a lot of talk, at least publicly, about the importance of trust, and maybe not enough about how we can come up with better ways of verifying what people’s behavior actually is. I mean, one option, I guess, would just be to have people from different organizations all working together in the same building, or to move them together so they can see what other groups are doing, which allows them to have a lot more trust just because they have much more visibility. How do you feel about that?

Paul Christiano: Yeah, so I think I would be pretty pessimistic about reaching any kind of substantive and serious agreement based only on trust for the other actors in the space. It may be possible in some … yeah, it’s conceivable amongst Western firms that are already quite close, where there’s been a bunch of turnover of staff from one to the other and everyone knows everyone. It may be conceivable in that case. In general, when I talk about agreements, I’m imagining trust as a complement to fairly involved monitoring and enforcement mechanisms.

The monitoring and enforcement problem in this context is quite difficult. That is it’s very, very hard for me to know, suppose I’ve reached, firm A and firm B have reached some nominal agreement. They’re only going to develop some AI that’s safe according to some standard. It’s very, very hard for firm A to demonstrate that to firm B without literally showing all of their, without giving firm B enough information that they could basically take everything or benefit from all of the research that firm A is doing. There’s no easy solution to this problem. The problem is easier to the extent you believe that the firm is not running a completely fraudulent operation to maintain some appearances, but then in addition to have some … In addition to having enough insight to verify that, you still need to do a whole bunch of work to actually control how development is going.

I’m just running a bunch of code on some giant computing cluster, you can look and you can see, indeed, they’re running some code on this cluster. Even if I literally showed you all of the code I was running on the cluster, that actually wouldn’t be that helpful. It’s very hard for you to trust what I’m doing unless you have literally watched the entire process by which the code was produced. Or at least, you’re confident there wasn’t some other process hidden away that’s writing the real code, and the thing you can see is just a cover by which it looks like we’re running some scheduling job, but actually it’s just a … it’s carrying some real payload that’s a bunch of actual AI research that the results are getting smuggled out to the real AI research group.

Robert Wiblin: Could you have an agreement in which every organization accepts that all of the other groups are going to try to put clandestine informants inside their organization, and that that’s just an acceptable thing for everyone to do to one another because it’s the only way that you could really believe what someone’s telling you?

Paul Christiano: Yes, I think there’s a split between two ways of doing this kind of coordination. On one arm, you try and maintain something like the status quo, where you have a bunch of people independently pushing on the AI progress. In order to maintain that arm, there’s some limit on how much transparency different developers can have into each other’s research. That’s one arm. Then there’s a second arm where you just give up on that and you say yes, all of the information is going to leak.

I think the difficulty in the first arm is that it’s incredibly, you have to walk this really fine line where you’re trying to give people enough insight, which probably does involve monitors, whistle blowing, other mechanisms whereby there are people who firm A trusts embedded in firm B. That’s what makes it hard to do monitoring without leaking all the information. That you have to walk that fine line. Then, if you want to leak all the information, then the main difficulty seems to be you have to reach some new agreement about how you’re actually going to divide the fruits of AI research.

Right now, there’s some implicit status quo, where people who make more AI progress expect to capture some benefits by virtue of having made more AI progress. You could say, no, we’re going to deviate from the status quo and just agree that we’re going to develop AI effectively jointly. Either because it’s literally joint or because we’ve all opened … or the leader has opened themselves up to enough monitoring that they cease to be the leader. If you do that, then you have to reach some agreement where you say, here’s how we compensate the leader for the fact that they were the leader. Either that or the leader has to be willing to say, yep, I used to be, have a high valuation because I was doing so well in AI, and now I’m just happy to grant that that advantage is going to get eroded, and I’m happy to do that because it reduces the risk of the world being destroyed.

I think both of those seem like reasonable options to me. Which one you take depends a little bit upon how serious the problem appears to be, like what the actual structure of the field is like. Coordinating is more reasonable if the relevant actors are close, such that … well, it’s more reasonable if there’s an obvious leader who’s going to capture the benefits and is reasonably willing to distribute them, or if somehow there’s not a big difference between the players, such as erasing AI as a fact. If you imagine the US and China both believing that, like things are hard if each of them believes that they’re ahead in AI and each of them believes that they’re going to benefit by having AI research which isn’t available to their competitor. Things are hard if both of them believe that they’re ahead, and things are easy if both of them believe that they’re behind.

If they both have an accurate appraisal of the situation and understand there’s not a big difference, then maybe you’re also okay because everyone’s fine saying, sure, I’m fine leaking because I know that that’s roughly the same as … I’m not going to lose a whole lot by leaking information to you.

Takeoff speeds

Robert Wiblin: Okay. Let’s turn now to this question of fast versus slow take off of artificial intelligence. Historically, a lot of people who’ve been worried about AI alignment have tended to take the view that they expected progress to be relatively gradual for a while, and then to suddenly accelerate and take off very quickly over a period of days or weeks or months rather than years. But you’ve, for some time, been promoting the view that you think the take off of general AI is going to be more gradual than that. Do you want to just explain your general view?

Paul Christiano: Yeah, so it’s worth clarifying that when I say slow, I think I still mean very fast compared to most people’s expectations. I think that a transition taking place over a few years, maybe two years between AI having very significant economic impact and literally doing everything sounds pretty plausible. I think when people think about such a two-year transition, to most people on the street, that sounds like a pretty fast takeoff. I think that’s important to clarify. That when I say slow, I don’t mean what most people mean by slow.

Another thing that’s important to clarify is that I think there’s rough agreement amongst the alignment and safety crowd about what would happen if we did have human level AI. That is everyone agrees that at that point, progress has probably exploded and is occurring very quickly, and the main disagreement is about what happens in advance of that. I think I have the view that in advance of that, the world has already changed very substantially. You’re already likely exposed to catastrophic AI risk, and in particular, when someone develops human level AI, it’s not going to emerge in a world like the world of today where we can say that indeed, having human level AI today would give you a decisive strategic advantage. Instead, it will emerge in a world which is already much, much crazier than the world of today, where having a human level AI gives you some more modest advantage.

Robert Wiblin: Yeah, do you want to paint a picture for us of what that world might look like?

Paul Christiano: Yeah, so I guess there are a bunch of different parts of the world, and I can focus on different ones, but I can try and give some random facts or some random view, like facts from that world. They’re not real facts. They’re Paul’s wild speculations. I guess, in terms of calibrating what AI progress looks like, or how rapid it is, I think maybe two things that seem reasonable to think about are, the current rate of progress in information technology in general. That would suggest something like, maybe in the case of AI, like falling in costs by a factor of two every year-ish or every six to 12 months.
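To make the "factor of two every six to 12 months" figure concrete, here is a small Python sketch of how quickly costs would fall at a constant halving time. The function name `relative_cost` and the choice of time horizons are assumptions for illustration only; the halving times of 0.5 and 1.0 years are simply the two ends of the range mentioned above.

```python
def relative_cost(years, halving_time_years):
    """Cost relative to today, assuming it halves every
    `halving_time_years` years at a constant rate."""
    return 0.5 ** (years / halving_time_years)


# Compare the two ends of the stated range: halving every
# six months versus halving every twelve months.
for halving_time in (0.5, 1.0):
    for years in (1, 2, 5):
        cost = relative_cost(years, halving_time)
        print(f"halving every {halving_time} yr, after {years} yr: "
              f"{cost:.5f} of today's cost")
```

Under these assumptions, the gap between the two halving times compounds: after five years the faster rate gives a cost roughly 32 times lower than the slower one.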

Another thing that I think is important to get an intuitive sense of scale is to compare to intelligence in nature. I think when people do intuitive extrapolation of AI, they often think about abilities within the human range. One thing that I do agree with proponents of fast takeoff about is that that’s not a very accurate perspective when thinking about AI.

I think a better way to compare is to look at what evolution was able to do with varying amounts of compute. If you look at what each order of magnitude buys you in nature, you’re going from insects to small fish to lizards to rats to crows to primates to humans. Each of those is one order of magnitude, roughly, so you should be thinking of there are these jumps. It is the case that the difference between insect and lizard feels a lot smaller to us and has less intuitive significance than the difference between primate and human or crow and primate, so when I’m thinking about AI capabilities, I’m imagining, intuitively, and this is not that accurate, but I think is useful as an example to ground things out, I’m imagining this line rising and one day you have, or one year you have an AI which is capable of very simple learning tasks and motor control, and then a few years later … A year later, you have an AI that’s capable of slightly more sophisticated learning, now it learns as well as a crow or something, that AI is starting to get deployed as quickly as possible in the world and having a transformative impact, and then it’s a year later that AI has taken over the process of doing science from humans. Yeah, I think that’s important to have in mind as background for talking about what this world looks like.

Robert Wiblin: What tasks can you put an AI that’s as smart as a crow on that are economically valuable?

Paul Christiano: I think there’s a few kinds of answers. One place where I think you definitely have a big impact is in robotics and domains like manufacturing, logistics, and construction. That is, I think lower animals are probably, they’re good enough at motor control that you’d have much, much better robotics than you have now. Today, I would say robotics doesn’t really, or robots that learn don’t really work very well or at all. Today the way we get robotics to work is you really organize your manufacturing process around them. They’re quite expensive and tricky. It’s just hard to roll it out. I think in this world, probably even before you have crow level AI, you have robots that are very general and flexible. They can be applied not only on an assembly line, but okay, one, they take the place of humans on assembly lines quite reliably, but they can also then be applied in logistics to loading and unloading trucks, driving trucks, managing warehouses, construction.

Robert Wiblin: Maybe image identification as well?

Paul Christiano: They could certainly do image identification well. I think that’s the sort of thing we get a little bit earlier. I think that’s a large part of … Today those activities are a large part of the economy. Maybe this stuff we just listed is something … I don’t actually know in the US, it’s probably lower here than elsewhere, but still more than 10% of our economy, less than 25%.

There’s another class of activities. If you look at the intellectual work humans do, I think a significant part of it could be done by very cheap AIs at the level of crows or not much more sophisticated than crows. There’s also a significant part that requires a lot more sophistication. I think we’re very uncertain about how hard doing science is. As an example, I think back in the day we would have said playing board games that are designed to tax human intelligence, like playing chess or Go is really quite hard, and it feels to humans like they’re really able to leverage all their intelligence doing it.

It turns out that playing chess from the perspective of actually designing a computation to play chess is incredibly easy, so it takes a brain very much smaller than an insect brain in order to play chess much better than a human. I think it’s pretty clear at this point that science makes better use of human brains than chess does, but it’s actually not clear how much better. It’s totally conceivable from our current perspective, I think, that an intelligence that was as smart as a crow, but was actually designed for doing science, actually designed for doing engineering, for advancing technologies as rapidly as possible, it is quite conceivable that such a brain would actually outcompete humans pretty badly at those tasks.

I think that’s another important thing to have in mind, and then when we talk about when stuff goes crazy, I would guess humans are an upper bound for when stuff goes crazy. That is we know that if we had cheap simulated humans, that technological progress would be much, much faster than it is today. But probably stuff goes crazy somewhat before you actually get to humans. It’s not clear how many orders of magnitude smaller a brain can be before it goes crazy. I think probably at least one seems safe, and then two or three is definitely plausible.

Robert Wiblin: It’s a bit surprising to say that science isn’t so hard, and that there might be a brain that, in a sense, is much less intelligent than a human that could blow us out of the water in doing science. Can you explain, can you try to make that more intuitive?

Paul Christiano: Yeah, so I mentioned this analogy to chess, which is when humans play chess, we apply a lot of faculties that we evolved for other purposes to play chess well, and we play chess much, much better than someone using pencil and paper to mechanically play chess at the speed that a human could. We’re able to get a lot of mileage out of all of these other … I know we evolved to be really good at physical manipulation and planning in physical contexts and reasoning about social situations. That, in some sense, lets us play chess much better than if we didn’t have all these capacities.

That said, if you just write down a simple algorithm for playing chess, and you run it with a tiny, tiny fraction of the compute that a human uses in order to play chess, it crushes humans incredibly consistently. So, in a similar sense, if you imagine this project of look at some technological problem, consider a bunch of possible solutions, understand what the real obstructions are and how we can try and overcome those obstructions, a lot of the stuff we do there, we know that humans are much, much better than a simple mechanical algorithm applied to those tasks. That is we’re able to leverage all of these abilities that we … All these abilities that helped us in the evolutionary environment, we’re able to leverage to do really incredible things in terms of technological progress, or in terms of doing science or designing systems or et cetera.

But what’s not clear is if you actually had created, so again, if you take the computations of the human brain, and you actually put it in a shape that’s optimal for playing chess, it plays chess many, many orders of magnitude better than a human. Similarly, if you took the computation of the human brain and you actually reorganized it, so you said now, instead of a human explicitly considering some possibilities for how to approach this problem, a computer is going to generate a billion possibilities per second for possible solutions to this problem. In many respects, we know that that computation would be much, much better than humans at resolving some parts of science and engineering.

There’s then a question of exactly how much leverage we are getting out of all these evolutionary heuristics. It’s not surprising that in the case of chess, we’re getting much less mileage than we do for tasks that more leverage the full range of what the human brain does, or are closer to tasks the human brain was designed for. I think science and technology are an intermediate place, where they’re still really, really not close to what human brains are designed to do. It’s not that surprising if you can make brains that are really a lot better at science and technology than humans are. I think a priori, it’s not that much more surprising for science and technology than it would be for chess.

Robert Wiblin: Okay. I took us some way away from the core of this fast versus slow takeoff discussion. One part of your argument that I think isn’t immediately obvious is that when you’re saying that in a sense the takeoff will be slow, you’re actually saying that dumber AI will have a lot more impact on the economy and on the world than other people think? Why do you disagree with other people about that? Why do you think that earlier versions of machine learning could already be having a transformative impact?

Paul Christiano: I think there’s a bunch of dimensions of this disagreement. An interesting fact, I think, about the effective altruism and AI safety community is that there’s a lot of agreement about, or there’s a surprising amount of agreement about takeoff being fast. There’s a really quite large diversity of view about why takeoff will be fast.

Certainly the arguments people would emphasize, if you were to talk with them, would be very, very different, and so my answer to this question is different for different people. I think one general issue is that other people look at the evolutionary record, and they more see this transition between lower primates and humans, where humans seem incredibly good at doing a kind of reasoning that builds on itself and discovers new things and accumulates them over time culturally. They more see that as being this jump that occurred around human intelligence and is likely to be recapitulated in AI. I think I more see that jump as occurring when it did because of the structure of evolution, so evolution was not really trying to optimize … It was not trying to optimize humans for cultural accumulation in any particularly meaningful sense. It was trying to optimize humans for this suite of tasks that primates are engaged in, and incidentally humans became very good at cultural accumulation and reasoning.

I think if you optimize AI systems for reasoning, it appears much, much earlier. If evolution had been trying to make AIs that would build a civilization, or if evolution had been trying to optimize for creatures that would build a civilization, instead of going straight to humans who have some level of ability at forming a technological civilization, it would have been able to produce crappier technological civilizations earlier. I now think it’s probably not the case that if you left monkeys for long enough you would get a space faring civilization, but I think that’s not a consequence of monkeys just being too dumb to do it, I think it’s largely a consequence of the way that monkeys’ social dynamics work. The way that imitation works amongst monkeys, the way that cultural accumulation works and how often things are forgotten.

I think that this continuity that we observe in the historical record between lower primates and humans, I don’t feel like it’s … It certainly provides some indication about what changes you should expect to see in the context of AI, but I don’t feel like it’s giving us a really robust indicator that it’s a really closely analogous situation. That’s one important difference. There’s this jump in the evolutionary record. I expect, to the extent there’s a similar jump, we would see it significantly earlier, and we would jump to something significantly dumber than humans. It’s a significant difference between my view and the view of some, I don’t know, maybe one third of people who think takeoff is likely to be fast.

There are, of course, other differences, so in general, I look at the historical record, and I think it feels to me like there’s an extremely strong regularity of the form: before you’re able to make a really great version of something, you’re able to make a much, much worse version of something. For example, before you’re able to make a really fast computer, you’re able to make a really bad computer. Before you’re able to make a really big explosive, you’re able to make a really crappy explosive that’s unreliable and extremely expensive. Before you’re able to make a robot that’s able to do some very interesting tasks, you’re able to make a robot which is able to do the tasks with lower reliability or a greater expense or in a narrower range of cases. That seems to me like a pretty robust regularity.

It seems like it’s most robust in cases where the metric that we’re tracking is something that people are really trying to optimize. If you’re looking at a metric that people aren’t trying to optimize, like how many books are there in the world. How many books are there in the world is a property that changes discontinuously over the historical record. I think the reason for that is just ’cause no one is trying to increase the number of books in the world. It’s incidental. There is a point in history where books are a relatively inefficient way of doing something, and it switched to books being an efficient way to do something, and the number of books increases dramatically.

If you look at a measure that people are actually trying to optimize, like how quickly information is transmitted, how many facts the average person knows, it’s a … not the average person, but how many facts someone trying to learn facts knows, those metrics aren’t going to change discontinuously in the same way that how many books exist will change. I think how smart is your AI is the kind of thing that’s not going to change in that way. That’s the kind of thing people are really, really pushing on and caring a lot about, how economically valuable is your AI.

I think that this historical regularity probably applies to the case of AI. There are a few plausible historical exceptions. I think the strongest one, by far, is the nuclear weapons case, but I think that that case, first, is there are a lot of very good a priori arguments for discontinuity around that case that are much, much stronger than the arguments we give for AI. Even as such, I think the extent of the discontinuity is normally overstated by people talking about the historical record. That’s a second disagreement.

I think a third disagreement, is I think people make a lot of sloppy arguments or arguments that don’t quite work. I think they’re, I feel like, a little bit less uncertain because I feel like it’s just a matter of if you work through the arguments, they don’t really hold together.

I think an example of that is I think people often make this argument of imagining your AI as being a human who makes mistakes sometimes, just an epsilon fraction of the time or fraction of cases where your AI can’t do what a human could do. You’re just decreasing epsilon over time until you hit some critical threshold where now your AI becomes super useful. Once it’s reliable enough, like when it gets to zero mistakes or one in a million mistakes. I think that model is like … there’s not actually, or it looks a priori like a reasonable-ish model, but then you actually think about it. Your AI is not like a human that’s degraded in some way. If you take a human and you degrade them, there is a discontinuity at really low levels of degradation, but in fact, your AI is following a very different trajectory. The conclusions from that model turn out to be very specific to the way that you were thinking of AI as a degraded human. Those are the three classes of disagreements.

Robert Wiblin: Let’s take it as given that you’re right that an AI takeoff will be more gradual than some people think. Although, I guess, still very fast by human time scales. What kind of strategic implications does that have for you and me today trying to make that transition go better?

Paul Christiano: I think the biggest strategic question that I think about regularly that’s influenced by this is to what extent early developers of AI will have a lot of leeway to do what they want with the AI that they’ve built. How much advantage will they have over the rest of the world?

I think some people have a model in which early developers of AI will be at huge advantage. They can take their time or they can be very picky about how they want to deploy their AI, and nevertheless, radically reshape the world. I think that’s conceivable, but it’s much more likely that the earlier developers of AI will be developing AI in a world that already contains quite a lot of AI that’s almost as good, and they really won’t have that much breathing room. They won’t be able to reap a tremendous windfall profit. They won’t be able to be really picky about how they use their AI. You won’t be able to take your human level AI and send it out on the internet to take over every computer because this will occur in a world where all the computers that were easy to take over have already been taken over by much dumber AIs.

It’s more like you’re existing in this soup of a bunch of very powerful systems. You can’t just go out into a world … people imagine something like the world of today and human level AI venturing out into that world. In that scenario, you’re able to do an incredible amount of stuff. You’re able to basically steal everyone’s stuff if you want to steal everyone’s stuff. You’re able to win a war if you want to win a war. I think that that model, so that model I think is less likely under a slow takeoff, though it still depends on quantitatively exactly how slow. It especially depends on maybe there’s some way … if a military is to develop AI in a way where they selectively … They can develop AI in a way that would increase the probability of this outcome if they’re really aiming for this outcome of having a decisive strategic advantage. If this doesn’t happen, if the person who develops AI doesn’t have this kind of leeway, then there are, I think the nature of this safety problem changes a little bit.

In one respect, it gets harder because now you really want to be building an AI that can do … you’re not going to get to be picky about what tasks you’re applying your AI to. You need an AI that can be applied to any task. That’s going to be an AI that can compete with a world full of a bunch of other AIs. You can’t just say I’m going to focus on those tasks where there’s a clear definition of what I’m trying to do, or I’m just going to pick a particular task, which is sufficient to obtain a strategic advantage and focus on that one. You really have to say, based on the way the world is set up, there’s a bunch of tasks that people want to apply AI to, and you need to be able to make those AI safe.

In that respect, it makes the problem substantially harder. It makes the problem easier in the sense that now you do get a little bit of a learning period. It’s like as AI ramps up, people get to see a bunch of stuff going wrong. We get to roll out a bunch of systems and see how they work. So it’s not like there’s this one shot. There’s this moment where you press the button and then your AI goes, and it either destroys the world or it doesn’t. It’s more that there’s a whole bunch of buttons. Every day you push a new button, and if you mess up then you’re very unhappy that day, but it’s not literally the end of the world until you push the button the 60th time.

It also changes the nature of the policy or coordination problem a little bit. I think that tends to make the coordination problem harder and changes your sense of exactly what that problem will look like. In particular, it’s unlikely to be between two AI developers who are racing to build a powerful AI that then takes over the world. It’s more likely there are many people developing AI, or not many, but whatever. Let’s say there are a few companies developing AI, which is then being used by a very, very large number of people, both in law enforcement and in the military and in private industry. The kind of agreement you want is a new agreement between those players.

Again, the problem is easier in some sense, in that now the military significance is not as clear. It’s conceivable that that industry isn’t nationalized. That this development isn’t being done by military. That it’s instead being treated in a similar way to other strategically important industries.

Then it’s harder because there’s not just this one. You don’t have to hold your breath until an AI takes over the world and everything changes. You need to actually set up some sustainable regime where people are happy with the way AI development is going. People are going to continue to think, engage in normal economic ways as they’re developing AI. In that sense, the problem gets harder. I think both problems, some aspects of the problem, both the technical and policy problems become harder, some aspects become easier.

Robert Wiblin: Yeah. That’s a very good answer. Given that other people would disagree with you, though, what do you think are the chances that you’re wrong about this, and what’s the counter argument that gives you the greatest concern?

Paul Christiano: Yeah, I feel pretty uncertain about this question. I think we could try to quantify an answer to how fast this takeoff is by talking about how much time elapses between certain benchmarks being met. If you have a one year lead in the development of AI, how much of an advantage does that give you at various points in development.

I think that when I break out very concrete consequences in the world, like if I ask how likely is it that the person who develops AI will be able to achieve a decisive strategic advantage for some operationalization at some point, then I find myself disagreeing with other people’s probabilities, but I can’t disagree that strongly. Maybe other people will assign a 2/3 probability to that event, and I’ll assign a 1/4 probability to that event, which is a pretty big disagreement, but certainly doesn’t look like either side being confident. Let’s say 2/3 versus 1/3. It doesn’t look like either side being super confident in their answer, and everyone needs to be willing to pursue policies that are robust across that uncertainty.

I think the thing that makes me most sympathetic to the fast takeoff view is not any argument about qualitative change around human level. It’s more an argument just of like look quantitatively at the speed of development and think about if you were scaling up on that time scale. If every three months your AIs were equivalent to an animal with a brain twice as large, it would not be many months between AIs that seemed minimally useful and AI that was conferring a strategic advantage. It’s just this quantitative question of exactly how fast this development is, and even if there’s no qualitative change, you can have development that’s fast enough that it’s correctly described as a fast takeoff. In that case, the view I’ve described of the world is not as accurate. We’re more in that scenario where the AI developer can just keep things under wraps during these extra nine months, and then, if they’d like, have a lot of leeway about what to do.
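The quantitative point above can be made concrete with a little arithmetic. The sketch below is an illustration only: it assumes effective brain size doubles every three months (the figure used in the conversation) and asks how long it would take to cross a gap of one, two, or three orders of magnitude, the range mentioned earlier for how far below human level transformative effects might begin. The function name and its parameters are hypothetical and not anything Paul specified.

```python
import math


def months_to_cross(orders_of_magnitude, doubling_months=3.0):
    """Months needed to grow by the given number of factors of 10,
    assuming a steady doubling every `doubling_months` months."""
    # Each factor of 10 takes log2(10), about 3.32, doublings.
    doublings = orders_of_magnitude * math.log2(10)
    return doublings * doubling_months


# Illustrative gaps only; the right size of the gap is an open question.
for gap in (1, 2, 3):
    print(f"{gap} order(s) of magnitude: about {months_to_cross(gap):.0f} months")
```

Under that assumed doubling rate, a single order of magnitude is crossed in roughly ten months, which is why a purely quantitative speed-up could look like a fast takeoff without any qualitative jump.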

Robert Wiblin: How strong do you think is the argument that people involved in AI alignment work should focus on the fast takeoff scenario even if it’s less likely because they expect to get more leverage, personally, if that scenario does come to pass?

Paul Christiano: I think that’s a … There’s definitely a consideration in that direction. I think it tends to be significantly weaker than the similar argument for focusing on short timelines, which I think is quite a bit stronger. I mean, I think that … The way that argument runs, the reason you might focus on fast takeoff is because over the course of a slow takeoff, there will be lots of opportunities to do additional work and additional experimentation to figure out what’s going on.

If you have a view where that work can just replace anything you could do now, then anything you could do now becomes relatively unimportant. But suppose you have a view where there’s any complementarity between work we do now and work that’s done later. Imagine you have this, let’s say, one-to-two-year period where people are really scrambling, where it becomes clear to many people that there’s a serious problem here, and we’d like to fix it. If there’s any kind of complementarity between the work we do now and the work that they’re doing during that period, then that doesn’t really undercut doing work now.

I think that it’s good. We can then, in advance, do things like understand the nature of the problem, the nature of the alignment problem, understand much more about how difficult the problem is, set up institutions such that they’re prepared to make these investments, and I think those things are maybe a little bit better in fast takeoff worlds, but it’s not a huge difference. I think it’s not more than … intuitively, I think it’s not more than a factor of two, but I haven’t thought that much about it. It might be … Maybe it’s a little more than that.

The short timelines thing I think is a much larger update.

Robert Wiblin: Yeah. Tell us about that.

Paul Christiano: Just, so if you think that AI might be surprisingly soon, in general, what surprisingly soon means is that many people are surprised, so they haven’t made much investment. In those worlds, there’s a lot less, much less has been done. Certainly, if AI was developed in 50 years, I do not think it’s the case that the research I’m doing now could really, very plausibly be relevant, just because there’s so much time that other people are going to have to rediscover the same things.

If you get a year ahead now, that means maybe five years from now you’re 11 months ahead of where you would have been otherwise, and five years later, you’re eight months ahead of where you would have been otherwise. Over time, the advantage just shrinks more and more. If AI’s developed in 10 years, then something crazy happened, the world at large has really been asleep at the wheel if we’re going to have human level AI in 10 years, and in that world, it’s very easy to have very large impact.

Of course, if AI is developed in 50 years, it could happen that people are asleep at the wheel in 40 years. They can independently make those … I don’t know, you can invest now for the case that people are asleep at the wheel. You aren’t really foreclosing the possibility of people being asleep in the future. If they’re not asleep at the wheel in the future, then the work we do now is a much lower impact.

It’s mostly, I guess, just a neglectedness argument, where you wouldn’t really expect, a priori, AI to be incredibly neglected. If, in fact, people with short timelines are right, if the 15% in 10 years, 35% in 20 years is right, then AI is absurdly neglected at the moment. Right? In that world, what we’re currently seeing in ML is not unjustified hype but desperately trying to catch up to what would be an acceptable level of investment given the actual probabilities we face.

Robert Wiblin: Earlier you mentioned that if you have this two year period, where economic growth has really accelerated in a very visible way, that people would already be freaking out. Do you have a vision for exactly what that freaking out would look like, and what implications that has?

Paul Christiano: I think there’s different domains and different consequences in different domains. Amongst AI researchers, I think a big consequence is that a bunch of discussions that are currently hypothetical and strange, the way we talk about catastrophic risk caused by AI. We talk about the possibility of AI much smarter than humans, or we talk about decisions being made by machines, a bunch of those issues will stop being weird considerations or speculative arguments and will start being this is basically already happening. We’re really freaked out about where this is going, or we feel very viscerally concerned.

I think that’s a thing that will have a significant effect on both what kind of research people are doing and also how open they are to various kinds of coordination. I guess that’s a very optimistic view, and I think it’s totally plausible that … Many people are much more pessimistic on that front than I am, but I feel like if we’re in this regime, people will really be thinking about [prioritizing] the thing that’s clearly coming, and they will be thinking about catastrophic risk from AI as even more clear than powerful AI, just because we’ll be living in this world where AI is really … you’re already living in a world where stuff is changing too fast for humans to understand in quite a clear way. In some respects, our current world has that character, and that makes it a lot easier to make this case than it would have been 15 years ago. But that will be much, much more the case in the future.

Robert Wiblin: Can you imagine countries and firms hoarding computational ability because they don’t want to allow anyone else to get in on the game?

Paul Christiano: I think mostly I imagine the default is just asset prices get bid up a ton. It’s not that you hoard computation so much as just computers become incredibly expensive and that flows backwards to semiconductor fabrication becoming incredibly expensive. IP chip companies become relatively valuable. That could easily get competed away. I think to first order, the economic story is probably what I expect, but then I think if you look at the world, and you imagine asset prices in some area are rising by a factor of 10 over the course of a few years or a year, I think that it’s pretty likely that the normal … I think the rough economic story is probably still basically right, but the formal structure of markets is pretty easy to break down in that case.

You can easily end up in the world where computation is very expensive, but prices are too sticky for prices to actually adjust in the correct way. Instead, that ends up looking like computers are still somewhat cheap, but now effectively they’re impossible for everyone to buy, or machine learning hardware is effectively impossible for people to buy at the nominal price. That world might look more like people hoarding computation, which I would say is mostly a symptom of an inefficient market world. It’s just the price of your computer has gone up by an absurd amount because everyone thinks this is incredibly important now, and it’s hard to produce computers as fast as people want them. In an inefficient market world, that may look like … That ends up looking like freaking out, and takes the form partly of a policy response instead of a market response, so strategic behavior by militaries and large firms.

Timelines

Robert Wiblin: Okay, that has been the discussion of how fast or gradual this transition will be. Let’s talk now about when you think this thing might happen. What’s your best guess for, yeah, AI progress timelines?

Paul Christiano: I normally think about this question in terms of what’s the probability of some particular development by 10 or 20 years rather than thinking about a median because those seem like the most decision relevant numbers, basically. Maybe one could also, if you had very short timelines, give probabilities on less than 10 years. I think that my probability for human labor being obsolete within 10 years is probably something in the ballpark of 15%, and within 20 years is something within the ballpark of 35%. Then, prior to human labor being obsolete, you have some window of maybe a few years during which stuff is already getting quite extremely crazy. Probably AI [risk] becomes a big deal. We can have permanently sunk the ship somewhat before, one to two years before, we actually have human labor being obsolete.

Those are my current best guesses. I feel super uncertain about … I have numbers off hand because I’ve been asked before, but I still feel very uncertain about those numbers. I think it’s quite likely they’ll change over the coming year. Not just because new evidence comes in, but also because I continue to reflect on my views. I feel like a lot of people, whose views I think are quite reasonable, who push for numbers both higher and lower, or there are a lot of people making reasonable arguments for numbers both much, like shorter timelines than that and longer timelines than that.

Overall, I come away pretty confused with why people currently are as confident as they are in their views. I think compared to the world at large, the view I’ve described is incredibly aggressive, incredibly soon. I think compared to the community of people who think about this a lot, I’m still in the middle of the distribution. And amongst people whose thinking I most respect, maybe I’m somewhere in the middle of the distribution. I don’t quite understand why people come away with much higher or much lower numbers than that. I don’t have a good … It seems to me like the arguments people are making on both sides are really quite shaky. I can totally imagine that after doing … After being more thoughtful, I would come away with higher or lower numbers, but I don’t feel convinced that people who are much more confident one way or the other have actually done the kind of analysis that I should defer to them on. That said, I also don’t think I’ve done the kind of analysis that other people should really be deferring to me on.
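As a rough illustration of what the ballpark figures above imply, the sketch below takes the two stated numbers (about 15% within 10 years and about 35% within 20 years) and derives the conditional probability for the second decade, plus the constant annual rates that would reproduce each figure. The constant-rate model and all variable names are assumptions made here for illustration; Paul gives only the two cumulative probabilities and stresses that he is very uncertain about them.

```python
# Cumulative probabilities quoted in the conversation (ballpark figures).
p_within_10 = 0.15
p_within_20 = 0.35

# Probability that it happens in years 10-20, given it has not happened by year 10.
p_second_decade_given_not_first = (p_within_20 - p_within_10) / (1 - p_within_10)


def constant_annual_rate(probability, years):
    """Annual probability that, applied independently each year,
    yields `probability` over `years` years (an assumed toy model)."""
    return 1 - (1 - probability) ** (1 / years)


rate_first_decade = constant_annual_rate(p_within_10, 10)
rate_second_decade = constant_annual_rate(p_second_decade_given_not_first, 10)

print(f"Conditional probability for the second decade: {p_second_decade_given_not_first:.1%}")
print(f"Implied annual rate, first decade: {rate_first_decade:.2%}")
print(f"Implied annual rate, second decade: {rate_second_decade:.2%}")
```

On this toy reading, the stated numbers imply a higher yearly chance in the second decade than in the first, conditional on nothing having happened in the first ten years.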

Robert Wiblin: There’s been discussion of fire alarms, which are kind of indicators that you would get ahead of time, that you’re about to develop a really transformative AI. Do you think that there will be fire alarms that will give us several years, or five or ten-years’ notice that this is going to happen? And what might those alarms look like?

Paul Christiano: I think that the answer to this question depends a lot on … There’s many different ways the AI could look. Different ways that AI could look have different signs in advance. I think if AI is developed very soon, say within the next 20 years, I think the best single guess for the way that it looks is a sort of … The techniques that we are using are more similar to evolution than they are to learning occurring within a human brain. And a way to get indications about where things are going is by comparing how well those techniques are working to how well evolution was able to do with different levels of … different computational resources. On that perspective, or in that scenario, what I think is the most likely scenario within 20 years, I think the most likely fire alarms are successfully replicating the intelligence of lower animals.

Things like, right now we’re kind of at the stage where AI systems are … the sophistication is probably somewhere in the range of insect abilities. That’s my current best guess. And I’m very uncertain about that. I think as you move from insects to small vertebrates to larger vertebrates up to mice and then birds and so on, I think it becomes much, much more obvious. It’s easier to make this comparison and the behaviors become more qualitatively distinct. Also, just every order of magnitude gets you an order of magnitude closer to humans.

I think before having broadly human-level AI, a reasonably good warning sign would be broadly lizard-level or broadly mouse-level AI, that is learning algorithms which are able to do about as well as a mouse in a distribution of environments that’s about as broad as the distribution of environments that mice are evolved to handle. I think that’s a bit of a problematic alarm for two reasons. One, it’s actually quite difficult to get a distribution of environments as broad as the distribution that a mouse faces, so there’s likely to be remaining concern. If you can replicate everything a mouse can do in a lab, that’s maybe not so impressive, and it’s very difficult to actually test for some distribution of environments. Is it really flexing the most impressive mouse skills?

I think that won’t be a huge problem for people … A very reasonable person looking at the evidence will still be able to get a good indication, but it will be a huge problem for establishing consensus about what’s going on. That’s one problem. And then the other problem was this issue I mentioned where it seems like transformative impacts should come significantly before broadly human-level AI. I think that a mouse-level AI would probably not give you that much warning, or broadly mouse-level AI would probably not give you that much warning. And so you need to be able to look a little bit earlier than mice. It’s plausible that in fact one should be regarding … One should really be diving into the comparison to insects now and say, can we really do this? It’s plausible to me that that’s the kind of … If we’re in this world where our procedures are similar to evolution, it’s plausible to me the insect thing should be a good indication, or one of the better indications, that we’ll be able to get in advance.

Robert Wiblin: There was this recent blog post that was doing the rounds on social media called, “An AI Winter is Coming,” which was broadly making the argument that people are realizing that current machine learning techniques can’t do the things that people have been hoping that they’ll be able to do over the last couple of years. That the range of situations they can handle is much more limited and that the author expects that the economic opportunities for them are gonna dry up somewhat, and investment will shrink. As we’ve seen, so they claim, in the past when there’s been a lot of enthusiasm about AI, and then it hasn’t actually been able to do the things that we claimed. Do you think there’s much chance that that’s correct, and what’s your general take on this AI boom, AI winter view?

Paul Christiano: I think that the position in that post are somewhat … I feel like the position in that post is fairly extreme in a way that’s not very plausible. For example, I think the author of that post is pessimistic about self-driving cars actually working because they won’t be sufficiently reliable. I think it’s correct to be like, this is a hard problem. I think that … I would be extremely happy to take a bet at pretty good odds against the world they’re imagining. I guess I … I also feel somewhat similarly about robotics at this point. I think what we’re currently able to do in the lab is approaching good enough that industrial robotics can … That’s a big … If the technology is able to work well, it’s a lot of value. I think what we’re able to do in the lab is a very strong indication that that is going to work in the reasonably short term.

I think those things are pretty good indications that, say, current investment in the field is probably justified by, or the level of investment is plausible given the level of applications that we can foresee quite easily, though I don’t wanna comment on the form of investment. There’s maybe a second … I think I don’t consider the argument in the post … I think the arguments in the post are kind of wacky and not very careful. I think one thing that makes it a little bit tricky is this comparison. If you compare the kind of AI we’re building now to human intelligence, I think literally until the very end, actually, probably after the very end, you’re just gonna be like, look, there’s all these things that humans can do that our algorithms can’t do. I think one problem is that’s just kind of a terrible way to do the comparison. That’s the kind of comparison that’s predictably going to leave you being really skeptical until the very, very end.

I think there’s another question, which is, and maybe this is what they were getting at, which is, there’s a sense maybe amongst the … especially certainly deep-learning true believers, at the moment, that you can just take existing techniques and scale them quite far. If you just keep going, things are gonna keep getting better and better, and we’re gonna get all the way to powerful AI like that. I think it’s a quite interesting question whether that is … If we’re in that world, then we’re just gonna see machine learning continue to grow, so then we would not be in a bubble. We would be in the beginning of this ramp up to spending some substantial fraction of GDP on machine learning. That’s one possibility. Another possibility is that some applications are going to work well, so maybe we’ll get some simple robotics applications working well which could be quite large, that could easily have impacts in hundreds of billions or trillions of dollars. But, things are gonna dry up long before they get to human level. I think that seems quite conceivable. I would maybe be … Maybe I think it’s a little bit more likely than not that at some point things pull back. I mean it’s somewhat less than 50% that the current wave of enthusiasm is going to just continue going up until we build human level AI. But I also think that’s kind of plausible.

I t