Transcript

Robert Wiblin: Hi listeners, this is the 80,000 Hours Podcast, where each week we have an unusually in-depth conversation about one of the world’s most pressing problems and how you can use your career to solve it. I’m Rob Wiblin, Director of Research at 80,000 Hours.

Today’s episode is with three people working at OpenAI who want to discuss how AI development teams can best coordinate to avoid racing to deploy artificial intelligence too quickly and what role, if any, government ought to play in this.

Before that I just wanted to pull in my colleague Niel Bowerman to say hi and talk about an article he’s written – hey Niel.

Niel Bowerman: Hey Rob, thanks for having me on the show.

Robert Wiblin: My pleasure. This episode is likely to be of interest to most listeners, even if they can’t see themselves working on something to do with AI themselves.

But for those who can, I wanted to make sure they know about your new article called “The case for building expertise to work on US AI policy, and how to do it”. What might readers find in there?

Niel Bowerman: Yeah so AI policy is an issue that’s been discussed on the podcast a couple of times, and a whole bunch by 80,000 Hours. But, surprisingly often, I get people coming to me and asking… 1) “Why? I don’t really get why the US government is such an important place to go and work if what you want to do is improve the long term outcomes of AI” and 2) “How do you even get into these careers?”

And so, I wrote this guide, or article, to essentially make the case for why I think building expertise to go and work on AI policy is really important. And we give a bunch of arguments for as well as a bunch of arguments against — why this might not be a good idea, why it’s maybe a risky career move. And then we go into this question of “how do you even get into these careers?”

So it covers a bunch of master’s programmes and a bunch of jobs that you can take early on when you’re starting out. But ultimately what this article is saying is that if you have the right combination of technical expertise, social skills, willingness to work in multi-stakeholder policy environments, and the ability and willingness to operate in slow-moving bureaucracies, and if you’re excited to work on shaping the future of AI policy, then this might be one of the most impactful things that you can do with your career over the coming decade.

Robert Wiblin: Great thanks for that Niel – I’ll bring you back along with Michelle Hutchinson at the end of the episode to offer some quick reactions to the interview.

But first we have to listen to it, so without any further ado, here’s Amanda, Miles and Jack from OpenAI.

Robert Wiblin: Today I’m speaking with Amanda Askell, Miles Brundage, and Jack Clark.

Since her last appearance on the podcast last September, Amanda has joined OpenAI as a Research Scientist working on AI policy. She completed a PhD in philosophy at NYU, one of the world’s top philosophy grad schools, with a thesis focused on Infinite Ethics. Before that, she did a BPhil at Oxford University, and she blogs at rationalreflection.net.

Miles had the honour of being the first ever guest on the podcast, and now also works as a Research Scientist on OpenAI’s policy team. Previously, he was a Research Fellow at the University of Oxford’s Future of Humanity Institute (where he remains a Research Associate). He’s also a PhD candidate in Human and Social Dimensions of Science and Technology at Arizona State University.

Jack works as the Policy Director at OpenAI. He was previously the world’s only neural network reporter working at Bloomberg, is the creator of the weekly newsletter Import AI, which is read by a large fraction of the AI industry, and until today, he had never been on the 80,000 Hours Podcast even once.

Robert Wiblin: Thanks for coming on the show again, Amanda.

Amanda Askell: Thanks for inviting me.

Robert Wiblin: Thanks for coming on the show again, Miles.

Miles Brundage: Great to be here. Thanks.

Robert Wiblin: And Jack, welcome on for the first time.

Jack Clark: Thank you very much.

Robert Wiblin: I hope to get into talking about how people can actually pursue careers in AI policy, like you all are, and also some of the latest developments in the field, which are pretty interesting. But first, can each of you just quickly tell me, what are you guys working on, and why do you think it’s really important work? Maybe Amanda first.

Amanda Askell: Yeah. I’ve been focusing recently on this notion of AI development races between different developers. I think that a lot of the dialogue around that has focused on these highly adversarial race scenarios where people are talking about things like arms races. I basically think that the situation is kind of more complex than that, and it’s important that we acknowledge that even if there’s a development race, it doesn’t have to be highly adversarial. That’s my main focus. And another area that I’m thinking about a lot is the kind of intersection between policy questions and questions in safety.

Miles Brundage: Yeah. There are a couple of things that I’m thinking about these days, broadly in the bucket of making sure that AI goes well and that OpenAI does what it can to set the right sorts of norms around AI development. Today we had a big blog post, which was a step in this direction in the context of disclosure norms and publishing norms, where we were very transparent about our concerns around the malicious use of a certain technology. In addition to that, I’m also thinking about some of the same issues Amanda mentioned, around cooperation and what the sort of technical and social mechanisms are that we might draw on to build trust between different countries and companies.

Jack Clark: And I spend a good chunk of my time thinking about the sorts of interventions that Amanda and Miles and other members of the team can make, which will be most effective. I do that by spending a lot of time in Washington every month, what I call the happiest place on Earth. I also go and spend time in other major cities close to other sort of reasonably large governments as well.

Jack Clark: I think that one of the challenges for AI policy is being able to do significant research that deals with many of the open questions in this domain, which are also well calibrated to what governments are trying to think about and trying to actively work on today, and ideally trying to make sure that that research is aligned with the sorts of bills, legislation, or projects that governments are thinking about doing in this domain, so we can work collaboratively.

Intro to AI policy

Robert Wiblin: Maybe Jack, we’ve already had two episodes broadly describing the problem of AI policy, one with Miles and one with Allan Dafoe. But do you want to just quickly explain, for people who haven’t heard it described, what is the issue with artificial intelligence and how we ought to approach it as a society and, I guess, as a government?

Jack Clark: The good thing about AI is that it applies everywhere. And this is also the extremely bad thing about AI for a policy challenge, because AI’s effect on policy is happening in two major areas. One, it’s happening as an augmentation of existing policy problems. So you think about fairness in the criminal justice system. Well, that’s an area where AI is today having an effect and accentuating the sharp edges of those problems. Similarly, you know, AI and insurance with regard to discrimination; debates about inequality are influenced by the effects of AI on actors in the economic marketplace; and so on and so forth. And then at a meta level, the question of AI policy to me is what new questions need to be worked on, or what existing policy things need to be dramatically reframed?

Jack Clark: So if you think about issues like controls on AI in the same way that you would think about controls on previous transformative technologies like nuclear technology or so on, it’s clear that AI has very different rules and very different traits, which means that the challenge there is different. So a lot of AI policy right now is about discovering those areas where there needs to be new work on defining the questions so we can go and change things.

Significant changes over the last year or two

Robert Wiblin: The last episode we had on this topic I recorded about a year ago, maybe even a bit longer than that, and it’s a field that’s changing incredibly fast, because it kind of really only emerged in its own right a few years ago. Maybe could each of you in turn describe what the most significant changes have been over the last year or two?

Miles Brundage: Sure. Definitely, there’s been a lot of mainstreaming of AI in public discourse and AI policy and AI ethics as areas of discussion within the research community. I would say that it’s sort of been continuous with what happened in previous years. You know, in around 2015 there was the first FLI Conference and the Open Letter on Robust and Beneficial AI. So a lot of these ideas around sort of social responsibility in the AI community had been percolating for a while, but they’ve been more mainstream in terms of conferences and researcher conversations, and in the case of our blog post today, sort of concrete decisions taken by AI labs as these issues have gotten more clearly connected to the real world and AI has gotten more impactful.

Jack Clark: I’d say from my perspective that the politicization of AI, the realization among people taking part in AI that it is a political technology with political effects, has been very significant. We’ve seen that in work by employees at AI organizations like Google, Amazon, and Microsoft to push back on things like AI being used in drones in the case of Google, or AI and facial recognition in the case of Microsoft and Amazon. And that’s happened alongside politicians realizing AI is important and is something they should legislate about. So you have Ben Sasse, a Republican senator here in America, who has submitted a bill, the Malicious Deep Fake Prohibition Act, which is about stopping people using synthetic images for bad purposes.

Jack Clark: I think that the fact AI has arrived as a cause of legislative concern at the same time that AI employees and practitioners are realizing that they are political agents in this regard and have the ability to condition the legislative conversation is quite significant. And I expect that next year and the year after, we’re going to see very significant changes, especially among western countries, as they realize that there’s a unique political dynamic at play here that means that it’s not just like a normal industry in your country, it’s something different.

Amanda Askell: I think some of the biggest changes I’ve seen have mainly been in a move from a pure problem framing to something more like a focus on a greater number of potential solutions and mechanisms for solving those problems, which I think is a very good change to see. Previously, I think there’s been a lot of pessimism around AI development among some people, and now we’re seeing really important ideas get out there like the idea of greater collaboration and cooperation, ways in which we can just ensure that the right amount of resources go into things like AI safety work and ensuring that systems are safe and beneficial. I think that one good thing is that there’s perhaps a little bit more optimism as a result of the fact that we’re now focusing more on mechanisms and solutions than just on trying to identify the key problems.

Robert Wiblin: Yeah, do you want to elaborate on that? Has there been a change in people’s sense of what the most important questions in this field are and what people are looking into in more detail?

Miles Brundage: I’ve definitely noticed a growing familiarity/agreement with the idea that there’s some sort of collective action problem here, not necessarily convergence on a very concrete framing, but I think some of the ideas in, for example, the book Superintelligence, and more recently sort of more prosaic versions of these arms race concerns in the autonomous weapons and other contexts, have caused people to think, oh, maybe we need to find a way to coordinate. But that is not a very crisp consensus, and views vary a lot on exactly what the prospects are for coordination.

Jack Clark: Here in America in 2016 there was a presidential election, and it led to us having the current administration, and that generated a lot of interest from AI practitioners about how AI technology is used, because suddenly you had an administration come in which had political goals that frequently conflict with the political values of AI researchers themselves. And I think that that has been in a way helpful, because it’s helped frame the AI problem of multi-use or omni-use or dual-use technology away from purely military terms and into this broader context of, oh, if we build AI stuff, other people can apply it in different ways. Who are these people, how might they apply it, and what steps can we as developers take to ensure that if they do get the chance to apply this stuff, they apply it in a good context? I mean, that context led to OpenAI adopting a different release strategy with some language AI work, which we’ve been talking to you about today. And I think that it’s going to change how most AI developers approach these questions of release in the future, which I’m excited to see.

Are we still in the disentanglement phase?

Robert Wiblin: So what kinds of things are people spending most of their time working on these days? I think a year or two ago, people were saying that AI strategy and policy was kind of in a disentanglement phase, where what we really needed was like people who could figure out what are the most important questions to be focusing on, and that’s kind of a skill in itself. Do you think we’re still in the disentanglement phase if we ever were, or has it become clear like exactly what we need to be doing?

Miles Brundage: I think opinions vary on that question. I personally am sort of bullish on a particular framing of the problem around a sort of collective action and trust and think that there are pretty tangible research problems in that area. But others might sort of disagree with that framing or find it ill-specified or have a totally different problem framing. So I think there’s both further disentangling going on and sort of more granular research agendas for particular framings.

Amanda Askell: Yeah, I think one thing worth noting here is that there’s kind of not just one central problem but a collection of problems, and so you can have different rates of progress on each of them. So some questions might be things like how do you distribute the benefits of AI going into the future, which is a bit of a different question from things like how do you prevent adversarialism between different AI developers? And I think that some of those are more developed than others. I would probably class myself more on the disentangling end of things, but I think this is probably because I have a kind of deep love of conceptual clarity. And when you get a new research area like this, you’re having to discover what the different problems are, what the solution space is, and what even the relevant concepts are to use here. And so I think that that has in fact been disentangled in some areas more than others, but there’s still a lot of work to be done there.

Jack Clark: Just doubling down on what Miles and Amanda said, there are definitely known problems now that there’s convergence on working on, like the problem of multiple AI organizations needing to be able to collaborate increasingly closely with each other and exchange information. I think that everyone agrees that that’s a shared problem of concern now and deserves its own investigation. So I think that it’s positive that we have some known things to work on, but the issue with a lot of this AI policy stuff is that over time the number of actors is sort of changing, which conditions the types of questions, and that the level of entanglement or disentanglement is conditioned by the growth in the field over time. So actually, every month you’ll see a new statement about AI from a government or a billionaire or a company, and you sort of have to look at your sheet of paper on which we have our grand AI policy plan, and you need to slightly redraw it to account for those different actors in the space.

Research vs. action

Robert Wiblin: How much is the field of AI policy still in the phase of just doing research and figuring out what should be done, versus actually trying to change things in the real world, like try to get organizations to change their behavior or get the government to implement particular policies?

Miles Brundage: I would say that there are multiple worlds of AI policy and multiple senses of AI policy. And the world that is of interest to our listeners might be different from the way that it’s seen by corporate executives or whatever. A lot of people doing quote unquote “policy” are sort of in information dissemination mode. They’re trying to get policymakers up to speed on what AI is and prevent them from doing crazy things and sort of answering questions from the public and thinking about press coverage and stuff like that. So there are many things that fall under the heading of policy that aren’t necessarily focused on the long term or focused on AGI or focused on optimizing for the broad interests of humanity. So I think it’s important to draw the lines in the right way.

Miles Brundage: But even if you go further and say AI policy research, that’s still a pretty broad area. I think most people who are doing AI policy research are fairly zoomed into a particular domain of application, like either autonomous cars or predictive policing or something like that, or a slightly higher level category like law enforcement technology or something like that. So I think it’s not clear yet what the synthesis of these communities will be and what an optimal distribution would be, but currently I see fairly disconnected communities having kind of different conversations.

Amanda Askell: Yeah. I think that in terms of room for growth, I would say that if people are interested in working either on the kind of more action-oriented side of AI policy or on the research side of AI policy, there’s a huge amount of room for growth in both. And I also think that they go kind of hand in hand in some cases. If you’re doing research into AI policy, very often you’re going to want to add in certain actionable steps that people can take on the basis of that research. And I think that’s really important, because there can be something a bit disheartening about reading something fairly abstract and then not being told how to respond to a problem or things that can actually be done. It’s kind of excellent when you have people in the right positions to be able to say yes, here’s a direct output of this that I could in fact do. So I second what Miles said, but would also say if people are interested in kind of one or the other, then there are huge amounts of room for both.

Jack Clark: I’d say that there’s huge room for translators, and I describe myself as that. Miles and Amanda are producing a lot of fundamental ideas that will be inherent to AI policy, and they’re also from time to time going and talking to policymakers or other parties about their research. I spend maybe half my time just going and talking to people and trying to translate, not just our ideas, but general ideas about technical trends in AI or impacts of AI to policymakers. And what I’ve discovered is that the traditional playbook for policy is to have someone who speaks policy, who’s kind of like a lobbyist or an ex-politician, and they talk to someone who speaks tech, who may be at the company’s home office or home base. And as a consequence, neither side is as informed as they could be.

Jack Clark: The tech person that speaks tech doesn’t have a great idea of how the sausage gets made in Washington or Brussels or whatever, and the policy person who speaks policy doesn’t really have a deep appreciation for the technology and specifically for technology’s trajectory and likely areas of impact. And I found that just by being able to go into the room and say, “I’m here to talk about this tech, and I’m here to talk about the areas it may go over the next four to five years,” has been very helpful for a lot of policy people, because they think over that timeline, but they rarely get people giving them a coherent technical description of what’s going to happen.

Predicting AI capabilities over the next 5 to 20 years

Robert Wiblin: I want to come back to OpenAI’s strategy for making AI go well later on. But first, I’m very curious to get your views on what capabilities people ought to expect AI to develop over the next five, 10, 15, 20 years. Some eyebrows being raised here. It’s a classic difficult question, but just, okay, go, Miles.

Miles Brundage: The reason I’m raising my eyebrows is that first of all, this is very up my alley. I’m interested in this sort of question, but it’s also very difficult. And for example, we had a report on the malicious use of AI last year, where we sort of had these semi-concrete, semi-abstract scenarios, which we said were plausible within five to 10 years. Some of them had to do with generation of text. I would say we classified this broad area of more human-like creation of media as a potential source of threat. But we didn’t know exactly how quickly the technical progress would occur, or that NLP would have this big jump in performance compared to, say, images.

Miles Brundage: I mean, arguably there’s been substantial progress in images, but that we’re sort of catching up now in the language domain. So I think it’s hard to be very confident about those sorts of things, and in part that’s because we don’t have the right infrastructure, so we’re sort of flying in the dark about what is the most likely misuse of this language model? We don’t have good ground truth on what people are doing with crappier versions. So I think there’s a lot of room for improvement, both in terms of actually making grounded technical forecasts, as well as sort of building an infrastructure to map how these technologies are actually used.

Robert Wiblin: Just to frame the question a little bit. I follow this as kind of an amateur person with some interest in it, and I guess there’s been various posts that maybe alarm me a little bit, but I’m not sure how to read them. So, OpenAI put out a blog post describing how it seems like there’s been about a 300,000-fold increase in the amount of computation that goes into building the state of the art ML models since 2012. I guess just last week we got this news about DeepMind producing an ML program, AlphaStar, that’s extremely good at playing StarCraft and is now beating the best players or very close to doing it, which seemed like it was going to be quite challenging just a year ago.

Robert Wiblin: And then just today, OpenAI has released this blog post describing a new method of producing kind of natural-sounding text: paragraphs, basically essays, written by ML programs, as far as I understand, that seem at least some of the time quite convincing. It’s almost as though a human wrote them, although they don’t have perhaps a great grasp of the concepts, and they’re not saying anything terribly sensible. It’s just very hard for me to read this and get any sense of: is this running ahead of what we thought it would? How difficult are these tasks in reality? Does anyone know the answer to these questions?

Jack Clark: One thing that’s become clear is that there’s an interplay between the kind of complexity of the task you’re trying to do and also how many tasks you are doing. So we’ve moved from this regime of evaluating single purpose AI systems against single benchmarks, to usually single AI systems against multiple benchmarks. And you’ve seen this in reinforcement learning, where we’ve started to test single agents on kind of multiple games, or multiple sets of games. You’ve seen this in language modeling, where in the case of what we released today, we’re testing that on like 10 or 11 different things. You see this in other large scale systems, even AlphaZero, which is playing chess and shogi and Go.

Jack Clark: And so the fact that we’ve moved to sort of multiple evaluations of single systems should itself tell us that there’s been a significant growth in the underlying complexity of these things. They now have symptoms which need to be evaluated, symptoms of some kind of like air quotes “cognition”, which is very weird and different to a specific point-in-time task. I think the other way that I think about it is that humans are incredibly bad at modeling really big growth curves, and so when we see this growth of compute by like 300,000X in six years and see that it correlates to many of these machine learning results, which have been surprising, like machine translation, Dota 2, AlphaGo, the original DQN algorithm on Atari, it makes me think that our ability to predict any further than the next three years is actually somewhat limited.
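As a rough back-of-the-envelope illustration of how steep that curve is (this calculation is not from the conversation, just a sketch based on the figures quoted above): a 300,000-fold increase over roughly six years implies the compute behind the largest training runs doubled every four months or so.

    import math

    # Rough illustration only: if compute for the largest training runs grew
    # ~300,000x over ~6 years (72 months), the implied doubling time is
    # 72 * ln(2) / ln(300,000) -- roughly 4 months, far faster than Moore's law.
    growth_factor = 300_000
    period_months = 6 * 12
    doubling_time = period_months * math.log(2) / math.log(growth_factor)
    print(f"Implied doubling time: {doubling_time:.1f} months")  # ~4.0 months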

Amanda Askell: Yeah, I mean I think I would second that. And I would also say that people can look at some of these results and maybe be alarmed or, you know, just see something like a kind of upward trend, but it’s also worth noting that sometimes key difficulties are very hard to predict as well, so not only areas in which you’re going to see more progress and areas in which you will have more data, for example, but just areas in which you suddenly see technical difficulties that you didn’t anticipate. And it’s worth bearing that in mind. Yeah, I think this is a reason at least to be a bit more measured in one’s response to these results.

Miles Brundage: And just to add one point: it’s important to distinguish things that we can be reasonably confident about, or could be more confident about, like the sort of structural properties of AI as a technology to be governed: the idea that once you’ve trained the system, it’s easy to produce copies of it. That sort of has some social implications and implications for how you release things, like our decision today. Things like the fact that the upper limit on the speed of these systems is much higher than for humans. You can see that with the case of GPT-2 in our announcement today. What’s impressive is both that it’s producing these coherent samples, but also that it can do it at a superhuman rate and scale. So I think that we have to think not just about what the sort of waterline of capabilities is, but also what the sort of scale-up from those to social impact is, in terms of speed, quantity, et cetera.

Jack Clark: I’d like to just reiterate kind of what Miles said and note that there are hard problems which we know are definitely going to be here forever, like how do you release increasingly powerful systems while being confident that they aren’t going to be able to cause harm? That’s a long-term kind of safety problem, and it’s also a short-term real policy question in the case of today’s text generation systems or things like facial recognition systems.

Timelines

Robert Wiblin: So how much do you kind of focus on what needs to be done to make sure AI goes well in the next couple of years versus the next couple of decades? It seems like there are different timelines you might be focusing on when thinking about what this community ought to be working on.

Jack Clark: Just from the OpenAI perspective, our stuff, our activities are designed to be robust to the long term and ideally as a second order effect should help the short term. So you know, one of our initiatives is going and talking about the need for better methods to measure and assess AI, and we want a broader number of people to be doing that, not just AI organizations but specific government agencies, third party researchers, academics. That’s something where if we did it today, it would just improve debates and decision making about a number of near-term policy questions. But fundamentally what it’s doing is it’s building capacity for having a global community of people that think about measuring AI progress, which we think is a prerequisite for sensible policy with regard to long-term powerful systems.

Miles Brundage: Yeah. I think there is some sort of irreducible uncertainty about how much the challenges we’re facing today will translate into future ones. But as Jack said, we should be very mindful of sort of locking in the right or wrong set of institutions and norms and debate. So that’s something I worry about, is sort of maybe we don’t have to solve everything in the next year or two, but we do want to at least do some damage control and prevent people from locking into an AI arms race mentality, for example.

Amanda Askell: Yeah, I think it’s tricky, because in most cases there are kind of robust interventions that you can do that work pretty well, regardless of whether you’re thinking about the long term or the short term. The key worry is going to be cases where there’s an inconsistency between what you would do now if you were thinking that you’re going to face a challenging result in three months versus a challenging result that’s going to happen over the course of a year. I think for the most part there’s often not a tension there. You want to do things like build the kinds of institutions and responses that are great now and great going forward into the future. I do think, and this is an opinion that I’d be interested to know whether other people disagree with, that challenges that come quickly and kind of in a way that you didn’t anticipate are much more difficult than ones that you have a lot of time to respond to and build institutions around.

Amanda Askell: And so in some ways you can think that actually we will have a lot of time to deal with some of these problems, and we can simply build, slowly build the institutions that we think are good for managing them, so around things like text generation. But it can be worth just doing work that assumes that you won’t necessarily have that long time to think about things, just because in that case it can be really hard to spin up a response really quickly unless you have in fact been anticipating that possibility. So sometimes we can end up working on things that are focused on, well what happens if we discover in three months that there’s going to be this really important result that’s going to have massive policy implications? And ideally you’re like, well, we’ve already been thinking about that for the last year, so that’s great.

Jack Clark: Yeah. And we to some extent already do this. The long term good version of the world you want is you want to have formal processes for coordinating between different labs. That’s going to obviously take a while to build. While we’re trying to build that, we’re also creating the super hacky version of that, which works today, of informal relationships between us and people at other labs, basically because we will get surprised, and we will need to draw on things that have the shape of the sort of institutions we want to build in the long term and which function in a similar way today, but which are the Wright Brothers held together with Scotch Tape equivalent of the big jet engine that we’re sort of driving towards.

Robot hands and computer games

Robert Wiblin: Jack, if I understood you correctly, you were saying that it’s interesting that we now have kind of reinforcement learning algorithms that can accomplish multiple quite different outputs. And I’m interested, it seems like you’re using the same reinforcement learning algorithm here at OpenAI to both train a hand in how to pick things up and manipulate objects, as well as to win at this game, Dota 2, that is, Defense of the Ancients 2. It’s a game kind of like StarCraft II, as far as I know. That’s kind of surprising to me that you would use the same underlying system for that, but perhaps that just shows my naivete about this technology. What’s the story there?

Jack Clark: I’ll tell you the cartoonish explanation of why a robot hand and a computer game are just the same problem, and maybe that will shed some light on this. So in Dota 2, you have to control a team of several different people. You need to move them around a map, and you need to attack an enemy. With the case of a robot hand, you need to hold an object, move your fingers, and rotate it to the desired position. So what do those things have to do with each other? Well actually, they have weirdly large amounts of stuff in common. Your hand has 20 or 30 different joints in it. At the same time, the number of actions that you can take at any one point in time in Dota 2 is 10 to 20 main actions, plus you can select your specific movement.

Jack Clark: And in the same way, when you’re rotating an object in your hand, it’s partially observable. You’re aware of the contact with the object, where it connects to your own sense of it, and a sense of its friction, but you aren’t aware of the shape of the entire object from a sensory perspective. There are bits which are occluded to you, bits you can’t feel. It’s the same in Dota 2, where you are not able to see the whole map; you’re able to see the bits where the enemy connects with you or where you’ve explored. Counterintuitively, you end up in a place where, from a computational perspective, these are remarkably similar problems. And the truth is that many problems in life are similar at root when it comes to compute. We’ve just lacked generalizable software systems that we can attach to those problems, that can basically interpret the different inputs and compress them down to the same computational problem, which we then solve.

Jack Clark: We used an algorithm called Proximal Policy Optimization, PPO, which is a fairly robust algorithm. What we mean by robust is really just you can throw it at loads of different contexts, and you don’t need to worry too much about tuning it. It will sort of do okay initially. I think that speaks to the huge challenge of AI policy, is that we are going to continue to invent things like PPO. We are going to continue to do things like train an increasingly large general language model, and whenever we do these things we’re going to enable vast amounts of uses, some of which we can’t predict.
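For a concrete sense of what “robust across contexts” means, here is a minimal sketch using the open-source Stable-Baselines3 implementation of PPO and two unrelated Gymnasium environments. This is not OpenAI’s actual Dota 2 or robot-hand setup (those used large custom distributed systems); it is just an illustration that the same algorithm, with the same default hyperparameters, can be pointed at very different tasks and do okay without per-task tuning.

    # Minimal sketch, not OpenAI's actual setup: the same PPO implementation,
    # with identical default hyperparameters, trained on two unrelated tasks.
    # Requires: pip install gymnasium stable-baselines3
    import gymnasium as gym
    from stable_baselines3 import PPO

    for env_id in ["CartPole-v1", "Pendulum-v1"]:  # discrete vs. continuous control
        env = gym.make(env_id)
        model = PPO("MlpPolicy", env, verbose=0)   # same algorithm, same defaults
        model.learn(total_timesteps=50_000)        # "does okay" without tuning
        model.save(f"ppo_{env_id}")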

Miles Brundage: Yeah, so I’ll just comment a bit more concretely in the context of language models. I think it’s a particularly tricky case there, if you read the blog posts and the research paper, it’s clear that the strength of the system comes from this sort of fairly unsupervised or very unsupervised sort of learning process on huge amounts of diverse data. So it’s sort of hard to maintain the strength of that system while having a sort of more controllable, say, single topic system that can only do one thing. It’s hard to sort of … we don’t know what a pipeline is yet that will result in a fairly narrow but competent natural language system that doesn’t have potential for misuse, but we now seem to know how to make a generic one that does have potential for misuse.

Robert Wiblin: So if in fact these tasks are subtly more similar than they might appear, perhaps it’s less interesting that a very similar learning algorithm can learn to do both of them? Because you might worry that, actually, if most tasks are more similar than they seem, then you might expect more rapid progress, because we can just have one underlying learning process and it can learn to do practically everything that humans do. But maybe it just happens to be that hands and computer games are similar.

Jack Clark: I think that this is a general rule that will come true over time. The story of the last few years has been increasingly robust algorithms that are more resilient to the context changing around them. And if you step out from individual things like reinforcement learning to look across supervised learning, RL, and unsupervised learning, you see this trend across all of it. I should note that it wasn’t very easy to get this to work on both. Like, we’re incredibly excited it did. It was a humbling experience for OpenAI to work on real robotic hardware. I would recommend everyone who has calibrated intuitions about AI timelines spend some time doing stuff with real robots, and it will probably … how should I put this? … further calibrate your intuitions in quite a humbling way.

Malicious AI Report

Robert Wiblin: All right, let’s push on to the malicious AI report. So last February you released this malicious AI article. I think it had 26 authors, and at least Miles and Jack were among them. Maybe Amanda had some input as well. Yeah, I guess you had like four high level recommendations that we might be able to go through. But maybe do you just want to kind of summarize what the key message was here, and perhaps … it sounds like today’s article about the production of kind of natural language definitely plays into this, or is an example of a risky application of AI.

Miles Brundage: Yeah, so I am not sure that this is exactly how we were thinking about it at the time, but in retrospect I think the best way to think about it is that the malicious use report sort of framed the general topic at a high level of abstraction and pointed to a lot of key variables and structural factors, like the scalability of AI, that might cause one to sort of have some reason to be worried about this stuff. But there was irreducible uncertainty, or possibly somewhat reducible, but some substantial uncertainty, about what the most promising defenses were and what the most worrying threats were. And I think now we have a much richer sense of what the threat landscape looks like in the case of language, and I think over the course of time that’s sort of how we’ll follow up on the report, is sort of diving deeper.

Miles Brundage: We sort of framed this high level search problem of find a way to deal with dual-use AI and now we know a little bit more about what the levers and options are in one context, but I think the broader issue still remains.

Jack Clark: And one of the recommendations of the report was that AI organizations kind of look into publication, and different methods of doing different types of publication. So today, with this language model, we’re releasing a research paper. We’re not releasing the data set. We’re releasing the small model, not the large model. So we are trying to sort of run almost a responsible experiment in this domain that was recommended by the malicious uses report. We broadly think that lots of the recommendations in that report probably need to get more evidence generated around those recommendations, and so we’d be excited to see other organizations also do this and create more case studies that we can then learn from.

Amanda Askell: Yeah, I think one of the useful things is that we have used this as a kind of reason to make sure we’re kind of evaluating the potential for misuse of our own systems. And I think this is helpful both because it means that we end up using these as essentially case studies in how to do this well, and then get feedback on that and try to make sure that we are doing so responsibly. Which might seem trivial from the outside, but I also think it’s really easy for people who are building things with the intention of doing good, which is the case with almost all ML researchers, to not think about the ways in which someone who wanted to misuse the system could misuse the system. And so I think the fact that we are starting to do that kind of evaluation is important.

Amanda Askell: And I think also, ideally, the more that other people do this, the more we end up getting case studies on the dual use of systems and how to respond to those concerns, including feedback that we get on publication norms, for example.

Four high level recommendations

Robert Wiblin: The four kind of high level recommendations … or what was it. I guess I’ll read bits of them. So number one was, “Policymakers should collaborate closely with technical researchers to investigate, prevent and mitigate potential malicious uses of AI.” Two was, “Researchers and engineers in AI should take the dual-use nature of their work seriously.” Three is, “Best practices should be identified in research areas with more mature methods for addressing dual-use concerns.” Then four is, “Actively seek to expand the range of stakeholders and domain experts involved in discussions of these challenges.” Reading that, it all feels very high level.

Robert Wiblin: It’s like who should be doing what, I guess, is a little bit the response that people have.

Miles Brundage: Yeah, so I think it was somewhat high level on purpose, because we had … or by necessity, because we had 26 authors and multiple institutions. But yeah, I think there was also some inevitable abstraction, because there are a bunch of like known unknowns that relate to what the most worrying concerns are, which we had limited information about at the time. So I think it was kind of inevitable that there would be some learning process and that some of our recommendations would miss the mark. So concretely, we now have a better understanding of what a concrete experiment in a different approach to openness looks like, and we’ll be following closely what researchers’ reactions are, whether others reproduce our results, and if so how quickly, and whether they publish.

Miles Brundage: So there’s a bunch of information that we’ll be getting in this particular context, but I think more generally there are other domains that we have even less information about.

Jack Clark: Yeah, and I could maybe tell you a little sort of cartoon story for how I think of this. So one of these recommendations is about policymakers being better able to kind of assess and mitigate for malicious uses of AI. So how do we get there? Well I think that means that technical experts need to help produce tools to let policymakers assess malicious uses of AI or unsafe uses of AI. You could imagine an organization like OpenAI coming up with some metrics that relate to the safety of a given system, trying to work with a multi-stakeholder group like, say, the Partnership on AI. Having that group or a subgroup within it think about the safety measures that OpenAI has proposed, and if they end up agreeing that those are good measures, you could then go to policymakers and say it’s not just one organization.

Jack Clark: It’s this subset of an 80-person membership of PAI that has said you should consider using this technical metric when thinking about safety. So we can think about actual discrete sets of work that people can do here now, which I think is new, and I’m excited to have us all figure out what those should be, because there’s clearly a lot of stuff that needs to get done.

Amanda Askell: I think one thing that is useful to note on the question of the abstraction here is just that I think it can actually be good in many cases to have fairly abstract recommendations when you’re looking at a potentially extremely broad domain. So it doesn’t make a lot of sense to give really specific recommendations of the form “don’t release your model”, because in lots of cases, if we just had that as a norm across the field, you would expect it to be pretty harmful. And so you have to do a lot of things on a case-by-case basis, which means that one of the first things you end up doing is just giving kind of abstract recommendations and principles, and then you look at specific cases and you say, well, what precisely can we do in this case to get the right balance between, say, openness and making sure that we’re preventing malicious use, and then you use that as a case study going forward.

Amanda Askell: So in many ways I want to kind of both defend but also note the importance of not giving hyper-specific policy recommendations when it comes to just what is an extremely broad range of potential innovations and events. And so that’s probably why it’s good to keep things like abstract and high level when the domain is extremely broad.

Miles Brundage: And often it’s the case that the optimal action to take depends on the actions of others, so you probably shouldn’t specify everything in advance. So for example, our decision on the language model release might have been different in a world in which we knew that someone else already had a hundred X bigger one and was about to release it.

Consequences for next couple of years?

Robert Wiblin: Are there any … yeah, any malicious uses of AI that you think we should anticipate occurring over the next couple of years, and are there any like concrete things that we should be thinking about doing now to protect ourselves against those changes?

Jack Clark: I guess I get to be Dr. Doom here. I think that for some of these malicious uses of AI, talking about them is itself a safety and a policy challenge. You know, we think about the topic of information hazard here at OpenAI, and what that means is when you’re talking about some research or even talking about hypothetical research, are you at risk of saying something that could kind of differentially accelerate some actor or group of actors towards developing AGI, or towards something unsafe? So that’s why I’m going to be a little cagey in my responses, but I do have a couple of examples for you.

Jack Clark: I think that the intersection of drones and increasing amounts of autonomy via pre-trained models is going to be an area of huge policy concern, and I’m inspired to say that because we have observed that, in the field of asymmetric warfare, groups have used drones to go and do new types of warfare because they let them access a new type of military capability, which is I can go and cause you trouble at distance. Attribution to me becomes harder. I have control of a mobile platform, and I can drop munitions from it. We’re obviously concerned about what happens when those mobile platforms that can drop munitions gain autonomy.

Jack Clark: And that will be an area where you’ll see real questions about publication norms emerge, I would expect, very quickly from the first case of that happening. I think our work on language is us trying to experiment in a domain where you’re not talking about people’s lives being at risk. You are talking about severe effects, to be sure, but you’re a little ahead of where the rubber really hits the road there. The other way we can expect these malicious things to be used, I think, is just in poisoning public spaces. So I don’t have to be that smart to make it difficult for you and me to have a conversation; I just need to be incoherent and to never stop talking.

Jack Clark: Which is actually relatively easy to do, and I think that when people start to do that, that’s the point when governments are going to start to think about speech as being human or AI-driven, which will raise its own malicious uses and sort of legal questions.

How much capability does this really add?

Robert Wiblin: Yeah. I guess, if I can play skeptic for a minute: so yeah, I guess when I read that report, when I think about this in general, I find that it’s easy to whip myself into a lather of being worried about all of these new potential threats. But then sometimes when I think about it, I’m like, “Kind of, we can already do this stuff.” So it’s like, with drones for example, you could try to … you could shoot people from the drone, but for one, governments can already do that. But also, ever since we’ve had sniper rifles, it’s been fairly easy to try to shoot someone from a long way away and very hard to get caught.

Robert Wiblin: There was like that spate of terrorist attacks in D.C. I think, back in 2001 or 2002, where people just got into the back of a car and started shooting people from far away with a sniper rifle, and it was like extremely hard to catch them. So this is like something that people could already do. Is the drone adding all that much there? There’s also like hacking, or breaking into systems. I think we already believe basically that like all the major governments in the world have hacked the electricity grids of most of the other major governments in the world and would shut them off or try to do so if they ended up in a war with them.

Robert Wiblin: So in a sense, it’s like, how much capability is this really adding? I think even during the Syrian Civil War, there were vigilante groups that had pretty substantial cyber war capabilities. And for example, people also worry about AI being used in kind of phishing attacks and things like that. Phishing people is already so unbelievably easy it’s kind of hilarious, and everybody needs to get U2F keys to protect themselves against that already, without adding AI into the picture. Yeah, do you kind of share the sense that people can sometimes be a bit hysterical about things that are not that different from what we already have?

Miles Brundage: For sure. And some people accuse us of being hysterical with the malicious use report. I think time will tell who was or wasn’t hysterical or had their heads in the sand, or other characterizations, but I think one point that I’ll sort of reiterate from earlier is that it’s important to distinguish sort of the capabilities of the system on some human scale or some other scale versus the structural properties of the technology, like scale and speed. So I think just because humans are already doing it doesn’t mean that it won’t change the economics of crime or the economics of information if you made it much easier to do it.

Jack Clark: Well, I think we know that when stuff gets fast and/or cheap, the dynamics change. You know, Miles just alluded to that, and I think that if we can think of a world where phishing via AI is a hundred times cheaper than phishing via a person, or generating disinformation via a human is a hundred times more expensive than using an AI, then you’d expect to see the types of people using this technology change. To your point about sniper rifles and such, yes; however, I think that the drone argument is more compelling. You’ve been able to buy the ability to go and attack people at distance for a long time via stuff like sniper rifles, except they’re somewhat controlled. And now these sorts of drones arrive, and now I have the ability to attack people at distance in a way which is much, much, much, much cheaper.

Jack Clark: It’s also much faster for me to acquire like drones and use them against people than it is for me to acquire loads of sniper rifles. And so there you saw that a change in the speed of deployment and also the cost of deployment meaningfully altered the behavior of actors, and also meaningfully altered military responses. So if I’m a military now, and I’m sending soldiers into an area where I’m dealing with sort of asymmetric war-minded people, I have to have soldiers who are protected against small drones carrying grenades, so now I need to outfit them with different things.

Jack Clark: So actually you see that tools are always used to litigate, like, the economics of war, and it may not seem like a big deal that you’re just changing the tool, even though the capability remains the same, but if you look at what it does to the costs and incentives of the different actors, those changes can have really significant second order effects.

Amanda Askell: Yeah, so I think when we think about things like “technically we can do some action now, so why should we be worried about things in the future”, one thing that’s important to think about is what it is that prevents more of these actions from occurring. And I think when you think about that, often it’s things like, well, it can be done, but it’s not trivial to do, for example. So if you make something a little bit less trivial to do, you see a large reduction in the number of people that do it. Another thing is that we’ve built up pretty secure institutions to prevent people from behaving badly; we’ve created a kind of set of incentives around it. Those are just a couple of examples of mechanisms that mean that, for things we’re technically capable of doing, we don’t see massive misuse of them. And so one thing you should be concerned about, I think, is cases where those mechanisms could break down if you see technological advances of certain types. And I think the move from “it is possible to commit a successful phishing attack” to “it is trivial: if you just want some money, you can do it instantly” might, in my mind, make a huge difference to the amount of this that we see. Similarly, think about things like the institutions that we have.

Amanda Askell: So, like, legal institutions around these issues, and how responsive and well-adapted they are to some of these problems. And if we don’t anticipate current institutions actually being able to deal with this well, then that’s another reason to be really worried that, as you make these things easier and more trivial, you’d just see a lot more of it in the future.

Countermeasures

Robert Wiblin: Yeah, I’ve read some articles lately about this drone issue, because apparently ISIS was using them in the war in Syria. I guess I was left a little bit confused about why it’s so hard to design countermeasures against them. You’d think you could just create counter-drones that you tell to go and crash into that drone and pull it out of the sky; how can that be so much harder than designing the attack drones in the first place? I’m very curious to hear whether there are any ideas out there for countermeasures for the kinds of things that we’re worried about in the next few years, that are already being developed or where people are already trying to get policy implemented.

Miles Brundage: There are a lot of interesting countermeasures. They vary in terms of scalability and cost and so forth. I think more generally … I mean, I’m not an expert on the state of the art of the countermeasures, but for a publication on this general issue, there’s a good paper by Ben Garfinkel and Allan Dafoe on sort of how the offense and defense balance might scale over time as we automate sort of both sides, and I think that’s very relevant here.

Jack Clark: So one dynamic that I think is important is that many ways to both attack with AI and defend against AI involve compute … they involve spending some amount of resources on compute. And so there’s this underlying dynamic, which is: yes, we may have technical countermeasures, but it’s unclear how many of these countermeasures can be defeated by the attacker having a bigger computer. And I think that the extent to which AI is offense-dominant or defense-dominant, depending on the underlying computational resources of the actor, will have a big bearing on AI grand strategy. And it’s not clear how we get better information about this.

Authoritarianism

Robert Wiblin: One malicious use of AI that stuck with me from the interview with Allan Dafoe was the potential for China to use it to stabilize kind of authoritarian rule through massive scaled surveillance that’s very cheap, tracking a lot of information about every citizen and being able to keep tabs on them, such that it’s very hard to engage in any kind of civil disobedience. And I guess you just raised the issue here of it poisoning politics by allowing you to have kind of garbled speech at such a huge scale that it’s hard for humans even to speak to one another, at least online. Yeah, do you have any thoughts on how that situation of AI’s influence on politics is progressing?

Jack Clark: One of the clear truths is that AI augments other stuff. You know, AI isn’t really a thing in itself. Maybe in the long-term if we have long-term powerful AGI-style systems it will become that way, but for now, a lot of AI takes place in the form of a discrete capability that you layer over in other parts of the world. Therefore, AI is uniquely powerful in the context of political systems where you can dovetail your political system and structure into your technology substrate as a society. And that’s something which in the market you’re going to have more trouble with, because the markets may not allocate technical resources to your specific political will.

Jack Clark: There may be confounding factors, like it just doesn't make money. People don't like it. In the case of Project Maven, the employees don't want to build it. All of these problems. In the case of a different regime, a regime where government and tech move a lot more together because they're naturally bound up more through a whole bunch of things, AI is going to make you more effective along those lines. One of the challenges that we're going to deal with in the West is that we have a certain political system here which doesn't seem to get accelerated that much by AI, whereas more centralized and control-based systems do seem to get accelerated by it.

Jack Clark: I think there’s an open question as to whether that’s like a risk that societies that don’t have that capability need to think about, or perhaps an opportunity to think about how might we structure ourselves with AI when the more advanced AI systems we need to sort of make our government better arrive. It’s a good and weird question.

Amanda Askell: Yeah. Just to kind of agree with Jack on some of this, I think that one thing you want to think about is the ways in which our current institutions are already a little bit robust to some of these problems and the ways in which they aren't. So I think one reason why people have been so concerned about the possibility of generated news or generated political speech is that in some ways our system isn't as robust to that as we might think, and I think we've seen that. You know, where people are allowed to post articles on their social news feed, and ways in which that can just be kind of undermined because it's not something that we have safeguards against.

Amanda Askell: And that's going to vary across different states, essentially. So in some states you do in fact have some safeguards against malicious use of speech in political campaigns, and in many states you have similar mechanisms to prevent massive surveillance in ways that could be problematic. So I think it's important to look at this as, in some ways, a set of problems that every state has, and not just something like centralized states versus the West, for example. So yeah, just worth noting, I think.

Politics

Robert Wiblin: People have raised this concern about AI influencing politics a lot with the 2016 presidential election. I think, having looked into it a little bit, I'm not as convinced that it had all that much impact on how people voted at the end of the day, but I suppose I'm left in this awkward position of saying it didn't have much impact but I'm really worried that it will in future, because we're just seeing the tip of the iceberg here, or just seeing the very beginning of what's possible. I think another one like this is that a lot of people are worried about AI or technology causing mass unemployment, and I don't see very much evidence at all that any unemployment we're seeing today is mostly driven by technological improvements.

Robert Wiblin: But I am like very concerned that in the longer term … like it’s quite possible that like almost everyone will be out of a job, because AI and machines will be able to do everything that we can do better. Yeah, do you have any comments on this? Have you looked into either of those questions?

Miles Brundage: Yeah. So I mean, first of all, there are several bets on this topic, you know, some resolved and some outstanding. For example, Tim Hwang, Rebecca Crootof and a few other relevant experts have been featured in IEEE Spectrum magazine debating these topics, and making bets about whether it would have a big impact in 2018 and in 2020. And I think it's hard to come to a strong conclusion about these things, because you could interpret the evidence in various ways. You could say, oh well, it didn't happen this time, but that's because they're saving their special sauce for next time.

Miles Brundage: So sort of an unfalsifiable perspective. So I think that’s why we need these conversations to be more grounded in what’s actually happening and sort of build that infrastructure.

Amanda Askell: Yeah, I think in the case of unemployment it’s just an extremely difficult question to answer, because a lot of it varies by how responsive the market is, for example. So if you see like the automation of like one small field, does this basically have very little impact on unemployment because people can just get jobs in other fields, and you’ve got general economic growth as a result of that? This seems kind of plausible to me. In many cases it’s a bit harder to anticipate what would happen if you saw this happen more rapidly and across a broader range of fields. I also think that one question people have started to ask that is really important is who this affects.

Amanda Askell: So for example, could automation have a really negative effect on people in developing countries? So not just thinking within states, but between states. So it's very hard to predict based on what we've currently seen, and I can understand why someone would be optimistic based on that, but I also think there are reasons to think that if you saw, for example, rapid automation of larger fields, then it might be that we do see changes that we didn't anticipate based purely on looking at some very specific type of factory work being automated.

Jack Clark: I think a good example here is when we got the first websites for uploading and sharing videos, I think everyone thought great, here’s a way to waste time or to better inform myself about the world. And we did get that. What we also got was a system whereby we plugged the incentives of advertising and clickbait into creating content for 3-year-old to 5-year-old children to essentially sort of hack their brains. No one decided to go and make content to hack children’s brains. We just built a system that intersected with a market that led to there being enough incentives there for that stuff to be created. And I think that highlights how it’s really, really tricky to correctly anticipate where it’s really going to hurt you.

Jack Clark: And I think that some of the mindset which I want us as an organization, and other AI researchers more generally, to have is that it's helpful to imagine the positive uses of your stuff as well as the negative uses. And yes, it's likely that many of your predicted negative uses are not going to be the ones that matter. Like in the case of deepfakes. In the short term, yeah, maybe concerns about them being used in politics were overblown, but maybe the concerns about them being used to target and harass women who had been in relationships with skeezy men, who then make deepfaked porn out of those women to embarrass them after a relationship … maybe that was underblown.

Jack Clark: Because that’s caused real human harm. We just don’t talk about it, because it doesn’t fit with a big narrative like politics. It just fits to what an individual’s life has become like. But we’re now in a world where an individual has to deal with this, especially if they’re female, as an attack vector. And that just adds sort of cognitive load to their life and has all of these effects that we can’t quite predict the outcomes of.

Why should we focus on the long-term?

Robert Wiblin: Yeah, this is a criticism I guess that some people would make of this whole field, is to say that it’s just so hard to anticipate what’s going to happen, even in a couple years’ time, that when it comes to AI policy, we should really be focusing on fixing the problems that are occurring right now, or that we can … you know, we think might happen in the next few months, rather than trying to look ahead. What do you think of that?

Jack Clark: I’m going to politely completely disagree with you, and make a point that Miles also made earlier, and Amanda has been making, which is that there are these larger problems that we know to be true. We know that with increasingly transformative technology, you need the ability to coordinate between increasingly large numbers of actors to allow that technology to make its way into the world stably. That’s not going to stop being a problem of concern, and it’s not going to stop being a problem that gets more important over time. Of course we can’t say we need to all work together now because we have a specific technical assumption that will come true in eight years. That would be totally absurd. But I don’t think anyone serious is kind of proposing that.

Jack Clark: They’re saying we have the general contours of some problems. We accept that the details may change, but there’s no way the problems change unless you fix all of like human emotion and fallibility in the short-term, which is unlikely to happen.

Amanda Askell: Yeah. In many ways I think I just don't see the projects as being inconsistent, and there's room to do work on both. So sometimes I don't like it when they're pitted against each other. I'm really glad that people are working on these immediate problems, and in many ways, when you're trying to think about long-term problems, the issues identified in these immediate-term problems can often be the seeds of things that you are generalizing, both in terms of concerns and in terms of solutions. So if you see something like deepfakes making women's lives terrible, you can think about things like: well, what are the mechanisms that we would usually use, or that we would want to see in place, to prevent that from happening?

Amanda Askell: Are those generally good mechanisms that could in fact help with future problems of that sort? So yeah, I think it's mostly reiterating the point that short-term work and long-term work are not inconsistent, and in many ways are very complementary to one another.

Robert Wiblin: So I guess … because we'd talked about the malicious use of AI report, that's kept our focus a bit on what kind of things we might expect in the next five or ten years. But I guess most people, I think, and maybe all of us here, are mostly concerned about this because we think AI's going to have really transformative impacts in the longer term, and focusing on what's happening in the shorter term is kind of a way into affecting the longer term. Do any of you want to comment on the relationship between these two issues?

Miles Brundage: Yeah. I think the distinction is super overblown, and I mean, I'm guilty of having propagated this short-term/long-term distinction in, among other places, an 80,000 Hours article a while back. But I think there are a bunch of good arguments that have been made for why there are at least some points of agreement between these different communities' concerns, around the right publication norms, what role governments should play, how we avoid collective action problems, and so forth. So first of all they're structurally similar things, and secondly they plausibly involve the same exact actors and possibly the same exact sort of policy steps.

Miles Brundage: Like sort of setting up connections between AI labs and managing compute, possibly. So there are these levers that I think are like pretty generic, and I think a lot of this distinction between short and long-term is sort of antiquated based on overly confident views of AI sort of not being a big deal until it suddenly is a huge deal. And I think we’re getting increasing amounts of evidence that we’re in a more sort of gradual world.

Robert Wiblin: Because it seems to me that on the technical side, people have really started to think that the problems we have with AI not doing what we want today are just smaller cases of this broader problem of it being very hard to instruct ML algorithms to do the things that you really want them to do; basically it's the same problem at a different level of scale and a different level of power that the algorithm has. Is the same thing seeming true on the policy and strategy side, some kind of convergence on the view that these are all just the same things, and the issues are simply going to get bigger and bigger?

Miles Brundage: I mean I should caveat this by saying that there is uncertainty about how connected these things are and how we address the near-term things will affect how connected they are to the long-term things, so I don’t think there’s like a crisp fact of the matter. But my general direction of change over the past several years has been thinking that it’s one-ish issue.

Amanda Askell: Yeah, I think it is another area where you might just see differences across domains. So I think that it’s certainly true that you’re seeing a lot of issues that do generalize, and there’s a question of also how different they are as you increase capabilities. So you know, I think the example you might have been alluding to there is something like kind of goal misspecification. You know, so you have a social media site that is just optimizing for people clicking on ads, for example. This can be done without any kind of like malicious intention, it just ends up being, you know …

Amanda Askell: Or similarly with like kind of other goals that it turns out are not in fact the things that make the people on the social media site happy, or just continually looking at the social media site rather than doing your work or going and meeting up with your friends, et cetera. And the idea there is it can be really easy to not realize that you’re targeting the wrong goal.

Amanda Askell: And then if you scale the capabilities of that, I think the concern becomes much larger, because suddenly you have a situation where just slight goal misspecification can actually have pretty radical results. So you know, people have used examples like: imagine a system that is monitoring or controlling the whole of the US power grid. Suddenly just accidentally misspecifying the goal of that system can be really harmful. I think this is true in some domains, but we should also anticipate the possibility of asymmetries in others.

Amanda Askell: In general it means that I don't want the message to be something like: people should not worry so much about long-term issues, because if we just focus on the short-term problems, it will naturally result in a solution to those long-term issues. Because you could see, for example, an increase in the speed of development in a domain that you didn't expect. And if that's the case, then having only studied the immediate impact of your system won't prepare you for the implications and the sort of actions you need to take when that's the case.

OpenAI strategy

Robert Wiblin: Let's move on to talking about OpenAI's strategy for making AI safe in general and making sure that humanity approaches the deployment of AI in a smart way. Just wondering, at a high level, what is the approach that OpenAI is taking to make it all go well?

Jack Clark: What is our approach to definitely make sure that unpredictable, increasingly powerful advanced technology that everyone uses will benefit everyone and nothing will go wrong: an easy question. Thank you very much for asking us. I'll give an overview and I'm sure that Miles or Amanda may have a couple of interpretations as well. As my response should indicate, we have a few ideas, but we're not claiming that we know the answer. This is definitely a domain where there are more questions than answers. OpenAI has three main areas: capabilities, safety and policy. And these are all quite intentional. You know, safety is about the long-term alignment of systems and investigations into how to assure safety, on one hand from the perspective of a person interacting with the system. You know, how can I know I'm not being deceived?

Jack Clark: Various things like that, but also from the point of view of a system designer: how can I design a system that won't do unsafe stuff? Policy is about how we make sure it goes well at the institutional level, but also how we make sure that OpenAI has enough constraints placed upon it internally to do the right thing, and how we take the ideas that come out of safety and out of capabilities and integrate them not only with ideas relevant to a policy domain, like this experiment we're doing at the moment on publication norms with regard to language, but also how do we go and tell people like military organizations about safety?

Jack Clark: Because though we do not want to enable military organizations in terms of their capability, we know that they’re going to develop capability and we want that capability to be safe or else none of us get to like live in AGI world cause we all die before then, which would be unfortunate and in my case I wouldn’t like that to happen.

Jack Clark: The key idea of capabilities is that a lot of these systems are empirical in nature. What I mean by that is, a priori, you can't offer great guarantees about how they will behave. You can't offer really solid guarantees of what capabilities they will and won't have. In the case of our language model, we trained a big model with a single purpose: predict the next word in the sentence. And then when we analyzed the model we discovered, "Oh, it can do summarization." If you just write a load of text, put "TL;DR:" and ask for a completion, it will give you a summary.

Jack Clark: Similarly, it can do like English and French translation and other things. We found that out by training a thing and then going and looking at it and sort of prodding it. And so if you’re in the world where your way to understand future capabilities for long-term powerful systems is one where you need to like poke and prod them, you really want safety and policy to be integrated into the poking and prodding of what could be some kind of proto-super intelligence. So that would be the rough notion for how OpenAI makes sure this goes well and makes sure that OpenAI is a responsible actor.
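
To make the "poke and prod" point concrete, here is a minimal sketch of the "TL;DR:" prompting trick Jack describes, assuming the small, publicly released GPT-2 checkpoint and the Hugging Face transformers library as stand-ins (neither is OpenAI's own code, and a small model's summaries will be rough):

```python
# A minimal sketch of the "TL;DR:" prompting trick described above.
# Assumptions (not from the interview): the small, publicly released
# "gpt2" checkpoint and the Hugging Face transformers library stand in
# for the larger model discussed, so results will be rough.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

article = (
    "Researchers trained a large language model with a single objective: "
    "predict the next word. When they probed the trained model, they found "
    "it could also attempt tasks it was never explicitly trained for."
)

# Appending "TL;DR:" and asking for a completion nudges the model to
# continue the text as if a summary came next.
prompt = article + "\nTL;DR:"
outputs = generator(prompt, max_new_tokens=40, do_sample=True, top_k=50)
print(outputs[0]["generated_text"][len(prompt):])
```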

Amanda Askell: Yeah, I think I would second that and agree that one of the ways that we're trying to tackle this is by heavily integrating these three fields. In many ways I'm kind of sympathetic to people who think it's unfortunate that we have terms like safety, and possibly also terms like policy, because in almost any other discipline this would just be part of the task of building the thing. And so in some ways I think we're just trying to exemplify this norm that when you're building AI systems, you're trying to build things that help people and are beneficial, and that means that at every stage of development you should be thinking through both the social implications your system has if you were to release it, and in what form you release it, and also the safety implications it has, making sure that you have a way of verifying that your system is not going to do unintended harm.

Amanda Askell: I think Stuart Russell has a quote on this where he was like, we don't call it building bridges that don't fall down. We just call it building bridges. And so I think it's really important to try and bring these three fields together. And we're doing that, and obviously we're also just doing further work in AI safety, which is hopefully going to be useful, and AI policy work that's also hopefully going to be useful both within OpenAI and beyond.

Miles Brundage: In terms of like super high level framings of the problem, I sometimes think of it in terms of “figure out what steps different actors need to take and then figure out how to get them to take those steps”. And I think a lot of safety and ethical issues fall into the first bucket and then a lot of game-theoretic and economic and legal issues fall into the second. But that’s obviously a very rough rubric.

Robert Wiblin: What is the ideal vision of like how AI progresses and OpenAI’s role in making it go well over a period of decades?

Jack Clark: I guess my story here is that we help a bunch of other AI organizations coordinate and figure out what information they need to share with each other, and also the processes for sharing increasingly sensitive information with each other. While doing that, we continue to stay at the leading edge in terms of capabilities and safety and policy. And then we're able to use what we learned to either ensure that we ourselves, as a main actor, coordinate with others to build a safe system and disperse the benefits to humanity. Or, if for whatever reason, given the dynamics of the landscape, we are not the main actor here, we are able to help that main actor make more correct decisions than if we had not existed.

Miles Brundage: Yeah, so I don’t have a very concrete preferred scenario. And partly because I think as I was saying earlier, it depends what actions others take. I mean maybe you could say, okay, here’s a globally optimal scenario. But I think it’s more useful pragmatically for us to have a sort of self-centered view of: what does this mean for us? What actions do we take given this global picture? And I think from that perspective it’s more important to be robust than optimal. And so I think I’m less interested in working backwards from a perfect solution and more interested in what are the steps along the way that we could take to marginally move things in the direction of greater trust between actors, greater awareness of relevant safety techniques, et cetera. So on the margin what can be done at every point in time.

Amanda Askell: Yeah, I think I would agree with that where there are just so many levers to make sure that things go well here. So there’s both private companies, there are governments, there’s already existing institutions like law, and working to improve those to make sure that they’re like at each step of the way responding to technological improvements is really important. I do think that in some ways, I felt like the question was something like, what is like a really good way of this going or something like that. And I do want to-

Robert Wiblin: Tell me a beautiful story. I think it's going to go well.

Amanda Askell: Tell you a beautiful story. In some ways I just think that it is easy. I don’t want people to be overly pessimistic. I’m very optimistic and excited about technological development. And I think if it’s done well, it can be extremely beneficial. We have a lot of huge problems in the world right now that I think that advanced technologies could really help with. I am actually optimistic about it really improving things like reducing global poverty, improving health outcomes. I would love to see increasing amounts of this being used to cure diseases that we couldn’t cure before, et cetera.

Amanda Askell: I think the kind of beautiful story is one where it’s like, we’ll take a lot of the problems that we currently have that we could just solve if we could put more time into them, for example, and then have a system that can in fact just like process more information and can in fact put more time into it and can like take in a bunch of medical images and can give you a really accurate diagnosis. And I’m like, that’s an exciting world to me. I think a world where you have pretty robustly safe systems doing these things could in fact be like a really wonderful world in which we really solve many outstanding problems. So maybe I’m giving a really optimistic view of the future of, you know, we have no poverty and we’re all very healthy and happy.

How to avoid making things worse

Robert Wiblin: At 80,000 Hours we think of AI policy and strategy as one of the areas where it's unusually easy, perhaps, to cause harm, to make things worse by saying or doing the wrong thing. What are some of the potential ways you think OpenAI could make things worse, and how do you try to anticipate that and avoid it happening?

Jack Clark: We can make people race on capabilities. An inherent challenge that we have, but I think most AI people have, is that we get to create futures ahead of other organizations and other actors like governments, and we get to see those futures and see the upsides and downsides and then what we communicate about that will have huge effects on what these people choose to do. And it’s a domain where you get very little information if what you did was really bad, ’cause really bad in this world usually corresponds to a classified budget massively growing in size. That’s necessarily something that is hard to get evidence about from where I’m sitting. I think that’s one.

Jack Clark: I think the other is that you could misjudge the types of coordination actions that the community actually wants to do in practice, and you could try and contrive a load of things to do of coordination which everyone sort of does up until the point when you get to hard decisions, and then those coordination mechanisms might have some flaw which would have not been clear to most people in the community. The moment it becomes clear, everyone defaults to less communication with each other and less coordination. I think those are two of the things. But I’m curious to hear what you think, Amanda?

Amanda Askell: Yeah. I think the first point that you make was one that I was like quite focused on or interested in where, business as usual, in a lot of domains, you have lots of competing actors and if you’re just like an additional actor in that space, a worry that you might have is just that you increase the chance that people are going to try to develop capabilities faster because the idea is that the goal is to sort of outcompete other people who are within the same domain or producing similar systems. I think that one way that we can try and mitigate that a little bit is by spreading a view of this entire discipline as one where we often have shared goals with other organizations. I don’t think the goal is something like “have your organization be the first to do some specific task”, rather, there’s this shared goal of creating really good advanced technologies.

Amanda Askell: That means that you shouldn't necessarily see other actors as competitors, but rather as similarly working towards that goal. And so yeah, it's difficult: the potential for increasing the chance of racing is something that does worry me, and I hope that we can help mitigate that by really focusing on that kind of mindset. On the second point, I do think that another way you can end up doing more harm without expecting to is failing to design these mechanisms so that they actually work when push comes to shove, or failing to anticipate scenarios where these mechanisms break down. And there are lots of scenarios I can think of where the things that we're building might just end up not working, and if you haven't thought of sufficiently many scenarios, that's one example.

Amanda Askell: I think another key thing in policy that it’s important to think about is basically like how good the mechanisms that you are recommending or the actions you are recommending are across a wide variety of possible outcomes. So it’s really easy for people to think through something like, well, what is the perfect outcome? And then to kind of work backwards from that and to build mechanisms that are like what they see is like the clearest path to the perfect outcome, when in reality, because there’s so much uncertainty and so many ways that things can go, you instead have to think about like all of these like distributions of outcomes and things that do pretty well in most of those worlds rather than like what would get us to the perfect world if things happen as we think that they will happen. And that’s a way in which you can end up recommending these kind of brittle mechanisms that do fail. I think that’s like another thing.

Amanda Askell: Then a final way that any organization can end up harming others or doing unintentional harm is just by making a mistake in a judgment call. You know, we are thinking about things like publication norms with the recent language work, and there just isn't a huge template for what you do here, so it's really easy to unintentionally make mistakes or make a slightly wrong call on what you do or what you release. There's not necessarily a lot you can do about that if you're trying to be as responsible as possible, but it is a possibility that people have to be aware of.

Miles Brundage: Just one follow up point on there not being a blueprint. I think that’s like a super important and underappreciated point that I think a lot of people say, “Okay, well why don’t you just do this thing, like they did in nuclear policy or whatever.” And I think there’s a ton of value of using analogies for inspiration and to make you realize a variable that you hadn’t considered or to get you to think more creatively about what’s possible. But I think that you quickly run into limits in terms of what you can get from these analogies in any particular case.

Miles Brundage: I think one way in which we probably erred in the malicious use of AI report is thinking that we had more to learn from other fields. It’s not to say that there isn’t something to learn but you quickly reach diminishing returns and have to make a context-based decision about in this particular domain what are the misuse risks and what are the relative capabilities of different actors and so forth. So it’s not clear that an influenza virus case study from like five years ago tells us that much.

Robert Wiblin: I imagine that quite a number of listeners might end up going into this field. Do you want to comment on how cautious it's appropriate for people to be? You seem to be pointing out that people working in AI strategy and policy are generally quite cautious about what they publish, for example, especially I guess at this early stage, 'cause you don't want to frame things incorrectly. Do you just want to comment on whether young people entering this area should generally be very cautious and always be trying to get other people to check what they're doing?

Jack Clark: I think that one thing that people in this field don’t do enough of is calibrate their model for what they shouldn’t say against some of the constituents that they really care about. It’s quite common that I see people presume a level of attention, competence and awareness in government, which I know does not exist for some governments and it conditions the model that people have. My experience has been going and talking to people. Like with this language model, we had a lot of questions of what the government reaction would be. So we just talked to a bunch of people and connected to a bunch of governments and asked them for opinions about it in a way that did not leak information about the precise techniques of the model, but let them experience it.

Jack Clark: And I think that that gave us a better calibration as to where we thought the threat was. Now obviously we could have got this horribly wrong. I may get out of this podcast to find some very exciting email that makes me turn a pale shade of white; hopefully not. And I think that being cautious is sensible when you aren't situated in the world, but everyone can get situated, because everyone knows someone who knows someone who's involved in a government or an intelligence agency or something like that, and can kind of ask some questions.

Amanda Askell: Yes. I think in one sense being cautious is checking with other people that the work you're doing is good and useful, and in that sense I think it's good to be cautious. In fact, most people just should be cautious. I think another sense of being cautious here that I want to highlight is something like, "Well, should I just not go into this field, because look at all of the potential for harm that I could do? What if I went into this other field where it's much more guaranteed that I will do some good, even if I won't do as much good?" And I think that if you think that you're going to have valuable contributions, and you're going to be able to identify if and when anything that you're doing is harmful, then it's better to just go with the expected impact that you have, even though that can be kind of psychologically difficult.

Amanda Askell: It’s really easy for people to just do this kind of like harm avoidance strategy with what they do. And it’s realizing that that strategy can be either ineffective, so you just have like a lower impact than you wanted to have. And in some cases like, it can be negative. So in many cases, saying something, even though you know, it has some potential for harm and like ideally a large potential for good, can be better than saying nothing. And that saying nothing can often actually be itself harmful, and I think people can sort of forget that when they’re thinking about how cautious to be both in terms of like, what they’re saying and the work that they’re producing and also just with their careers generally and what they want to do. Be careful. But like yeah, I think that would be my general advice.

Miles Brundage: One general comment on risk aversion and putting out work that's not finished. I think that some people in the EA community are a bit overly risk averse when it comes to sharing their views on these topics, both in terms of talking about AI timelines and scenarios and stuff like that. I think often people overestimate how controversial or important their view is; that's one point. And the other thing is risk aversion in the context of publishing. There's a lot of people moving into the AI policy area, and not all of them have the same goals and quality standards as we do.

Miles Brundage: That doesn’t mean that we should lower ourselves to their level, but I think that raises questions about what the optimal explore-exploit ratio is in the AI community or in the portion of the AI policy community that’s concerned with the long term, because we don’t want to never publish anything and then have the conversation totally be dominated by people with low intellectual standards. But nor do we want to put out bad work. So I don’t know exactly how to address that.

Publication norms

Robert Wiblin: Especially people who are concerned about the long term of AI are becoming fairly cautious about what they publish. And I suppose this is true both in the policy and strategy crowd, and increasingly potentially among the technical crowd too. People are cautious about what code they're publishing because they're worried about how it might be misused. And I guess we're seeing that today, in that OpenAI doesn't want to publish the full code for this algorithm for producing seemingly realistic text; I guess you want to think about it more before you put it out, 'cause you can't withdraw it. Amanda, do you just want to comment on the trade-off between making things too secret versus everyone just running their mouth and being too dangerous?

Amanda Askell: Yeah. I think this is an area where it's really easy to see the potential harms from some publication or something that you're working on, or just some thoughts that you have, and to think that the thing to do is just to kind of close up and say nothing, and that that's going to be the best way to go about things. I do think there's a danger here of making it seem like this is a field or a domain that's shrouded in secrecy, or that lots of things are happening behind closed doors that we don't know about, when that may not actually be the case.

Amanda Askell: Lots of the problems that we're dealing with and that we're working on are just out in the open. People are talking about them, and it's completely fine for that to be the case, and in many ways I caution against … basically the problem that you have is trying to find the balance between these two things: being responsible about releasing information that you think could be used maliciously, but also not saying absolutely nothing, which can itself be quite harmful.

Amanda Askell: It’s really important that this is a field, I think that is credible and trustworthy and honest and where people have some faith that you are making a kind of genuine effort to evaluate the kinds of things that you are really saying. And they sort of trust the underlying mechanisms to be ones where you understand the value of openness, and are weighing that against these other considerations rather than thinking that you’re an actor that’s just, or a person who’s just unwilling to say anything or unwilling to say anything honest.

Amanda Askell: That’s extremely harmful and so yes, striking that balance is really hard, but it’s also really important. I think is really important relative to doing something like completely shutting down and being overly cautious and saying nothing.

Jack Clark: When we think about publication norms, one of the things I think about, and I don’t know if this is that widely known about me, but I spent many years as a professional investigative journalist. When you do that type of journalism, you have this philosophy of find the thing that no one talks about publicly, but everyone talks about privately and publish some kind of story that relates to that thing. That’s just the way that you do the job. I think that it’s weirdly similar in policy for certain things.

Jack Clark: Something that I’ve been hearing among policymakers for a long time now is, in public, tech companies say everything’s great, lah, lah, lah, lah, lah. Aren’t we having a good time? And then in private they’ll say to regulators, we really need actual legislation around say facial recognition, because we are selling all of this stuff and we know that it’s being used to do things that make our employees uncomfortable.

Jack Clark: It’s difficult for us to restrict ourselves. We would like there to be a conversation about norms and potential regulations that wasn’t just us having it. And I think this is just a general case in AI where you talk to AI researchers privately and if they work with very, very large models, they’ll say, “Yes from time to time I find this stuff a little perturbing, or from time to time I think about how good this stuff is getting, and I wonder about the implications.”

Jack Clark: In public there’s been a lot of challenges associated with talking about these downsides because you don’t want to cause another AI winter. You don’t want to be seen as being Chicken Little, and saying “the sky is falling” when it may not be and you don’t want to, if you’re a corporate researcher, run afoul of your company’s communication sort of policies, which will typically encourage you or force you to avoid talking about downsides.

Jack Clark: With our work here on language and on publication norms in general, the idea is to go and get that conversation that we know is happening privately and force it into the public domain, by doing something that invites those people to debate it, and invites frankly everyone who has different opinions here to now have a case study they can talk about. And frankly, I would be excited if all that came out of this was that the whole community had a discussion and said we were ever so slightly too extreme in this instance.

Jack Clark: That would actually be a good thing because it would have helped calibrate the whole community around what it really thinks and would give us loads of evidence. Now my expectation is what might happen is people will talk about what we’ve done and then when they do their own releases, will maybe be able to do their own forms of release experiment while pointing to us as the person that sort of went first and maybe we de-risked that for them.

Amanda Askell: I think it's also about improving the conversation around publication norms, so that it's no longer one where either you're completely in favor of everything being open source, or you're completely closed and you don't see any of the benefits of openness. I think it's important to show that we are, as an organization, sensitive to all of the upsides of openness in research: it pushes forward the scientific boundaries, it gets more people into the field, it allows people to rerun your experiments. We're really sensitive to the fact that there are lots of extreme benefits to being really open with your research.

Amanda Askell: And then you just have to counter that against your potential misuse or unintended side effects or bad social impacts of what you’re doing, and ideally moving the conversation to one where it’s not like you either have to be completely for or against complete openness, but to one where it’s just like, yeah, there are just like pros and cons, there’re considerations for and against, and it’s fine to have a position that’s somewhere in the middle, and to favor something that’s responsible publication and finding out exactly what the sweet spot is there is like quite difficult, but I think important.

Robert Wiblin: My impression from outside is that AI as a technical field is extremely in favor of publishing results and sharing code so that other people can replicate what you do. Is this one of the first cases where people have published a result and not released the code that would allow people to replicate it? I guess this sounds like perhaps you are trying to set an example that will encourage other people to think more carefully about this in future?

Miles Brundage: This is not the first case in which people haven't published all of their results and model and code and so forth. What's different is that the decision was A) made explicitly on the basis of these misuse considerations, and B) communicated in a transparent way that was aimed at fostering debate. So it wasn't that no one has ever worried about the social consequences of publishing before, but we took the additional step of trying to establish it as an explicit norm at AI labs.

Jack Clark: Yeah. The way I think of it is that we built a lawn mower. We know that some percentage of the time this lawn mower drops a bit of oil on the lawn which you don’t want to happen. Now, most lawn mower manufacturers would not lead their press strategy with: we’ve made a slightly leaky mower. That’s sort of what we did here, and I think the idea is t