Transcript

Robert Wiblin: Hi listeners, this is the 80,000 Hours Podcast, where each week we have an unusually in-depth conversation about one of the world’s most pressing problems and how you can use your career to solve it. I’m Rob Wiblin, Director of Research at 80,000 Hours.

Last year for episode 44, I interviewed Paul Christiano for almost four hours about his views on a wide range of topics, including how he thinks AI will affect life in the 21st century and how he thinks we can increase the chances that those impacts are positive.

That episode was very popular, and Paul is a highly creative and wide-ranging thinker, who’s always writing new blog posts about topics that range from the entirely sensible to the especially strange, so I thought it would be fun to get him back on to talk about what he’s been thinking about lately.

On the sensible side, we talk about how divesting from harmful companies might be a more effective way to improve the world than most people have previously thought.

On the stranger side, we think about whether there are any messages we could leave future civilizations that could help them out, should humanity go extinct but intelligent life then re-evolve on Earth at some point in the far future.

We also talk about some speculative papers suggesting that taking creatine supplements might make people a bit sharper, while being in a stuffy carbon-dioxide-filled room might make people temporarily stupider.

Honestly we just have a lot of fun chatting about some things we personally find interesting.

On the more practically useful side of things though, I get his reaction to my interview with Pushmeet Kohli at DeepMind for episode 58 a few months back.

I should warn people that in retrospect this episode is a bit heavy on jargon, and might be harder to follow for someone who is new to the show. That’s going to get harder to avoid over time, as we want to dig deeper into topics that we’ve already introduced in previous episodes, but we went a bit further than I’d ideally like this time.

Folks might get more out of it if they first listen to the previous interview with Paul back in episode 44 – that’s Dr. Paul Christiano on how we’ll hand the future off to AI, & solving the alignment problem. But a majority of this episode should still make sense even if you haven’t listened to that one.

This episode also has the first outtake we’ve made. I encouraged Paul to try recording a section in the interview on a subfield of philosophy called decision theory, and some heterodox ideas that have come out of it, like superrationality and acausal cooperation.

I planned to spend half an hour on that, but that was really very silly of me. We’d need a full hour just to clearly outline the problems of decision theory and the proposed solutions, if we could do it at all. And explaining and justifying the possible implications of various unorthodox solutions that are out there could go on for another hour or two, and it is hard to do it all without a whiteboard.

So, by the end of that section, we thought it was more or less a trainwreck, though potentially quite a funny trainwreck for the right listener. We do come across as a touch insane, which I’m fairly sure we’re not.

So if you’d like to be a bit confused and hear what it sounds like for a technical interview to not really work out, you can find a link to the MP3 for that section in the show notes. If you happen to have the same level of understanding of decision theory that I did going into the conversation, you might even learn something. But I can’t especially recommend listening to it as a good use of time.

Instead, we’ll come back and give decision theory a proper treatment in some other episode in the future.

Alright, with all of that out of the way, here’s Paul Christiano.

Robert Wiblin: My guest today is Dr. Paul Christiano. Back by popular demand, making his second appearance on The 80,000 Hours Podcast. Paul completed a PhD in Theoretical Computer Science at UC Berkeley and is now a technical researcher at OpenAI, working on aligning artificial intelligence with human values. He blogs about that work at ai-alignment.com and about a wide range of other interesting topics at sideways-view.com. On top of that, Paul is not only a scholar, but also always and everywhere a gentleman. Thanks for coming on the podcast, Paul.

Paul Christiano: Thanks for having me back.

Robert Wiblin: I hope to talk about some of the interesting things you’ve been blogging about lately, as well as what’s new in AI reliability and robustness research. First, what are you doing at the moment and why do you think it’s important work?

Paul Christiano: I guess I’m spending most of my time working on technical AI safety at OpenAI. I think the basic story is similar to a year ago, that is, building AI systems that don’t do what we want them to do, that push the long-term future in a direction that we don’t like, seems like one of the main ways that we can mess up our long-term future. That still seems basically right. I maybe moved a little bit more towards that being a smaller fraction of the total problem, but it’s still a big chunk. It seems like this is a really natural way for me to work on it directly, so I think I’m just going to keep hacking away at that. That’s the high level. I think we’re going to get into a lot of the details, probably in some questions.

What’s new in AI research?

Robert Wiblin: We started recording the first episode last year, almost exactly a year ago, actually. When it comes to AI safety research, and I guess your general predictions about how advances in AI are going to play out, have your opinions shifted at all? If so, how?

Paul Christiano: I think the last year has felt a lot like there are no big surprises and things are settling down. Maybe this has been part of a broader trend: five years ago, my view was bouncing around a ton every year; three years ago, it was bouncing around a little bit; over the last year, it has bounced around even less. So I think my views haven’t shifted a huge amount. I think we haven’t had either big downward or upward surprises in terms of overall AI progress. That is, I think we’ve seen things that are consistent both with concerns about AI being developed very quickly, and also with the possibility of it taking a very, very long time.

In terms of our approach to AI alignment, again, I think my understanding of what there is to be done has solidified a little bit. It’s moved, and continues to move, from some broad ideas of what should be done to particular groups implementing things. That’s continuing to happen, but there haven’t been big surprises.

Robert Wiblin: Yes, last time we spoke about a bunch of different methods, including AI safety via debate. I mean, where different AIs debate one another and then we’re in a position to– Well, hopefully, we’re in a position to adjudicate which one is right. Is there any progress on that approach or any of the other ones that we spoke about?

Paul Christiano: Yes. I work on a sub-team at OpenAI that primarily works on that idea, safety via debate, as well as amplification. I would say that over the last year, a lot of the work has been on some combination of building up capacity and infrastructure to make those things happen, such as scaling up language models and integrating with good large language models, that is, things that understand some of the reasoning humans do when they talk or when they answer questions. Trying to get that to the point where we can actually start to see the phenomena that we’re interested in.

I think there’s generally been some convergence in terms of how different parts within OpenAI, but I think also different organizations, have been thinking about possible approaches. For example, within OpenAI, for people thinking about this really long-term problem, we mostly think about amplification and debate.

There’s this on-paper argument that those two techniques ought to be very similar. I think they maybe suggest different emphases on which experiments you run in the short term. As we’ve been trying things, the people who started more on the amplification side are running experiments that look more similar to what you might expect from the debate perspective, and also vice versa, so I think there are fewer and fewer big disagreements about that.

I think similarly with the independent thinking at DeepMind, where most people thinking about long-term safety are, I guess I feel like there’s less of a gap between us now. Maybe that’s good because it’s easier to communicate and be on the same page and have more shared understanding of what we’re doing. Compared to a year ago, this is just related to things settling down and maturing. It’s still a long way from being like any normal field of academic inquiry, it’s nowhere close to that.

Robert Wiblin: Don’t people disagree more or just have very different perspectives and–?

Paul Christiano: Yes, they disagree more, they have less of a common sense. They have less of a mature method of inquiry which everyone expects to make progress. It’s still a long way away from more mature areas, but it is moving in that direction.

Robert Wiblin: This is maybe a bit random, but do you feel like academic fields are often held back by the fact that they codify particular methods, and particular kinds of evidence, and particular worldviews that blinker them to other options? Maybe it’s an advantage for this sort of research to be a bit more freewheeling and diverse.

Paul Christiano: I think that’s an interesting question. I guess I would normally think of it as an academic field that’s characterized by this set of tools and its understanding of what constitutes progress. If you think of the field as characterized by problems, then it makes sense to talk about the field being blinkered in this way or having value left on the table. If you think about the field as characterized by this set of tools, then that’s the thing they’re bringing to the table.

I would say from that perspective, it’s bad that you can’t use some existing set of tools. That’s a bummer, and it’s not clear– I think there’s a lot of debate about how much we should ultimately expect the solutions to look like using an existing set of tools. It’s also a little bit bad to not yet have mature tools that are specific to this kind of inquiry. I think that’s more how I think of it.

I think many academic fields are not good at– If you think of them as answering this set of questions, and they’re the only people answering that set of questions, maybe they’re not really set up that optimally to do it. I think I’ve shifted to not mostly thinking of academic fields that way.

Robert Wiblin: I guess economics has problems with its method, but then those are covered by other fields that use different methods. That would be the hope.

Paul Christiano: That’s the hope.

Robert Wiblin: [laughs]

Paul Christiano: I think economics is an interesting case, since there are a bunch of problems that sort of fit in economics and the set of tools economists use. If there’s a problem that nominally fits in economics, under their purview, but which is not a good fit for their tools, then you’re in sort of a weird place. I think economics also, maybe because of there being this broad set of problems that fit in their domain– I think this distinction is not obvious. There’s some–

Robert Wiblin: It’s kind of an imperial field, notoriously. One that goes and colonizes every other field, or any questions that it can touch on. Then sometimes, I guess, yes, the method might be well suited to those questions that it wants to tackle.

Paul Christiano: Yes, although in some sense, if you view it as a field that has a set of tools that it’s using, it’s very reasonable to be going out and finding other problems, if you’re actually correct about them being amenable to those tools. I think there’s also a thing on the reverse, where you don’t want to be really staking a claim on these questions. You should be willing to say, “Look, these are questions that we’ve traditionally answered, but there are other people. Sometimes, these can be answered in other ways.”

Robert Wiblin: Yes. It’s an interesting framing of problems with academic fields: it’s kind of not so much that the field is bad, but maybe that it’s tackling the wrong problems, or that it’s tackling problems that are mismatched to its methods.

Paul Christiano: I think about this a lot, maybe because in computer science, you more clearly have problems where it’s not so much staked out. It’s not like, “Here’s a problem and this problem fits in a domain.” It’s more like, there are several different approaches: there are people who come in with a statistics training, and there are people who come in as theorists, and there are people who come in as various flavors of practitioners or experimentalists, and you can see the sub-fields have different ways they would attack these problems. It’s more like you understand, this sub-field’s going to attack this problem in this way, and it’s a reasonable division of labor.

Robert Wiblin: Let’s back up. You talked about running experiments, what kind of experiments are they concretely?

Paul Christiano: Yes. I think last time we talked, we discussed three kinds of big uncertainties or room for making progress. One of them, which isn’t super relevant to experiments, is figuring out conceptual questions about how we’re going to approach, like find some scalable approach to, alignment. The other two difficulties both were very amenable to different kinds of experiments.

One is experiments involving humans, where you start to understand something about the character of human reasoning. We have some hopes about human reasoning: we hope that, in some sense, given enough time or given enough resources, humans are universal and could answer some very broad set of questions if they just had enough time, enough room to reflect. There’s one class of experiments that’s sort of getting at that understanding. In what sense is that true? In what sense is that false? That’s a family of experiments I’m very excited about.

OpenAI has recently started hiring people; we just hired two people who will be scaling up those experiments here. Ought has been focused on those experiments and is starting to really scale up its work. That’s one family of experiments. Then there’s a third difficulty, which is understanding how both the theoretical ideas about alignment and these facts about how human reasoning works all tie together with machine learning.

Ultimately, at the end of the day, we want to use these ideas to produce objectives that can be used to train ML systems. That involves actually engaging with a bunch of detail about how ML systems work. Some of the experiments are directly testing those details, saying, “Can you use this kind of objective? Can machine learning systems learn this kind of pattern or this kind of behavior?” Some of them are maybe more in the family of, “We expect this to work if you just iterate a little bit.” You sort of expect there is going to be some way that we can apply language models to this kind of task, but we need to think a little bit about how to do that and take a few swings at it.

Robert Wiblin: I saw that OpenAI was trying to hire social scientists and kind of making the case that social scientists should get more interested in AI alignment research. Is this the kind of work that they’re doing, running these experiments or designing them?

Paul Christiano: Yes, that’s right. We were aiming initially to hire one person in that role. I think we’ve now made that hire and they’re starting on Monday. They’ll be doing experiments, trying to understand: if we want to use human reasoning in some sense as a ground truth or gold standard, how do we think about that? In what sense could you scale up human reasoning to answer hard questions? In what sense are humans a good judge of correctness, or how do you incentivize honest behavior between two debaters?

Some of that is like, what are empirically the conditions under which humans are able to do certain kinds of tasks? Some of them are more conceptual issues, where like humans are just the way you get traction on that because humans are the only systems we have access to that are very good at this kind of flexible, rich, broad reasoning.

Robert Wiblin: I mentioned on Twitter and Facebook that I was going to be interviewing you again and a listener wrote in with a question. They had heard, I think, that you thought there’s a decent probability that things would work out okay or that the universe would still have quite a lot of value, even if we didn’t have a solid technical solution to AI alignment and AI took over and was very influential. What’s the reasoning there, if that’s a correct understanding?

Paul Christiano: I think there’s a bunch of ways you could imagine ending up with AI systems that do what we want them to do. One approach which is, as a theorist, the one that’s most appealing to me, is to have some really good understanding on paper. Like, “Here’s how you train an AI to do what you want,” and we just nail the problem in the abstract before we’ve even necessarily built a really powerful AI system.

This is the optimistic case where we’ve really solved alignment, it’s really nailed. Then there’s a second category, or this broad spectrum, where you’re like, “We don’t really have a great understanding on paper of a fully general way to do this, but as we actually get experience with these systems, we get to try a bunch of stuff.” We get to see what works. If we’re concerned about a system failing, we can try and run it in a bunch of exotic cases and just try and throw stuff at it and see. Maybe we stress-test it enough that it actually works.

Maybe we can’t really understand a principled way to extract exactly what we value, but we can do well enough at constructing proxies. There’s this giant class of cases where you like, don’t really have an on-paper understanding but you can still wing it. I think that’s probably not what the asker was asking about. There’s a further case where you try and do that and you do really poorly, and as you’re doing it, you’re like, “Man, it turns out these systems do just fail in increasingly catastrophic ways. Drawing the line out, we think that could be really bad.”

I think even in that worst case, where you don’t have an on-paper understanding and you can’t really wing it very well, I still think there’s certainly more than a third of a chance that everything is just good. That would have to come through people probably understanding that there’s a problem, having a reasonable consensus that it’s a serious problem, being willing to make some sacrifices in terms of how they deploy AI. I think that at least on paper, many people would be willing to say, “If really rolling AI out everywhere would destroy everything we value, then we are happy to be more cautious about how we do that, or roll it out in a more narrow range of cases, or take development more slowly.”

Robert Wiblin: People showing restraint for long enough to kind of patch over the problems well enough to make things okay.

Paul Christiano: Yes. Somehow, there’s a spectrum, some substitution between how much restraint you show and how much you are able to either ultimately end up with a clean understanding or wing it. One-third is my number if it turns out that winging it doesn’t work at all, like we’re totally sunk, such that you have to show very large amounts of restraint. People have to actually just be like, “We’re going to wait until we are so much smarter.” We’ve either used AI to become much smarter, or better able to coordinate, better able to resolve these problems, or something like that. You have to wait until that’s happened before you’re actually able to deploy AI in general.

I think that’s still reasonably likely. I think that’s a point where lots of people disagree, I think, on both ends. A lot of people are much more optimistic, a lot of people have the perspective that’s like, “Look, people aren’t going to walk into razorblades and have all the resources in the world get siphoned away or like deploy AI in a case where catastrophic failure would cause everyone to die.” Some people have the intuition like, “That’s just not going to happen and we’re sufficiently well coordinated to avoid that.”

I’m not really super on the same page there. I think if it was a really hard coordination problem, I don’t know, it looks like we could certainly fail. On the other hand, some people are like, “Man, we can’t coordinate on anything.” Like if there was a button you could just push to destroy things or someone with $1 billion could push to really mess things up, things would definitely get really messed up. I just don’t really know.

In part, this is just me being ignorant and in part, it’s me being skeptical of both of the extreme perspectives, when the people advocating them are also about as ignorant as I am of the facts on the ground. I certainly think there are people who have more relevant knowledge and who could have much better calibrated estimates if they understood the technical issues better than I do. I’m at some form of pessimism: if things were really, really bad, if we really, really don’t have an understanding of alignment, then I feel pessimistic, but not radically pessimistic.

Robert Wiblin: Yes. It seems like a challenge there is that you’re going to have a range of people who can have a range of confidence about how safe the technology is. Then you have this problem that whoever thinks it’s the safest, is probably wrong about that because most people disagree and they’re the most likely to deploy it prematurely.

Paul Christiano: Yes. I think it depends a lot on what kind of signals you get about the failures you’re going to have, so like how much you have a– Yes, we can talk about various kinds of near misses that you could have. I think the more clear of those are, the easier it is for there to be enough agreement. That’s one thing.

A second thing is I’m concerned about a particular kind of failure that really disrupts the long-term trajectory of civilization. You can be in a world where that’s the easiest kind of failure. That is, getting things to work in practice is much easier than getting them to work in a way that preserves our intentions over the very long term.

You could also imagine worlds, though, where a system which is going to fail over the very long term is also reasonably likely to be a real pain in the ass to deal with in the short term. In which case, again, it will be more obvious to people. Then, I think a big thing is we do have techniques, especially if we’re in a world where AI progress is very much driven by large numbers of giant computing clusters.

In those worlds, it’s not really like any person can press this button. One, there’s a small number of actors, the people who are willing to spend, say, tens of billions of dollars, and two, those actors have some room to sit down and reach agreements, which could be formalized to varying degrees, but it won’t be like people sitting separately in boxes making these calls.

At worst, in that world, it’ll still be a small number of actors who can talk amongst themselves. At best, it’ll be a small number of actors who agree: here are norms, and we’re going to actually have some kind of monitoring and enforcement to ensure that even if someone disagreed with the consensus, they wouldn’t be able to mess things up.

Robert Wiblin: Do you think you or OpenAI have made any interesting mistakes in your work on AI alignment over the years?

Paul Christiano: I definitely think I have made a lot of mistakes, which I’m more in a position to talk about.

Robert Wiblin: [laughs] Go for it. [laughs]

Paul Christiano: I guess, yes, there have been a lot of years I’ve been thinking about alignment, so that’s a lot of time to rack up mistakes, many of which aren’t as topical though. There’s a class of intellectual mistakes I feel like I made four or five years ago, when I was much earlier in thinking about alignment, which we could try and get into.

I guess my overall picture of alignment has changed a ton since six years ago. I would say that’s basically because six years ago, I reasoned incorrectly about lots of things. It’s a complicated area. I had a bunch of conclusions I reached. Lots of the conclusions were wrong. That was a mistake. Maybe an example of a salient update is I used to think of needing to hit this, like you really need to have an AI system that understands exactly what humans want over the very long term.

I think my perspective shifted more to something maybe more like a commonsensical perspective of, if you have a system which sort of respects short-term human preferences well enough, then you can retain this human ability to course correct down the line. You don’t need to appreciate the full complexity of what humans want, you mostly just need to have a sufficiently good understanding of what we mean by this course correction, or remaining in control, or remaining informed about the situation.

I think it’s a little bit hard to describe that update concisely, but it does really change how you conceptualize the problem or what kinds of solutions are possible. That’s an example from long ago; there’s a whole bunch of those that have been racked up over many years. Certainly I also made a ton of low-level tactical mistakes about what to work on. Maybe a more recent mistake that is salient is that I don’t feel like I’ve done very well in communicating my overall perception of the problem. That’s not just expressing that view to others but also really engaging with the reasons that maybe more normal perspectives are skeptical of it.

I’ve been trying to work a little bit more on that, and I’m currently trying to better pin down: here is a reasonably complete, reasonably up-to-date statement of my understanding of the problem, and how I think we should attack it. Really iterating on that to get to the point where it makes sense to people who haven’t spent years thinking in this very weird style that’s not well-vetted. I’m pretty excited about that. I think that’s probably something I should’ve been doing much more over the last two years.

Robert Wiblin: Have you seen Ben Garfinkel’s recent talk and blog post about how confident should we be about all of this AI stuff?

Paul Christiano: I think I probably have seen a Google Doc or something. Yeah.

Robert Wiblin: Do you have any views on it, if you can remember it? [laughs]

Paul Christiano: I think there are lots of particular claims about AI that were never that well-grounded but that people were kind of confident in, which I remain pretty skeptical about. I don’t remember exactly what he touches on in that post, but claims about takeoff, for example: people have made claims of really very, very rapid AI progress, and particularly claims about the structure of that transition, where people have had pretty strong, pretty unconventional views. I guess to me it feels like I’m just taking more of an agnostic opinion, but I think to people in the safety community, it feels more like I’m taking this outlier position. That’s definitely a place where I agree with Ben’s skepticism.

In terms of the overall question, how much is there an alignment problem? I think it’s right to have a lot of uncertainty thinking about it and to understand that the kind of reasoning we’re doing is pretty likely to go wrong. I think you have to have that in mind. That said, I think it is clear there’s something there. I don’t know if he’s really disagreeing with that.

Robert Wiblin: I think his conclusion is that it’s well worth quite a lot of people working on this stuff, but a lot of the arguments that people have made for that are not as solid as maybe we thought, when you really inspect all the premises.

Paul Christiano: Yes. I definitely think it’s the case that people have made a lot of kind of crude arguments and they put too much stock in those arguments.

Robert Wiblin: One point that he made which stood out to me was that there have been technologies that dramatically changed the world in the past, electricity, for example, but it’s not clear that working on electricity in the 19th century would have given you a lot of leverage to change how the future went. It seems like even though it was very important, it was just on a particular track, and there was only so much that even a group of people could have steered how electricity was used in the future. It’s possible that AI will be similar: it’ll be very important, but also you don’t get a ton of leverage by working on it.

Paul Christiano: Yes. I think it’s an interesting question. Maybe a few random comments. One, it does seem like you can accelerate adoption if you had an understanding early enough, though I’m not exactly sure how early you would have had to act to get much leverage. If you understand the problem early enough, you could really change the timeline for adoption. You can really imagine small groups having pushed adoption forward by six months or something.

There were a lot of engineering problems and conceptual difficulties that were distinctive to this weird small thing, which in fact did play a big part in the overall machine and trajectory of civilization. It really was well-leveraged, and faster progress in that area seems like it would have had unusually high dividends for faster overall technological progress.

Maybe going along with that, I think it is also reasonable to think that if a small group had positioned themselves to understand that technology well, and been pushing it and making investments in it, they couldn’t have easily directly steered from a great distance, but they could’ve ended up in a future situation where they’d made a bunch of money, or were in a position to understand well an important technology which not that many people understand well as it gets rolled out.

I think that’s again, a little bit different from the kind of thing he’s expressing skepticism about. It seems like an important part of the calculus if one is thinking about trying to have leverage by working on AI, thinking about AI.

I do think the alignment problem is distinctive from anything you could have said in the context of electricity. I’m not mostly trying to do the “making investments in AI sets you up in a better position to have influence later, or make a bunch of money” thing. I’m mostly in the “I think we can identify an unusually crisp issue, which seems unusually important, and can just hack away at that” camp. It seems like it should have a lot of question marks around it, but I don’t really know of historical cases which used a similar heuristic.

Sometimes people cite them, and I’ve tried to look into a few of them, but I don’t know of historical cases where you would have made a similarly reasonable argument and then ended up feeling really disappointed.

Robert Wiblin: Do you have any thoughts on what possible or existing AI alignment work might yield the most value for an additional person or million dollars that it receives at the moment?

Paul Christiano: Yes. I mentioned earlier these three categories of difficulties. I think different resources will be useful in different categories, and each of them is going to be best for some resources, like some people, or some kinds of institutional will. Briefly going over those again, one was conceptual work on how this is all going to fit together: imagining what kinds of approaches potentially scale to very, very powerful AI systems, and what the difficulties are in that limit as systems become very powerful. I’m pretty excited for anyone who has reasonable aptitude in that area to try working on it.

It’s been a reasonable fraction of my time over the last year. Over my entire career, it’s been a larger fraction of my attention, and it’s something that I’m starting to think about scaling up again. This is doing theoretical work aimed directly at the alignment problem. Asking on paper, “What are the possible approaches to this problem? How do we think this will play out?”, and moving towards having a really nailed-down solution that we’ll feel super great about. That’s one category. It’s a little bit hard to put money down on that, but I think for people who like doing theoretical or conceptual work, that’s a really good place to add such people.

There’s a second category, which is understanding facts about human reasoning. Understanding, in the context of debate: can humans be good judges arbitrating between different, competing perspectives? How would you set up a debate such that, in fact, the honest strategy wins in equilibrium? Or on the amplification side, asking about this universality question: is it the case that you can decompose questions into at least slightly easier questions?

I’m also pretty excited about throwing people at that. Just running more experiments, trying to actually get practice engaging in this weird reasoning and really seeing: can people do this? Can we iterate and try and identify the hard cases? I’m pretty excited about that. There’s some overlap in the people who do those kinds of work, but it maybe involves different people.

There’s this third category of engaging with ML and actually moving from theory to implementation. Also getting in a place, with infrastructure, and expertise, and so on, to implement whatever we think is the most promising approach. That again requires a different kind of person still, and maybe also a different kind of institutional will and money. It’s also a pretty exciting thing to me, in that it can both help provide a sanity check for various ideas coming out of the other kinds of experiments, and it can also be a little bit more of this being in a position to do stuff in the future.

We don’t necessarily know exactly what kind of alignment work will be needed, but just having institutions, and infrastructure, and expertise, and teams that have experience thinking hard about that question, actually building ML systems, and trying to implement, say, “Here’s our current best guess. Let’s try and integrate these kinds of ideas about alignment into state-of-the-art systems.” Just having a bunch of infrastructure able to do that seems really valuable.

Anyway, those are the three categories where I’m most excited about throwing resources at alignment work. It’s very hard to talk in the abstract about which one’s more promising, just because there’s going to be lots of comparative advantage considerations, but I think there’s definitely a reasonable chunk of people for whom the best push is into each of those three directions.

Sending a message to the future

Robert Wiblin: Changing gears into something a bit more whimsical: a blog post that I found really charming. You’ve argued recently that a potentially effective way to reduce existential risk would be to leave messages somewhere on earth for our descendants to find, in case civilization goes under or humans go extinct and then intelligent life reappears on the earth, and we maybe want to tell them something to help them be more successful where we failed. Do you want to outline the argument?

Paul Christiano: Yes. The idea is, say, humanity– if every animal larger than a lizard was killed and you still have the lizards, lizards have a long time left before they would all die, before photosynthesis begins breaking down. Based on our understanding of evolution, it seems reasonably likely that in the available time, lizards would again be able to build up to a spacefaring civilization. It’s definitely not a sure thing and it’s a very hard question to answer, but my guess would be, more likely than not, lizards would eventually be in a position to also go and travel to space.

Robert Wiblin: It’s a beautiful image.

[laughter]

Paul Christiano: That’s one place where we’re like, “That’s a weird thing.” There’s another question, “How much do you care about that lizard civilization?” Maybe related to these other arguments, weird decision theory arguments about how nice you should be to other value systems. I’m inclined to be pretty happy if the lizards take our place. I prefer we do it, but if it’s going to be the lizards or nothing, I would really be inclined to help the lizards out.

Robert Wiblin: Maybe this is too much of an aside here, but in that case, I had the intuition that, “Yes, future lizard people or humans now…” I’m not sure which is better. It’s like humans were drawn out of the pool of potential civilizations. It’s not obvious whether we’re better or worse than if you reran history with lizards rather than people.

Robert Wiblin: I just wanted to jump in because some of my colleagues pointed out that apparently there’s some insane conspiracy theory out there about so-called ‘lizard people’ secretly running the world, which I hadn’t heard of. To avoid any conceivable possible confusion, what we’re talking about has nothing to do with any such ‘lizard people’. [laughter]

‘Lizard people’ is just our jokey term for whatever intelligent life might one day re-evolve on Earth, many millions or hundreds of millions of years into the future, should humans at some point die out. Perhaps ‘lizard people’ was a slightly unfortunate turn of phrase in retrospect! OK, on with the show.

Paul Christiano: Yes. I think it’s an interesting question. I think this is, to me, one of the most– it’s related to one of the most important open philosophical questions, which is, just in general, what kinds of other value systems should you be happy with replacing you? I think the lizards would want very different things from us and on the object level, the world they created might be quite different from the world we would’ve created.

I share this intuition of like, I’m pretty happy for the lizards. It’s like I’d feel pretty great. If I’m considering, should we run a risk of extinction to let the lizards take over? I’m more inclined to let the lizards take over than run a significant risk of extinction. Yes, it’s like I would be happy. If there’s anything we could do to make life easier for the lizards, I’m pretty excited about doing it.

Robert Wiblin: I’m glad we’ve made this concrete with the lizard people.

[laughter]

Okay, carry on.

Paul Christiano: I’d say lizards also in part because if you go too much smaller than lizards, at some point, it becomes more dicey. If you only had plants, it’s a little bit more dicey whether they have enough time left. Lizards, I think, are kind of safe-ish. Lizards are pretty big and pretty smart, most of the way to spacefaring.

Then there’s this question of, what could we actually do? Why is this relevant? The next question is, is there a realistic way that we could kill ourselves and all the big animals without just totally wiping out life on earth or without replacing ourselves with say AIs pursuing very different values? I think by far the most likely way that we’re going to fail to realize our values is like we don’t go extinct, but we just sort of are doing the wrong thing and pointing in the wrong direction. I think that’s much, much more likely than going extinct.

My rough understanding is that if we go extinct at this point, we will probably take most of the earth’s ecosystem with us. I think if you thought that climate change could literally kill all humans, then you’d be more excited about this. There are some plausible ways that you could kill humans but not literally kill everything: a total, really brutal collapse of civilization. Maybe there are some kinds of bioterrorism that kill all large animals, or kill all humans, but don’t necessarily kill everything.

If those are plausible, then there’s some chance that you end up in this situation where we’ve got the lizards and now it’s up to the lizards to colonize space. In that case, it does seem like we have this really interesting lever, where lizards will be evolving over some hundreds of millions of years. They’ll be in our position some hundreds of millions of years from now. It does seem realistic to leave messages, that is, to somehow change earth such that a civilization that appeared several hundred million years later could actually notice the changes we’ve made and could start investigating them.

At that point, if we’re able to call the attention of some future civilization to a particular thing, I think we can encode lots of information for them, and we could decide how we want to use that communication channel. Sometimes people talk about this, but they’re normally imagining radically shorter time periods than hundreds of millions of years, and they’re normally not being super thoughtful about what they’d want to say. My guess would be that you could really substantially change the trajectory of civilization by being able to send a message from a much, much more–

If you imagine the first time that humans could have discovered a message sent by a previous civilization, it would have been– I mean, it depends a little bit on how you’re able to work this out, but probably at least a hundred years ago. At that point, the message might’ve been sent from a civilization which was much more technologically sophisticated than they were, and which had experienced the entire arc of civilization followed by extinction.

At a minimum, it seems like you could really change the path of their technological development by selectively trying to spell out for them, or show them, how to achieve certain goals. You could also attempt, although it seems a little bit more speculative, to help set them on a better course and be like, “Really, you should be concerned about killing everyone. Here’s some guidance on how to set up institutions so you don’t kill everyone.”

I’m very concerned about AI alignment, so I’d be very interested, as much as possible, in being like, “Here’s the thing which upon deliberation we thought was a problem. You probably aren’t thinking about it now, but FYI, be aware.” I do think that would put a community of people working on that problem, and that future civilization, into a qualitatively different place than if like– It’s just sort of– I don’t know.

It’s very hard to figure out what the impact would be had we stumbled across these very detailed messages from a past civilization. I do think it could have a huge technological effect on the trajectory of development, and also reasonably likely have a reasonable effect either on deliberation and decisions about how to organize ourselves or on other intellectual projects.

Robert Wiblin: Yes. Take this hypothetical again: could we have made history go better if we could just send as much text as we wanted back to people in 1600 or 1700? On reflection, it does seem like, “Well yes, we could just send them lots of really important philosophy and lots of important discoveries in social science, and tell them also the things that we value that maybe they don’t value.” Like speed up the strains of philosophical thought that we think are particularly important.

Paul Christiano: You could also just choose what technology– [chuckles] pick and choose from all the technologies that exist in our world and be like, “Here are the ones we think are good on balance.”

Robert Wiblin: Right, yes. You don’t give them the recipe for nuclear weapons. Instead, you give them the game theory of mutually assured destruction, or you tell them everything we know about how to sustain international cooperation, so whenever they do develop nuclear weapons, they’re in a better position to not destroy themselves.

Paul Christiano: Yes, and “Here’s a way to build a really great windmill.”

[laughter]

Robert Wiblin: [laughs] Yes, “Here’s solar panels. Why not? Yes, get some solar panels stuff.”

Paul Christiano: I don’t know how much good you could do with that kind of intervention and it’s a thing that would be interesting to think about a lot more. My guess would be that there’s some stuff which in expectation is reasonably good, but it’s hard to know.

Robert Wiblin: Yes. There’s a pretty plausible case that if humans went extinct, intelligent life might reemerge. Probably, if we thought about it long enough, we could figure out some useful things that we could tell them that would probably help them and give them a better shot at surviving, and thriving, and doing things that we value. How on earth would you leave a message that could last hundreds of millions of years? It seems like it could be pretty challenging.

Paul Christiano: Yes, I think there are two parts of the problem. One part is calling someone’s attention to a place. I think that’s the harder part by far. For example, you can’t just bury a thing in most places on earth, because hundreds of millions of years is long enough that the surface of the earth is no longer the surface of the earth. I think the first and more important problem is calling someone’s attention to a spot, or to one of a million spots, or whatever.

Then the second part of the problem is, after having called someone’s attention to a spot, how do you actually encode information? How do you actually communicate it to them? It’s also probably worth saying, this comes from a blog post that I wrote. I think that there are people who have a much deeper understanding of these problems, who have probably thought about many of these exact problems in more depth than I have. I don’t want to speak as if I’m like a–

Robert Wiblin: An authority on leaving messages for future civilizations. [laughs]

Paul Christiano: That’s right. I thought about it for some hours. [laughter]

In terms of calling attention, I thought of a bunch of possibilities in the blog post, and I started some discussions online with people brainstorming possibilities. I think if we thought about it a little bit more, we could probably end up with a clearer sense.

Probably the leading proposal so far– I think Jan Kulveit had this proposal: there’s this particularly large magnetic anomaly in Russia, which is very easy for a civilization to discover quite early, and which is located such that it’s unlikely to move as tectonic plates move. It’s a little bit difficult to do, but it’s pretty plausible that you could use modifications to that structure, or locate things at Schelling points in the structure, in a way that at least our civilization would very robustly have found. It’s hard to know how much a civilization quite different from ours would have…

Robert Wiblin: You said– just the straightforward idea of a really big and hard rock that juts out of the earth. Hopefully, it’ll survive long enough to be– [crosstalk]

Paul Christiano: Yes, it’s really surprisingly hard to make things like that work. [chuckles]

Robert Wiblin: Yes, I guess over that period of time, even a very durable rock is going to be broken down by erosion.

Paul Christiano: Yes. Also stuff moves so much. Like you put the rock on the surface of the earth, it’s not going to be on the surface of the earth in hundreds of millions of years anymore.

Robert Wiblin: It just gets buried somehow. Yes, interesting. [crosstalk]

Paul Christiano: Surprisingly– I really updated a lot towards it being rough. When I started writing this post, I was like, “I’m sure this is easy,” and then I was like, “Aw jeez, really, basically everything doesn’t work.”

Robert Wiblin: What about a bunch of radioactive waste that would be detectable by Geiger counters?

Paul Christiano: Yes, so you can try and do things– You have to care about how long these things can last, and how easy they are to detect, and how far from the surface they remain detectable, but I think there are options like that that work. [chuckles] I think also magnets: magnets are longer-lasting than we might have guessed and are a reasonable bet. I think they can easily be as effective.

Robert Wiblin: You made this point that you can literally have thousands of these sites and you can make sure that in every one, there’s a map of where all the others are, so they only have to find one. Then they can just go out and dig up every single one of them, which definitely improves the odds.

Paul Christiano: Yes. Also, there are some fossils around, so if you think you’ve got a million very-prone-to-be-fossilized things, then it’s probably going to work. Yes, I haven’t thought about that in a while. I think probably if you sat down, though, if you just took a person, and that person spent some time really fleshing out these proposals, and digging into them, and consulting with experts, they’d probably find something that would work.

Similarly, on the social side, if you thought about it for a really long time, I expect you could find– you sort of have a more conservative view about whether there’s something to say that would be valuable. The first step would be, do you want to pay someone to spend a bunch of time thinking about these things? Is there someone who’s really excited to spend a bunch of time thinking about these things, nailing down the proposals? Then seeing whether it was a good idea, and then, if it was a good idea, spending the millions or tens of millions of dollars you need to actually make it happen.

Robert Wiblin: In terms of how you would encode this information, it seemed like you thought probably just etching it in rock would be a plausible first pass. That would probably be good enough for most of the time. You could probably come up with some better material on which to etch things that is very likely to last a very long time, at least if it’s buried properly.

Paul Christiano: I think other people have thought more about this aspect of the problem, and in general with more confidence that something will work out, but I think just etching stuff is already good enough under reasonable conditions. It’s easier to have a small thing that will survive for hundreds of millions of years than to disfigure the earth in a way that will be noticeable and would call someone’s attention to it in hundreds of millions of years.

Robert Wiblin: Okay, this brings me to the main objection I had, which is that the lizard people probably don’t speak English, and so even if we bury Wikipedia, I think they might just find it very confusing. How is it clear that we can communicate any concepts to lizard people in a hundred million years time?

Paul Christiano: Yes, I think that’s a pretty interesting question. That goes into things you want to think about. I do think when people have historically engaged in this kind of project, like if you have a lost language or you have some relics you’re trying to make sense of, you’re really in a radically worse position than the lizard people would be in with respect to this artifact, since we would have put a lot of information into it really attempting to be understood. I think we don’t really have examples of humans having encountered this super information-rich thing that’s attempting to be understood.

I guess this is like a game: you can try and play it amongst humans, and I think humans can win very easily at it, but it’s unclear to what extent that’s because we have all this common context. I think humans do not need anything remotely resembling language, because art easily wins this game; you’re able to easily build up a language of concepts just by simple illustrations, and diagrams, and so on.

I think it’d be right to be skeptical, since even when it’s not a language, we’re still using all of these concepts that are common. We’ve thought about things in the same way, we know what we’re aiming at. I’m reasonably optimistic, but it’s pretty unclear. This is also a thing that I guess people have thought about a lot, although in this case, I’m a lot less convinced by their thinking than in the ‘writing stuff really small in a durable way’ case.

Robert Wiblin: My understanding was that the people who thought about it a lot seemed very pessimistic about our ability to send messages. Well, I guess, to be honest, the only case I know about is a project to try to figure out what messages we should put at the site where we’re burying really horrible nuclear waste. You’re putting this incredibly toxic thing under the ground and then you’re like, “Wow, we don’t want people in the future to not realize what this is, and then dig it up, and then kill themselves.”

There were quite a lot of people, I guess linguists, sociologists, all these people, who were trying to figure out what signals to put there. Is it signs? Is it pictures? Whatever it is. They settled on some message, that I think they drew out in pictures, that was, I thought, absolutely insanely bad, because I couldn’t see how any future civilization would interpret it as anything other than religious stuff that they would be incredibly curious about, and then would absolutely go and dig it up.

[laughter]

I’ll find the exact message that they decided to communicate and potentially read it out here, and people could judge for themselves.

—

Rob Wiblin: Hey folks, I looked up this message to add in here so you can pass judgement on it. Here it is:

“This place is a message… and part of a system of messages …pay attention to it!

Sending this message was important to us. We considered ourselves to be a powerful culture.

This place is not a place of honor… no highly esteemed deed is commemorated here… nothing valued is here.

What is here was dangerous and repulsive to us. This message is a warning about danger.

The danger is in a particular location… it increases towards a center… the center of danger is here… of a particular size and shape, and below us.

The danger is still present, in your time, as it was in ours.

The danger is to the body, and it can kill.

The form of the danger is an emanation of energy.

The danger is unleashed only if you substantially disturb this place physically. This place is best shunned and left uninhabited.”

As I said, I really think a future civilization, human or otherwise, would be insanely curious about anything attached to a message like that, and would guess that the site was religious in nature. If they hadn’t learned about nuclear radiation themselves already, I think they’d be more likely to dig at that spot than if it were simply left unmarked. Alright, back to the conversation.

—

Anyway, they did have this– I think actually the plan there was to write it in tons of languages that exist today in the hope that one of those would have survived. That was one of the options.

Paul Christiano: That’s not going to be an option here.

Robert Wiblin: Not an option here.

Paul Christiano: I think it’s quite a different issue. It’s different if you want to make a sign, so that someone who encounters that sign can tell what it’s saying, versus if I want to write someone a hundred million words. If we encountered a message from some civilization that we can tell has technological powers much beyond our own, we’re like, “Okay, that’s really high up on our list of priorities. I don’t know what the hell they’re talking about.” It’s just a very different situation when there’s this huge amount of content. It’s like the most interesting academic project of all; it goes to the top of the intellectual priority queue upon discovering such a thing.

I have a lot more confidence in our ability, or the ability of a civilization with similar capabilities to ours, to figure something out under those conditions, than under: they’re walking around, they encounter a sign, perhaps they’re somewhat primitive at this point, and they have no idea what’s up with it. A sign is also just not that much content. In the case where you’re only giving them 10,000 words of content or some pictures, they just don’t have enough traction to possibly figure out what’s up. Whereas in this case, we’re not coming in with just one proposal for how you could build a shared conceptual language, we’re like, “We have a hundred proposals, we’re just trying them all, every proposal any fourth-grader came up with.

“That’s fine. Throw it in there too.” [laughs] Bits are quite cheap, so you can really try a lot of things. Yes, I think it’s just a much better position than people normally think.

Robert Wiblin: I think archaeologists, when they’ve dug up a piece of writing, sometimes have decoded it by analogy to other languages that we do have records of. Sometimes there’s something like the Rosetta Stone, where you’ve got a translation: I think it had the same text in two scripts they couldn’t read plus one language they could, so they could figure out what the unknown language sounded like from that, and then figure out very gradually what the words meant.

I think there are other cases where they’ve worked it out just from context. They’ve dug up stones and been like, “What is this?” It turns out it’s a bunch of financial accounts for a company, and they can figure out the imports and exports from this place, which makes total sense. You can imagine they’ll be doing that here. Your hope is that we will just bury so much content, a bunch of pictures, lots of words, repeating words, that eventually they’ll be able to decode it.

They’ll figure it out from some sort of context, I guess. They’ll be flicking through the encyclopedia and they’ll find one article about a thing they can identify, because they also have this thing. Say, trees. Okay, we’ve got the article about trees, and we still have trees. Then they work out, “Well, what would I say about trees if I was writing an encyclopedia?” They read an article about trees, so they guess what those words are, and then they work outward from there.

Paul Christiano: We can make things a lot simpler than encyclopedia articles. You can be like, “Here’s a lexicon of a million concepts, or whatever, 10,000 concepts. For each of them, a hundred pictures, a hundred sentences about them, and a hundred attempts to define them,” attempting to organize it well.
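
To make the shape of that proposal concrete, here is a minimal sketch of what one entry in such a lexicon could look like; the field names and counts are illustrative assumptions on my part, not anything specified in the episode:

```python
# A minimal sketch (not from the episode) of one entry in the kind of
# lexicon Paul describes: every concept gets many redundant routes in.
# All names and counts here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    pictures: list[bytes] = field(default_factory=list)         # ~100 images
    example_sentences: list[str] = field(default_factory=list)  # ~100 usages
    definitions: list[str] = field(default_factory=list)        # ~100 attempts

# 10,000 concepts x ~300 artifacts each is tiny by storage standards,
# which is the point: bits are cheap, so try every route at once.
lexicon: dict[str, Concept] = {"tree": Concept(name="tree")}
```

The point of the redundancy is exactly what Paul describes: any single route into a concept might fail, but a hundred pictures plus a hundred attempted definitions give a decoder many chances to triangulate.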

Robert Wiblin: Yes. Okay, I agree. I think if you went to that level, then probably you could do it. Although some concepts might be extremely hard to illustrate.

[laughter]

Paul Christiano: Yes, I’m more optimistic about like communi– Well, I don’t know. Communicating technology seems easier than–

Robert Wiblin: Just like, “Here’s a picture of a steam engine.” Whereas, maybe philosophy is a bit trickier or religion. In the blog post, you suggested. That this might be a pretty good bang for your buck in terms of reducing existential risk. I think you had a budget of $10 million for a minimum viable product of this. You were thinking, “Yes, this could improve their odds of surviving by one percentage point is if we’re very careful about what messages we send them and what messages we don’t send them.” Do you still think something like that?

The budget of $10 million seemed incredibly low to me. I guess here we’ve been envisaging something potentially a lot more ambitious than perhaps what you were thinking about at the time.

Paul Christiano: Yes, $10 million, I think, does seem– After talking to people about what the actual storage options are, and how to make a message that people could actually find, $10 million seems low and $100 million seems probably more realistic, which makes the cost-effectiveness numbers worse.

I think it is worth pointing out that you have to budget separately for each stage. Imagine four phases of the project: figuring out what to say, somehow making a landmark people can identify, actually storing a bunch of information durably, and then actually trying to communicate the information, the thing that you wanted to say.

If any one of those is expensive, you can relatively easily bring the others up to the same cost.

If we’re getting to spend millions of dollars on each of those phases, that’s substantial. I’m probably imagining the lion’s share of the cost going into leaving a landmark, but that still leaves you with millions of dollars to spend on the other components, which is a few people working full-time for years.
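
As a rough sanity check on that last claim, here is the arithmetic as a sketch; the budget share and cost per person-year are assumed figures, not numbers from the conversation:

```python
# Rough budget arithmetic (mine, not from the episode). Assume the
# landmark takes the lion's share and a loaded cost per person-year.
budget = 100e6
landmark_share = 0.85                            # assumed
per_phase = budget * (1 - landmark_share) / 3    # three remaining phases
person_year_cost = 300e3                         # assumed loaded cost
print(per_phase / person_year_cost)              # ~16.7 person-years per phase
```

Even with most of the money going into the landmark, each remaining phase can still support a small team for many years.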

Robert Wiblin: I would have thought that the most difficult thing would be to figure out what to say, and then to figure out how to communicate it. If we’re talking about drawing pictures for every word that we think lizard people would be able to understand, that seems like a lot of homework.

[laughter]

Paul Christiano: I think it’s hard to ballpark the cost of that kind of work. Are we talking a hundred person-years or a thousand person-years? You can think about how many person-years of effort go into reasonable encyclopedias. It’s tricky thinking about the costs. At $100 million, I feel good about how thorough it would be. Again, you’re not going to have a great answer to what to send, but you’re going to have an answer supported by people who’ve thought about it for a few years. I guess if you’re doing this project, you’re doing it under a certain set of assumptions.

This project is already predicated on a bunch of crazy views about the world, so you’re making an all-out bet on those crazy views. When you’re doing these other stages, you’re also just conditioning on those crazy views about the world being correct, about what basic things are important and how things basically work, which I think does in some sense help. You only have to pay the improbability of those crazy views being right once; you don’t have to pay it again.

I guess I’ve always imagined that it would take less than a few person-years of effort to produce something that could be understood by a future civilization. Maybe I’m just way too optimistic about that. I haven’t engaged with any of the communities that have thought about this problem in detail, so it’s totally possible that I’m way off base.

Anyway, when I imagine people spending 10 years on that, I’m like, “10 years? That seems pretty good. They’re going to have this nailed. They’re going to have tested it a bunch of times. They’re going to have six independent proposals that are implemented separately. Each of them is going to be super exhaustive with lots of nice pictures.” Nice pictures are actually a little bit hard, but bits are cheap, so the question is just, “What do we do with all the bits?”

Robert Wiblin: Should listeners maybe fund this idea? Has anyone expressed interest in being the team lead on this?

Paul Christiano: Yes, there have been some conversations, very brief conversations, about the landmarking step. That’s probably the first thing I would be curious about: what does it cost? I don’t think it’s a big project ready to be funded yet, and I don’t think anyone’s really expressed interest in taking it up and running with it. [chuckles] The sequence would probably be: first, check whether the landmark thing makes sense and roughly how expensive it would necessarily be. Then do a sanity check on all the details, and then start digging in for a few months on how you would send things and how good it actually looks. Then six months in, you’d have a sense of whether this is a good deal.

Robert Wiblin: If one of you listeners out there is interested in taking on this project, send me an email because you sound like a kind of fun person.

[laughter]

Do you have any other neglected or crazy-sounding ideas that might potentially compare favorably to more traditional options for reducing existential risk?

Paul Christiano: I do think it’s worth caveating that if there’s any way to address AI risk, that’s probably going to be better than this kind of thing, my comparative advantage seeming to be in AI risk stuff. In terms of weird altruistic schemes, I feel like I haven’t thought that much about this kind of thing over the last year. I don’t have anything that feels both very weird and very attractive.

Robert Wiblin: [laughs] What about anything that’s just attractive? I’ll settle. [chuckles]

Paul Christiano: I remain interested in– There are a few things we discussed last time, maybe only very shallowly, or maybe we didn’t have a chance to touch on them, but I remain excited about them. Some basic tests of interventions that may affect cognitive performance seem pretty weirdly neglected. Right now, I’m providing some funding to some clinical psychiatrists in Germany to do a test of creatine in vegetarians, which seems pretty exciting. I think the current state of the literature on carbon dioxide and cognition is absurd. I probably complained about this last time I was here. It’s just– [crosstalk]

Robert Wiblin: Let’s dive into this. It was a mistake of mine not to put these questions in. Just to go back on this creatine issue: there have been some studies, one study in particular, suggesting that for vegetarians, and potentially for non-vegetarians as well, taking creatine gives you an IQ boost of a couple of points. It was very measurable even with a relatively small sample. This was a pretty big effect size by the standards of people trying to make people smarter.

Paul Christiano: Small by the standards of effects people normally look for, like a third of a standard deviation, which is respectable. But as an intervention to make people smarter it’s huge; I don’t know of many interventions that are that effective.

Robert Wiblin: Yes. If we can make everyone three IQ points smarter, that’s pretty cool. Then there was just not much follow-up on this, even though it seems way better than most of the other options we have for making people smarter, other than, I suppose, improving health and nutrition.

Paul Christiano: Yes, there’s a review of the effects in omnivores; that’s been better studied. It doesn’t look that plausible that it has large effects in omnivores, and there’s been some looking into mechanisms, which also doesn’t look great. If you look at how creatine– I don’t know much about this area; all these areas we’re listing now are just random shit I’m speculating about sometimes. I’ve got to put that out there. There should be a separate category for my views on AI.

Anyway, yes, looking at mechanisms, it doesn’t look that great. It would be surprising, given what we currently know about biology, for creatine supplementation to have this kind of cognitive effect. But it’s possible, and it’s not ruled out in vegetarians. The state of the evidence in vegetarians is, I think, one inconclusive result and one really positive result. It seems worth just doing a reasonably powered check in vegetarians again.

I would be very surprised if something happened, but I think it’s possible. Some people would be more surprised; some people are like, “Obviously nothing.” But I’m at the point where 5-10% seems like a reasonable bet.
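
For a sense of what a ‘reasonably powered check’ means here: a third of a standard deviation is roughly five points on an IQ scale with a standard deviation of 15, and a textbook normal-approximation power calculation (my sketch, not anything from the episode) puts the required sample size in the low hundreds per arm:

```python
# Back-of-the-envelope power calculation (not from the episode): how many
# subjects per group would a creatine replication need to detect an effect
# of a third of a standard deviation (d = 0.33)?
import math
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-sample comparison."""
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = norm.ppf(power)           # power requirement
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

# d = 0.33 SD; on an IQ scale (SD 15) that's roughly 5 points.
print(n_per_group(0.33))  # ~145 subjects per group
```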

Robert Wiblin: On the vegetarianism point, when I looked at that paper, it seemed like they had chosen vegetarians mostly because they expected the effect to be larger there. Just to explain for listeners who don’t know: meat has some creatine in it, although a lot less than people tend to supplement with, so supplementation increases free creatine in the body even for meat eaters, but vegetarians seem to start with less because they’re not eating meat, so the supplementation presumably has a larger effect.

Paul Christiano: Most likely that was just the choice that study made, and then there was random variation in which studies found effects. I’ve definitely updated in the direction of studies showing all sorts of things: it’s very, very easy to mess up studies, and very, very easy to get wrong results, not just in the 5% of the time you’d expect from a significance threshold of p = .05, but radically more often than that, for God knows what reason.

Anyway, so most likely that’s a study that happened to return a positive result, and they happened to be studying vegetarians; that was the reason they did it, since it seemed like the effect should be larger there. Since we’ve gotten negative evidence about the effects in omnivores, it doesn’t seem that likely. Although it would also be consistent with the effects just being, say, three times smaller in omnivores; that would be plausible, and then it would be compatible with what we know.

Robert Wiblin: You were kind of, “Goddamn, this is really important, but people haven’t put money into it, people haven’t run enough replications of this.” You just decided to–

Paul Christiano: One replication. It’s one pre-registered replication. That’s all I want.

Robert Wiblin: You were like, “I’m going to do it myself.” Talk about that for a minute?

Paul Christiano: Well, I feel like in this case, providing funding is not the hard part, probably. I’m happy to fund stuff like this; I’m very interested in providing funding. I made a Facebook post like, “I’m really interested in providing funding,” and then an EA stepped up and was like, “I know a lab that might be interested in doing this,” and put me in touch with them.

Robert Wiblin: When might they have results?

Paul Christiano: In a year. I don’t know.

Robert Wiblin: Okay. Are you excited to find out?

Paul Christiano: I am. Yes, I’m excited to see how things go.

Robert Wiblin: Yes, talk about the carbon dioxide one for a minute, because this is one that’s been driving me mad the last few months: seeing that carbon dioxide potentially has enormous effects on people’s intelligence, and that offices, and lecture halls especially, can have incredibly elevated CO2 levels that are dumbing us all down when we most need to be smart.

Paul Christiano: Yes. I reviewed the literature a few years ago and I’ve only been paying a little bit of attention since then, but I think the current state of play is: there was one study with preposterously large effect sizes from carbon dioxide, in which the methodology was to put people in rooms and dump gas into all the rooms, where some of the gas mixtures were very rich in carbon dioxide. The effect sizes were absurdly large.

If you compare it to the levels of carbon dioxide that occur in the house I just moved out of, the most carbon-dioxide-rich bedroom in that house would have had a one standard deviation effect amongst Berkeley students on this test, or something, which is absurd. That’s totally absurd. That’s almost certainly–

Robert Wiblin: It’s such a large effect that you’d expect that when people walk into a room with elevated carbon dioxide levels, they should just feel like idiots, noticeably dumber in their own minds.

Paul Christiano: Yes, you would think that. To be clear, in rooms with levels that high, people do report that it feels stuffy, and part of the reason for that methodology in the papers, just dumping in carbon dioxide, is that if you make a room naturally that CO2-rich, it’s going to be obvious that you’re in the intervention group instead of the control.

Although to be fair, at that point even a placebo effect might do something. I think the result is almost certainly wrong, although maybe that’s not a good thing to be saying publicly on a podcast; there are a bunch of respected researchers on that paper. Anyway, it would be great to see a replication. There was subsequently a replication with exactly the same design, which also had p = 0.0001.

Now we’ve got the two precise replications, both with p = 0.0001. That’s where we’re at. Also, the effects are stupidly large. So large that you’d really, really need to care about ventilation. This room, probably, is madness. Well, this building is pretty well ventilated, but still, if the results are right, we’re at least a third of a standard deviation dumber right now.

Robert Wiblin: Yes, I’m sure, dear listeners, you can hear us getting dumber over the course of this conversation as we fill this room with poison. I guess potentially the worst case would be in meeting rooms or boardrooms where people are having prolonged discussions about difficult issues. They’d just be getting progressively dumber as the room fills up with carbon dioxide, and more irritable as well.
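
For intuition on how fast a closed room fills up, here is the standard well-mixed-room model as a sketch; the occupancy, room volume, ventilation rate, and per-person CO2 output below are typical assumed values, not figures from the episode:

```python
# A rough, standard well-mixed-room model (not from the episode) for how
# CO2 builds up in a poorly ventilated meeting room. Numbers below are
# typical textbook values, assumed for illustration.
import math

def co2_ppm(people, room_m3, ach, minutes,
            outdoor_ppm=420, gen_m3_per_person_hr=0.016):
    """CO2 concentration after `minutes`, with `ach` air changes per hour."""
    gen_ppm_per_hr = people * gen_m3_per_person_hr / room_m3 * 1e6
    steady = outdoor_ppm + gen_ppm_per_hr / ach  # steady-state level
    t = minutes / 60
    return steady - (steady - outdoor_ppm) * math.exp(-ach * t)

# Six people in a 50 m^3 boardroom with poor ventilation (0.5 ACH):
print(round(co2_ppm(6, 50, 0.5, 120)))  # ~2,850 ppm after two hours
```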

Paul Christiano: Yes, it would be pretty serious, and I think people have often cited this in attempts to improve ventilation, but they do not take it nearly as seriously as they would if they actually believed it. Which I think is right, because the effect is almost certainly not this large. If it were this large, you’d really want to know, and then–

Robert Wiblin: This is like lead poisoning or something?

Paul Christiano: Yes, that’s right.

Robert Wiblin: Well, this has been enough to convince me to keep a window open whenever I’m sleeping. I really don’t like sleeping in a room that has no ventilation or no open door or window. Maybe I just shouldn’t worry because at night who really cares how smart I’m feeling while I’m dreaming?

Paul Christiano: I don’t know what’s up. I also haven’t looked into it as much as maybe I should have. I would really just love to settle it; it’s not that hard. The effects are large enough, and short-term enough, that it’s extremely easy to check. In some sense it’s like, “What are you asking for? There’s already been a replication.” Though, I don’t know, the studies use these cognitive batteries that are not great.

If the effects are real, you should be able to detect them with basically any instrument. At some point, I just want to see the effect myself. I want to actually see it happen, and I want to see the people in the rooms.

Robert Wiblin: Seems like there’s a decent academic incentive to do this, you’d think, because you’d end up famous if you pioneered an issue that turns out to be extraordinarily important and causes buildings to be redesigned. Even if you can’t profit from it in a financial sense, wouldn’t you want the kudos for identifying this massive unrealized problem?

Paul Christiano: Yes, I mean, to be clear, a bunch of people do work on the problem. The things I’m aware of, which is probably out of date now, are the original paper, a direct replication, and a conceptual replication, all with big-looking effects but all with slightly dicey instruments. The conceptual replication was funded by a group that works on ventilation, unsurprisingly.

Robert Wiblin: Oh, that’s interesting.

Paul Christiano: Big air quality. Yes, I think that probably the take of academics, insofar as there’s a formal consensus process in academia, would be that this is real; it’s just that no one is behaving as if an effect of that size actually existed, and I think they’re right to be skeptical of that process. That does make the situation a little bit complicated in terms of who exactly gets credit.

I think the people who would get credit, and rightfully so, are the people who’ve been investigating it so far. This would be more like checking it out for the people who are skeptical. Although everyone is implicitly skeptical, given how much they don’t treat it like an emergency when carbon dioxide levels are high.

Robert Wiblin: Yes, including us right now. Well, kudos to you for funding that creatine thing. It would be good if more people took the initiative to insist on funding replications for issues that seem important but are getting neglected.

Paul Christiano: Yes, I think that’s great. I feel like there are lots of good things for people to do, and the bottleneck is mostly people who have the relevant kinds of expertise and interest. This is one category where I feel people could go far, and I’m excited to see how it goes.

Effect of more compute

Robert Wiblin: Last year OpenAI published this blog post, which got people really excited, showing that there has been a huge increase in the amount of compute used to train cutting-edge ML systems. I think for the systems that have absorbed the most compute, there was a 300,000-fold increase in the amount of compute that had gone into them over six years.

It seemed like that’d been potentially a really big driver of the more impressive AI capabilities of recent years. Would that imply faster progress going forward? Or do you think it will slow down as the increase in compute runs its course and it gets harder and harder to throw more processors at these problems?
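
As an aside, it is worth spelling out what that growth rate implies; this arithmetic is mine, not from the episode:

```python
# Quick arithmetic (mine, not from the episode): what doubling time does
# a 300,000x increase in training compute over six years imply?
import math

growth = 300_000
months = 6 * 12
doublings = math.log2(growth)        # ~18.2 doublings in 72 months
print(round(months / doublings, 1))  # ~4.0 months per doubling
```

A doubling time of around four months is far faster than Moore’s law, which is why the trend has to be driven mostly by spending more on compute rather than by chips improving.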

Paul Christiano: I think it just depends on what your prior perspective was. If you had a prior perspective where you were eyeballing progress in the field and asking, “Does this feel like a lot of progress?”, then in general this should be bad news; it should make you think AI is further away. You were like, “Well, there was a lot of progress; I had some intuitive sense of how much progress that was.”

Now you’re learning that that rate of progress can’t be sustained for that long, or that a substantial part of it has been this unscalable thing. You can debate how much further it could go: maybe you had a million X over that period and you can have a further thousand X, or something like that, maybe 10,000 X.

Robert Wiblin: Well, I suppose there’s only so much faster processors can get, and there’s also just the cost of buying tons of them. People were able to ramp it up because compute was previously only a small fraction of the total cost of their projects, but I guess just buying enough processors is now getting to be a pretty large fraction of the total cost of these AI projects.

Paul Christiano: Yes, a lot of projects now have a large compute budget. It’s still normally going to be small compared to the staff budget, and you can go a little bit further than that, but it’s getting large. And you should not expect that, by the point you’re training human-level AI systems, the compute cost for a training run would be a significant fraction of global output.

You could say maybe this trend could continue until you got up there, but probably not at this pace; it’s going to have to slow down a long time before we’re spending 2% of GDP on computers doing AI training. So if you had that perspective, where you were eyeballing progress, then I think this should generally be an update towards longer timelines.

I think if instead you had a perspective where this is more random, where you’re like, “Man, it’s really hard to tell,” it’s different. It’s very hard to eyeball progress and ask, “How impressive is this? How impressive is beating humans at chess, or beating humans at Go, or doing this particular image classification task?” I find it very hard to eyeball that kind of progress and make a projection.

I think if instead your estimates were coming from sketchy ways of estimating how much compute might be needed, say by analogy with the optimization done by evolution, or by an extrapolation of training times, or by arguments about the human brain, all of which are really anchored to amounts of compute, then I think you might have a perspective that’s more like, “Well, this tells us something: on paper, these estimates would have involved using large amounts of compute.”

There’s a lot of engineering effort in that kind of scale-up, and a lot of genuine uncertainty, especially if you’re talking about moderate timelines, about whether that engineering effort will actually be invested and whether that willingness to spend will actually materialize. I think that might make you move in the direction of, “Yes, apparently people are putting in the effort, and engineering progress is reasonably brisk.”

If instead you were doing an estimate that was really driven by how much compute is available– This is the style of the old estimates futurists made. Moravec made one of the earlier estimates of this flavor, and Kurzweil has a very famous estimate of this flavor, where they’re like, “It really matters how much compute you’re throwing at this task.”

If you have that kind of view, and then you see this compute spending rising really rapidly, that’s evidence that maybe it will continue to rise, and therefore timelines will be shorter than you would have thought.

Robert Wiblin: Some people seem to think that we may be able to create a general artificial intelligence just by using the algorithms that we have today, and waiting for another decade or two’s worth of processing power to come online through progress in chips and building out that infrastructure. How realistic do you think that is? Is that a live possibility in your mind?

Paul Christiano: I think it’s really hard to say, but it’s definitely a live possibility. Some people have an intuition that’s very much, “That’s obviously how it’s going to go.” I don’t think I sympathize with that intuition. Some people on the other side have the intuition that obviously there are really important things we don’t yet understand which will be difficult, so it’s hard to know how long they will take to develop, and it’s going to take much longer than the time required to scale up computing.

I’m also not super sympathetic to that. I kind of feel like it’s really hard to know; it seems possible, and it’s hard to rule out on a priori grounds. Our observations are pretty consistent with things being largely driven by compute, if you think of it as: what is the trade-off rate between compute and conceptual or algorithmic progress?

I think our observations are pretty compatible with a lot of importance on compute, and also compatible with the scale-up of existing things eventually getting you there. That’s definitely a view I have: that eventually enough scale-up will almost certainly work. It’s just a question of how much, and whether that’s what we’ll see over the next one or two decades, or whether it would take you far past physical limits. I end up pretty uncertain. I think a lot of things are possible.

Robert Wiblin: How does this question of the importance of compute relate to Moravec’s paradox? And I guess, what is that, for the audience members who haven’t heard of it?

Paul Christiano: This is the general observation that there are some tasks humans think of as being intellectually difficult, a classic example being playing chess, and other tasks that they don’t think of as computationally difficult, like picking up an object: looking at a scene, seeing where the objects are, picking the object up, and bringing it somewhere. It has turned out that the tasks people think of as traditionally intellectually challenging were easier to automate than people suspected, relative to the tasks people thought of as not that intellectually demanding. It’s not super straightforward, because there are still certainly big chunks of intellectual inquiry that people have no idea how to automate, but I think that’s the general pattern.

Robert Wiblin: You mean, for example, humans think of philosophy as difficult, and it’s also hard for computers to do philosophy; they don’t seem to be beating us at that.

Paul Christiano: Or mathematics or science. To humans, it might feel similar to be doing mathematics and to be playing a really complicated board game, but to a machine, these tasks are not that similar.

Robert Wiblin: The board game is way easier.

Paul Christiano: Board games, it turned out, were very, very easy relative to all the other things. At this point, Go is a reasonable guess for the hardest board game, and even it was much easier to automate than other human tasks. I think part of what’s going on there is that the reasoning humans have conscious access to is just not that computationally demanding. We have some understanding of this, and it was part of the very early optimism about AI.

We understand that when a human is consciously manipulating numbers or symbols, or actually casting their attention on anything, they’re just not doing things that fast. A human is lucky to be doing 100 operations per second; if a human can multiply numbers at anything like that speed, you’re like, “Wow, that’s incredible.”

But underneath whatever a human is consciously doing, there’s this layer which is using vastly, vastly more computation. In fact, a lot of the difficulty, especially in a compute-centric world, comes in when you look at a task and ask, “How hard is that task for a machine relative to humans?”

A lot of the question is, “How well is the human leveraging all the computational capacity they have when they’re doing that task?”

For any task that involves conscious reasoning, at least the conscious part is not doing anything computationally interesting. Then you have this further issue for things like board games: a human has not really evolved to play board games well, so they’re not using the compute in their brain very well at all. The best guess would be that you could evolve much, much tinier animals that are much, much better at playing board games than humans.

Robert Wiblin: Is it not the case that the human brain has a ridiculous fraction of itself devoted to visual processing, which requires a ton of compute, and which evolution has, I guess, tuned really well?

Paul Christiano: Yes. I don’t know offhand what the number is, but we’re talking on a log scale, so it doesn’t even matter that much. Vision uses a reasonable chunk of the brain, and that chunk is extremely well optimized for vision: when people are doing vision, they’re really using their brain. The question when people play board games is whether they’re leveraging any very large fraction of their brain like that.

The luckiest case is when doing mathematics or playing a game makes enough intuitive sense, maps on well enough to your intuitions, that you can build up abstractions and leverage the full power of your brain for that task. That’s pretty unusual. This is not obvious a priori [inaudible 01:00:39]; this is just an after-the-fact story. But you could imagine that there are people who are actually able to use their entire visual-processing machinery to play some board games. You can imagine that.

I think that’s actually a live possibility. Take Go, for example, and look at the way we’ve now solved Go. The amount of compute you would need to beat humans at Go using an entirely brute-force strategy, alpha-beta search or something, is a lot compared to the compute in your visual cortex, or the visual system more broadly. So you can make a plausible case that people are able to use a lot of that machinery when playing Go, and to a slightly lesser extent chess, for doing position evaluation, intuitions about how to play the game.

Robert Wiblin: You’re saying that you think the part of the brain that does visual processing is getting brought online to notice patterns in Go, getting co-opted to do the board game work.

Paul Christiano: Yes, at least that’s possible, and consistent with our observations of how hard it is to automate the game. We just don’t know very much. Lots of things are consistent with our observations.

Robert Wiblin: Do you hope to find out whether we’re more constrained by compute or by algorithmic progress?

Paul Christiano: Yes. In some sense it’s not going to be that we’re constrained by one or the other; there are going to be marginal returns to each. What is the rate of substitution between more compute and more algorithmic progress? In general, I think it seems better from a long-term perspective if it takes a lot of algorithmic progress to substitute for a small amount of compute.

The more you’re in that world, the more concentrated different actors’ compute needs are. Everyone who’s building really powerful AI systems is going to be using a very large fraction of their computational resources, and any actor who wants to develop very powerful AI will also be using a reasonable fraction of the world’s resources. That means it’s much easier to know who is in that game, and much harder for someone to unilaterally do something.

It’s much easier for the players to have a realistic chance of monitoring and enforcement, and also just a realistic chance of getting in a room and talking to each other. Probably not literally a room, but reaching understanding and agreement. That’s one thing. Maybe the other thing is that the more algorithmic progress can substitute for hardware progress, the faster the subsequent rate of progress is likely to be relative to what we’ve observed historically.

If you’re in a world where it turns out that clever thinking really can drive AI progress extremely rapidly, and the problem is just that we haven’t had that much clever thinking to throw at the problem, then you can imagine that as one scales up AI and is able to automate all that thinking, you get pretty fast ongoing progress. That might mean there’s less time between the point where long-term alignment problems become obvious and start mattering, and AI can start helping with them, and the point where it’s catastrophic to have not resolved them.

Generally, if clever ideas can shorten that period a lot, it’s a little bit bad. It’s a little bit less likely that AI will have an incredible overnight effect on the rate of hardware progress, though it will presumably accelerate that too. Automation will help there as well, but–

Robert Wiblin: You think if compute is what predominantly matters, then it’s going to be a more gradual process. We’ll have longer between the point when machine learning starts to get used for important things, and we start noticing where it works and where it doesn’t, and the point when a lot of things are getting delegated to machine learning. Whereas in the algorithm-driven case, it seems like you could get really quite abrupt changes in capabilities.

Paul Christiano: Yes, I think a lot of that is right. This could also change the nature of AI research. A lot of the stability comes from hardware being this very mature industry, with lots of resources being thrown at it and performance being pretty well understood. It would be hard to double investment in it, and it’s not that sensitive to weird questions about the quality of human capital or something. You just sort of understand it: you have to do a lot of experimentation, and it’s relatively capital-intensive.

Robert Wiblin: There’s quite big lags as well.

Paul Christiano: Yes. It just seems like generally it would be more stable, which sounds like good news. This is one of the reasons one might give for being more excited about faster AI progress now. Probably the biggest reason is that if you have faster AI progress now, you get to the regime where we’re using all the available computation about as well as we can, and then subsequent progress can be a little more stable.

If you have less AI progress now, and people only really start investing a bunch once it becomes clear they can automate a bunch of human labor, then you have this more whiplash effect, where you’d get a burst of progress as people really start investing.

Thoughts on Pushmeet episode

Robert Wiblin: A few weeks ago, we published our conversation with Pushmeet Kohli, who’s an AI robustness and reliability researcher at DeepMind over in London. To heavily summarize Pushmeet’s views, I think he made a couple of key claims.

One was that alignment and robustness issues, in his view, appear everywhere throughout the development of machine learning systems, so they require some degree of attention from everyone working in the field. According to Pushmeet, this makes the distinction between safety research and non-safety research somewhat vague and blurry. He thinks people who are working on capabilities are also helping with safety, and that improving reliability also improves capabilities, because then you can actually design algorithms that do what you want.

Secondly, I think he thought that an important part of reliability and robustness is going to be faithfully communicating our desires to machine learning algorithms, and that this is analogous to, though a harder instance of, the challenge of communicating with other people and getting them to really understand what we mean. Of course, it’s easier to do that with other humans than with other animals or machine learning algorithms.

A third point was, I guess, a general sense of optimism: DeepMind is working on this issue quite a lot and is keen to hire more people to work on these problems, and the sense is that we’re probably going to be able to gradually fix these problems with AI alignment as we go along and machine learning algorithms get more influential. I know you haven’t had a chance to listen to the whole interview, but you’ve skimmed the transcript. Firstly, where do you think Pushmeet is getting things right? Where do you agree?

Paul Christiano: I certainly agree that there’s this tight linkage between getting AI systems to do what we want and making them more capable. I agree with the basic optimism that people will need to tackle this ‘do what we want’ problem in the course of making AI systems useful, and I think it is more likely than not that people will find a good solution to it. There’s still this interesting question of whether longtermists should be thinking about the problem in order to increase that probability.

I think even absent the actions of longtermists, there’s a reasonably good chance that everything would just be totally fine. In that sense, I’m definitely on board with those claims. I would disagree a little bit, in that I think there is a meaningful distinction between activities whose main effect is to change the date by which various things become possible, and activities whose main effect is to change the trajectory of development.

I think that’s the main distinguishing feature of working on alignment per se: you care about differential progress towards being able to build systems that do what we want. From that perspective, the average contribution of AI work is almost by definition zero on that front, because if you just increased all AI work by one unit, you’d just be bringing everything forward by one unit.

That doesn’t mean there isn’t this well-defined question, “Can we change the trajectory in any way?”, and that’s an important problem to think about. I think there’s also a really important distinction between the failure that is most likely to disrupt the long-term trajectory of civilization, and the failure that is most likely to be an immediate deal-breaker for systems actually being useful or making money. Maybe one way to get at that distinction is related to the second point you mentioned.

Communicating your goals to an ML system is very similar to communicating with a human. There is a hard problem of communicating your goals to an ML system, which we can view as a capabilities problem: are they able to understand things people say? Are they able to form the internal model that would let them understand what I want? In some sense, it’s very similar to the problem of predicting what Paul would do, or a little slice of that problem, like predicting under what conditions Paul would be happy with what you’ve done.

That’s most of what we’re dealing with when we’re communicating with someone. If I’m talking with you, I would be completely happy if I just managed to give you a perfect model of me; then the problem is solved. I think that’s a really important AI difficulty for making AI systems actually useful, but it’s less core to what could end up pushing us in a bad long-term direction. That’s mostly because we’re concerned about behavior as AI systems become very capable and have a very good understanding of the world around them and of the people they’re interacting with. The really concerning cases are ones where AI systems understand quite well what people would do under various conditions, and understand quite well what they want, so there’s no failure of what we’d think of as normal communication between people, but the systems aren’t motivated to help Paul get what he wants: they understand what Paul wants, but they aren’t trying to help him get it. A lot of the interesting difficulty, especially from a very long-term perspective, is making sure that no gap opens up there.

Again, the gap between the problems that are most important in the very long run perspective and the problems