This week I’ll start an interview with Eliezer Yudkowsky, who works at an institute he helped found: the Singularity Institute of Artificial Intelligence.

While many believe that global warming or peak oil are the biggest dangers facing humanity, Yudkowsky is more concerned about risks inherent in the accelerating development of technology. There are different scenarios one can imagine, but a bunch tend to get lumped under the general heading of a technological singularity. Instead of trying to explain this idea in all its variations, let me rapidly sketch its history and point you to some reading material. Then, on with the interview!

In 1958, the mathematician Stanislaw Ulam wrote about some talks he had with John von Neumann:

One conversation centered on the ever accelerating progress of technology and changes in the mode of human life, which gives the appearance of approaching some essential singularity in the history of the race beyond which human affairs, as we know them, could not continue.

In 1965, the British mathematician Irving John Good raised the possibility of an "intelligence explosion": if machines could improve themselves to get smarter, perhaps they would quickly become a lot smarter than us.

In 1983 the mathematician and science fiction writer Vernor Vinge brought the singularity idea into public prominence with an article in Omni magazine, in which he wrote:

We will soon create intelligences greater than our own. When this happens, human history will have reached a kind of singularity, an intellectual transition as impenetrable as the knotted space-time at the center of a black hole, and the world will pass far beyond our understanding. This singularity, I believe, already haunts a number of science-fiction writers. It makes realistic extrapolation to an interstellar future impossible. To write a story set more than a century hence, one needs a nuclear war in between … so that the world remains intelligible.

In 1993 wrote an essay in which he even ventured a prediction as to when the singularity would happen:

Within thirty years, we will have the technological means to create superhuman intelligence. Shortly after, the human era will be ended.

You can read that essay here:

• Vernor Vinge, The coming technological singularity: how to survive in the post-human era, article for the VISION-21 Symposium, 30-31 March, 1993.

With the rise of the internet, the number of people interested in such ideas grew enormously: transhumanists, extropians, singularitarians and the like. In 2005, Ray Kurzweil wrote:

What, then, is the Singularity? It’s a future period during which the pace of technological change will be so rapid, its impact so deep, that human life will be irreversibly transformed. Although neither utopian or dystopian, this epoch will transform the concepts we rely on to give meaning to our lives, from our business models to the cycle of human life, including death itself. Understanding the Singularity will alter our perspective on the significance of our past and the ramifications for our future. To truly understand it inherently changes one’s view of life in general and one’s particular life. I regard someone who understands the Singularity and who has reflected on its implications for his or her own life as a "singularitarian".

He predicted that the singularity will occur around 2045. For more, see:

• Ray Kurzweil, The Singularity is Near: When Humans Transcend Biology, Viking, 2005.

Yudkowsky distinguishes three major schools of thought regarding the singularity:

• Accelerating Change that is nonetheless somewhat predictable (e.g. Ray Kurzweil).

• Event Horizon: after the rise of intelligence beyond our own, the future becomes absolutely unpredictable to us (e.g. Vernor Vinge).

• Intelligence Explosion: a rapid chain reaction of self-amplifying intelligence until ultimate physical limits are reached (e.g. I. J. Good and Eliezer Yudkowsky).

Yukdowsky believes that an intelligence explosion could threaten everything we hold dear unless the first self-amplifying intelligence is "friendly". The challenge, then, is to design “friendly AI”. And this requires understanding a lot more than we currently do about intelligence, goal-driven behavior, rationality and ethics—and of course what it means to be “friendly”. For more, start here:

• The Singularity Institute of Artificial Intelligence, Publications.

Needless to say, there’s a fourth school of thought on the technological singularity, even more popular than those listed above:

• Baloney: it’s all a load of hooey!

Most people in this school have never given the matter serious thought, but a few have taken time to formulate objections. Others think a technological singularity is possible but highly undesirable and avoidable, so they want to prevent it. For various criticisms, start here:

• Technological singularity: Criticism, Wikipedia.

Personally, what I like most about singularitarians is that they care about the future and recognize that it may be very different from the present, just as the present is very different from the pre-human past. I wish there were more dialog between them and other sorts of people—especially people who also care deeply about the future, but have drastically different visions of it. I find it quite distressing how people with different visions of the future do most of their serious thinking within like-minded groups. This leads to groups with drastically different assumptions, with each group feeling a lot more confident about their assumptions than an outsider would deem reasonable. I’m talking here about environmentalists, singularitarians, people who believe global warming is a serious problem, people who don’t, etc. Members of any tribe can easily see the cognitive defects of every other tribe, but not their own. That’s a pity.

And so, this interview:

JB: I’ve been a fan of your work for quite a while. At first I thought your main focus was artificial intelligence (AI) and preparing for a technological singularity by trying to create "friendly AI". But lately I’ve been reading your blog, Less Wrong, and I get the feeling you’re trying to start a community of people interested in boosting their own intelligence—or at least, their own rationality. So, I’m curious: how would you describe your goals these days?

EY: My long-term goals are the same as ever: I’d like human-originating intelligent life in the Solar System to survive, thrive, and not lose its values in the process. And I still think the best means is self-improving AI. But that’s a bit of a large project for one person, and after a few years of beating my head against the wall trying to get other people involved, I realized that I really did have to go back to the beginning, start over, and explain all the basics that people needed to know before they could follow the advanced arguments. Saving the world via AI research simply can’t compete against the Society for Treating Rare Diseases in Cute Kittens unless your audience knows about things like scope insensitivity and the affect heuristic and the concept of marginal expected utility, so they can see why the intuitively more appealing option is the wrong one. So I know it sounds strange, but in point of fact, since I sat down and started explaining all the basics, the Singularity Institute for Artificial Intelligence has been growing at a better clip and attracting more interesting people.

Right now my short-term goal is to write a book on rationality (tentative working title: The Art of Rationality) to explain the drop-dead basic fundamentals that, at present, no one teaches; those who are impatient will find a lot of the core material covered in these Less Wrong sequences:

• Map and territory.

• How to actually change your mind.

• Mysterious answers to mysterious questions.

though I intend to rewrite it all completely for the book so as to make it accessible to a wider audience. Then I probably need to take at least a year to study up on math, and then—though it may be an idealistic dream—I intend to plunge into the decision theory of self-modifying decision systems and never look back. (And finish the decision theory and implement it and run the AI, at which point, if all goes well, we Win.)

JB: I can think of lots of big questions at this point, and I’ll try to get to some of those, but first I can’t resist asking: why do you want to study math?

EY: A sense of inadequacy.

My current sense of the problems of self-modifying decision theory is that it won’t end up being Deep Math, nothing like the proof of Fermat’s Last Theorem—that 95% of the progress-stopping difficulty will be in figuring out which theorem is true and worth proving, not the proof. (Robin Hanson spends a lot of time usefully discussing which activities are most prestigious in academia, and it would be a Hansonian observation, even though he didn’t say it AFAIK, that complicated proofs are prestigious but it’s much more important to figure out which theorem to prove.) Even so, I was a spoiled math prodigy as a child—one who was merely amazingly good at math for someone his age, instead of competing with other math prodigies and training to beat them. My sometime coworker Marcello (he works with me over the summer and attends Stanford at other times) is a non-spoiled math prodigy who trained to compete in math competitions and I have literally seen him prove a result in 30 seconds that I failed to prove in an hour.

I’ve come to accept that to some extent we have different and complementary abilities—now and then he’ll go into a complicated blaze of derivations and I’ll look at his final result and say "That’s not right" and maybe half the time it will actually be wrong. And when I’m feeling inadequate I remind myself that having mysteriously good taste in final results is an empirically verifiable talent, at least when it comes to math. This kind of perceptual sense of truth and falsity does seem to be very much important in figuring out which theorems to prove. But I still get the impression that the next steps in developing a reflective decision theory may require me to go off and do some of the learning and training that I never did as a spoiled math prodigy, first because I could sneak by on my ability to "see things", and second because it was so much harder to try my hand at any sort of math I couldn’t see as obvious. I get the impression that knowing which theorems to prove may require me to be better than I currently am at doing the proofs.

On some gut level I’m also just embarrassed by the number of compliments I get for my math ability (because I’m a good explainer and can make math things that I do understand seem obvious to other people) as compared to the actual amount of advanced math knowledge that I have (practically none by any real mathematician’s standard). But that’s more of an emotion that I’d draw on for motivation to get the job done, than anything that really ought to factor into my long-term planning. For example, I finally looked up the drop-dead basics of category theory because someone else on a transhumanist IRC channel knew about it and I didn’t. I’m happy to accept my ignoble motivations as a legitimate part of myself, so long as they’re motivations to learn math.

JB: Ah, how I wish more of my calculus students took that attitude. Math professors worldwide will frame that last sentence of yours and put it on their office doors.

I’ve recently been trying to switch from pure math to more practical things. So I’ve been reading more about control theory, complex systems made of interacting parts, and the like. Jan Willems has written some very nice articles about this, and your remark about complicated proofs in mathematics reminds me of something he said:

… I have almost always felt fortunate to have been able to do research in a mathematics environment. The average competence level is high, there is a rich history, the subject is stable. All these factors are conducive for science. At the same time, I was never able to feel unequivocally part of the mathematics culture, where, it seems to me, too much value is put on difficulty as a virtue in itself. My appreciation for mathematics has more to do with its clarity of thought, its potential of sharply articulating ideas, its virtues as an unambiguous language. I am more inclined to treasure the beauty and importance of Shannon’s ideas on errorless communication, algorithms such as the Kalman filter or the FFT, constructs such as wavelets and public key cryptography, than the heroics and virtuosity surrounding the four-color problem, Fermat’s last theorem, or the Poincaré and Riemann conjectures.

I tend to agree. Never having been much of a prodigy myself, I’ve always preferred thinking of math as a language for understanding the universe, rather than a list of famous problems to challenge heroes, an intellectual version of the Twelve Labors of Hercules. But for me the universe includes very abstract concepts, so I feel "pure" math such as category theory can be a great addition to the vocabulary of any scientist.

Anyway: back to business. You said:

I’d like human-originating intelligent life in the Solar System to survive, thrive, and not lose its values in the process. And I still think the best means is self-improving AI.

I bet a lot of our readers would happily agree with your first sentence. It sounds warm and fuzzy. But a lot of them might recoil from the next sentence. "So we should build robots that take over the world???" Clearly there’s a long train of thought lurking here. Could you sketch how it goes?

EY: Well, there’s a number of different avenues from which to approach that question. I think I’d like to start off with a quick remark—do feel free to ask me to expand on it—that if you want to bring order to chaos, you have to go where the chaos is.

In the early twenty-first century the chief repository of scientific chaos is Artificial Intelligence. Human beings have this incredibly powerful ability that took us from running over the savanna hitting things with clubs to making spaceships and nuclear weapons, and if you try to make a computer do the same thing, you can’t because modern science does not understand how this ability works.

At the same time, the parts we do understand, such as that human intelligence is almost certainly running on top of neurons firing, suggest very strongly that human intelligence is not the limit of the possible. Neurons fire at, say, 200 hertz top speed; transmit signals at 150 meters/second top speed; and even in the realm of heat dissipation (where neurons still have transistors beat cold) a synaptic firing still dissipates around a million times as much heat as the thermodynamic limit for a one-bit irreversible operation at 300 Kelvin. So without shrinking the brain, cooling the brain, or invoking things like reversible computing, it ought to be physically possible to build a mind that works at least a million times faster than a human one, at which rate a subjective year would pass for every 31 sidereal seconds, and all the time from Ancient Greece up until now would pass in less than a day. This is talking about hardware because the hardware of the brain is a lot easier to understand, but software is probably a lot more important; and in the area of software, we have no reason to believe that evolution came up with the optimal design for a general intelligence, starting from incremental modification of chimpanzees, on its first try.

People say things like "intelligence is no match for a gun" and they’re thinking like guns grew on trees, or they say "intelligence isn’t as important as social skills" like social skills are implemented in the liver instead of the brain. Talking about smarter-than-human intelligence is talking about doing a better version of that stuff humanity has been doing over the last hundred thousand years. If you want to accomplish large amounts of good you have to look at things which can make large differences.

Next lemma: Suppose you offered Gandhi a pill that made him want to kill people. Gandhi starts out not wanting people to die, so if he knows what the pill does, he’ll refuse to take the pill, because that will make him kill people, and right now he doesn’t want to kill people. This is an informal argument that Bayesian expected utility maximizers with sufficient self-modification ability will self-modify in such a way as to preserve their own utility function. You would like me to make that a formal argument. I can’t, because if you take the current formalisms for things like expected utility maximization, they go into infinite loops and explode when you talk about self-modifying the part of yourself that does the self-modifying. And there’s a little thing called Löb’s Theorem which says that no proof system at least as powerful as Peano Arithmetic can consistently assert its own soundness, or rather, if you can prove a theorem of the form

□P ⇒ P

(if I prove P then it is true) then you can use this theorem to prove P. Right now I don’t know how you could even have a self-modifying AI that didn’t look itself over and say, "I can’t trust anything this system proves to actually be true, I had better delete it". This is the class of problems I’m currently working on—reflectively consistent decision theory suitable for self-modifying AI. A solution to this problem would let us build a self-improving AI and know that it was going to keep whatever utility function it started with.

There’s a huge space of possibilities for possible minds; people makethe mistake of asking "What will AIs do?" like AIs were the Tribe that Lives Across the Water, foreigners all of one kind from the same country. A better way of looking at it would be to visualize a gigantic space of possible minds and all human minds fitting into one tiny little dot inside the space. We want to understand intelligence well enough to reach into that gigantic space outside and pull out one of the rare possibilities that would be, from our perspective, a good idea to build.

If you want to maximize your marginal expected utility you have to maximize on your choice of problem over the combination of high impact, high variance, possible points of leverage, and few other people working on it. The problem of stable goal systems in self-improving Artificial Intelligence has no realistic competitors under any three of these criteria, let alone all four.

That gives you rather a lot of possible points for followup questions so I’ll stop there.

JB: Sure, there are so many followup questions that this interview should be formatted as a tree with lots of branches instead of in a linear format. But until we can easily spin off copies of ourselves I’m afraid that would be too much work.

So, I’ll start with a quick point of clarification. You say "if you want to bring order to chaos, you have to go where the chaos is." I guess that at one level you’re just saying that if we want to make a lot of progress in understanding the universe, we have to tackle questions that we’re really far from understanding—like how intelligence works.

And we can say this in a fancier way, too. If we wants models of reality that reduce the entropy of our probabilistic predictions (there’s a concept of entropy for probability distributions, which is big when the probability distribution is very smeared out), then we have to find subjects where our predictions have a lot of entropy.

Am I on the right track?

EY: Well, if we wanted to torture the metaphor a bit further, we could talk about how what you really want is not high-entropy distributions but highly unstable ones. For example, if I flip a coin, I have no idea whether it’ll come up heads or tails (maximum entropy) but whether I see it come up heads or tails doesn’t change my prediction for the next coinflip. If you zoom out and look at probability distributions over sequences of coinflips, then high-entropy distributions tend not to ever learn anything (seeing heads on one flip doesn’t change your prediction next time), while inductive probability distributions (where your beliefs about probable sequences are such that, say, 11111 is more probable than 11110) tend to be lower-entropy because learning requires structure. But this would be torturing the metaphor, so I should probably go back to the original tangent:

Richard Hamming used to go around annoying his colleagues at Bell Labs by asking them what were the important problems in their field, and then, after they answered, he would ask why they weren’t working on them. Now, everyone wants to work on "important problems", so why areso few people working on important problems? And the obvious answer is that working on the important problems doesn’t get you an 80% probability of getting one more publication in the next three months. And most decision algorithms will eliminate options like that before they’re even considered. The question will just be phrased as, "Of the things that will reliably keep me on my career track and not embarrass me, which is most important?"

And to be fair, the system is not at all set up to support people who want to work on high-risk problems. It’s not even set up to socially support people who want to work on high-risk problems. In Silicon Valley a failed entrepreneur still gets plenty of respect, which Paul Graham thinks is one of the primary reasons why Silicon Valley produces a lot of entrepreneurs and other places don’t. Robin Hanson is a truly excellent cynical economist and one of his more cynical suggestions is that the function of academia is best regarded as the production of prestige, with the production of knowledge being something of a byproduct. I can’t do justice to his development of that thesis in a few words (keywords: hanson academia prestige) but the key point I want to take away is that if you work on a famous problem that lots of other people are working on, your marginal contribution to human knowledge may be small, but you’ll get to affiliate with all the other prestigious people working on it.

And these are all factors which contribute to academia, metaphorically speaking, looking for its keys under the lamppost where the light is better, rather than near the car where it lost them. Because on a sheer gut level, the really important problems are often scary. There’s a sense of confusion and despair, and if you affiliate yourself with the field, that scent will rub off on you.

But if you try to bring order to an absence of chaos—to some field where things are already in nice, neat order and there is no sense of confusion and despair—well, the results are often well described in a little document you may have heard of called the Crackpot Index. Not that this is the only thing crackpot high-scorers are doing wrong, but the point stands, you can’t revolutionize the atomic theory of chemistry because there isn’t anything wrong with it.

We can’t all be doing basic science, but people who see scary, unknown, confusing problems that no one else seems to want to go near and think "I wouldn’t want to work on that!" have got their priorities exactly backward.

JB: The never-ending quest for prestige indeed has unhappy side-effects in academia. Some of my colleagues seem to reason as follows:

If Prof. A can understand Prof. B’s work, but Prof. B can’t understand Prof. A, then Prof. A must be smarter—so Prof. A wins. But I’ve figured out a way to game the system. If I write in a way that few people can understand, everyone will think I’m smarter than I actually am! Of course I need someone to understand my work, or I’ll be considered a crackpot. But I’ll shroud my work in jargon and avoid giving away my key insights in plain language, so only very smart, prestigious colleagues can understand it.

On the other hand, tenure offers immense opportunities for risky and exciting pursuits if one is brave enough to seize them. And there are plenty of folks who do. After all, lots of academics are self-motivated, strong-willed rebels.

This has been on my mind lately since I’m trying to switch from pure math to something quite different. I’m not sure what, exactly. And indeed that’s why I’m interviewing you!

(Next week: Yudkowsky on The Art of Rationality, and what it means to be rational.)

Whenever there is a simple error that most laymen fall for, there is always a slightly more sophisticated version of the same problem that experts fall for. – Amos Tversky

Related