Transcript

Robert Wiblin: Hi listeners, this is the 80,000 Hours Podcast, where each week we have an unusually in-depth conversation about the world’s most pressing problems and how you can use your career to solve them. I’m Rob Wiblin, Director of Research at 80,000 Hours.

Today’s episode was especially exciting for us – we love the book Algorithms To Live By, and really wanted Brian’s thoughts on how some of his ideas apply to big picture career choice questions. It’s likely to be both entertaining and useful to almost all of you listeners out there.

Though because we really wanted to get answers we could recommend on the explore vs exploit trade-off, the conversation just continues to dig deeper and deeper into that problem between 1h 40m and 2h 20m. If you ever feel like you’ve had enough of it, skip on to the final section, which is a buffet of the most interesting bits from the book, starting around 2h 19m.

I also have a really important favour to ask. Once a year, 80,000 Hours tries to figure out whether all the things we’ve been working on have actually been useful to you or not.

This year we’ve published 61 hours of the podcast, dozens of articles and hundreds of high impact jobs we’d like to fill.

Our donors need to know we’ve been changing people’s careers, so they know it’s a good idea to keep funding us, and we need to know that we shouldn’t just give up and get different jobs.

If no one had told us their stories in the past, 80,000 Hours just wouldn’t exist today.

So if anything 80,000 Hours has done, including this podcast, our website or our coaching has changed your career plans or otherwise helped you, please go to 80000hours.org/survey and take a few minutes to let us know how.

You can also let us know about any ways we’ve led you astray, or could be doing better.

That’s 80000hours.org/survey.

Alright, here’s Brian.

Robert Wiblin: Today I’m speaking with Brian Christian. Brian is a non-fiction author, best known for The Most Human Human and Algorithms To Live By, which he co-authored with Tom Griffiths and which became a number one US bestseller in the non-fiction category. He studied computer science and philosophy at Brown University, and since 2012 has been a visiting scholar at UC-Berkeley. Since 2013, he’s been the Director of Technology at McSweeney’s Publishing and an open source contributor to projects like Ruby on Rails. He’s appeared in the New Yorker, the Atlantic, the Paris Review, and The Daily Show with Jon Stewart. Thanks for coming on the podcast, Brian.

Brian Christian: Thanks for having me.

Robert Wiblin: Today I expect to mostly talk about Algorithms To Live By, which is a really outstanding book that a lot of listeners should go out and read after they listen to this episode. The primary reason is that we have a lot of questions about how to apply some of the algorithms to real-life decisions, especially career decisions. And for that purpose, I’m joined today by Ben Todd, founder and CEO of 80,000 Hours, who was especially keen to have this interview, because he has some very pressing questions about the explore/exploit tradeoff. So welcome, Ben.

Benjamin Todd: Yeah, I’m really excited about this interview. Over the last year, this was one of my favorite books, and it really felt like it introduced me to a lot of new mental models that I feel like I should have been taught in school but wasn’t. In particular, very excited to talk about how it can apply to specific career decisions. Yeah, the book covered a lot of things I’d vaguely heard about before, such as the secretary problem, but it went into way more depth than it’s normally covered, but still remaining really clear, really easy to understand. So I found it really interesting.

Robert Wiblin: Great. We’ll get to that in a minute. But first to Brian, what are you working on now?

Brian Christian: I’m working on a new book right now, which is about normative issues in computer science, so the question of how do we try to capture human values in, for example, machine learning? This covers things like, for example, the fairness, accountability, and transparency movement within the ML community and also things like the value alignment problem that people are thinking about in AI safety. So I think it touches on a number of things probably of interest to 80,000 Hours listeners, and I’ll be excited to talk more about that next year.

Robert Wiblin: What’s that book called, and when is it coming out?

Brian Christian: The title is currently wobbling between a few different possibilities, so I don’t want to say until we finally determine it.

Robert Wiblin: Not until we’re sure, yeah.

Brian Christian: But this will be out, my guess would be sometime in the fall of next year.

Robert Wiblin: What made you choose to write about that topic?

Brian Christian: I think partly we are seeing kind of the confluence of, I think, two major trends. One is just this kind of explosive progress in machine learning as a discipline, particularly with the rise of deep learning, starting from 2012 to the present. And that has in turn created this reaction of both jubilation and also concern, that has really launched this subfield unto itself of technical AI safety. And you have things like Nick Bostrom’s book, obviously, turning into what I think is this really remarkable technical research agenda. I’m really interested in how some of these big ideas are actually getting cashed out, in terms of PhD theses and so forth.

Brian Christian: At the same time, you’ve got this societal adoption of machine learning systems, increasingly into kind of morally and ethically relevant domains, driving being one obvious example, but also things like arraignments and sentencing. Increasingly, we are thinking about how to translate the social contract into explicitly algorithmic terms. That is very intriguing to me as being an area where philosophy and computer science are on this collision course, and I think that’s only going to be a more pressing issue in the next few years. So that has really captured my attention.

Robert Wiblin: Your first book, The Most Human Human, was also about artificial intelligence, and that was, I think, back in 2011. What did you write about then?

Brian Christian: Yeah, The Most Human Human is about the Turing test, and in particular, my experience as what’s called a human confederate in the Turing test competition. I was one of these people hidden behind a curtain, trying to convince a panel of scientists that I was in fact a human being and not a chatbot claiming to be a human being. This was kind of a fascinating and bizarre experience for me, and led me on an investigation, both into the history of the Turing test, the history of chatbot technology, and also into just this broader linguistic question of what should you do if you are in this competitive scenario where your objective is to convince someone else that you are a human being? How does that manifest into actual linguistic strategies? I had a lot of fun researching that and learning about both the technology and also kind of the nature of human conversation.

Robert Wiblin: Yeah, what did you do, broadly speaking, to try to seem incredibly human?

Brian Christian: Well, I looked into the way that most chatbots of the time were being made. It’s funny, the book was published in 2011, which was before Siri, so it feels like another time, another age. One of the main strategies that chatbot developers were using, and this is still true in some ways today, is that they would essentially be sampling shards of conversation out of some huge corpus of previous human conversations. So for example, you could just download the entire chat transcripts of some message board and then map some user input to find the place-

Robert Wiblin: Nearest match.

Brian Christian: … in that archive that’s the nearest match, and then say the next thing that a real person said in that situation. Systems built this way are at times uncannily impressive. In the book I detail my interactions with one program called Cleverbot, where I say, “What’s 2 plus 2?” It says, “4.” I say, “What’s the capital of France?” It says, “Paris.” I say, “What’s the capital of Romania?” It says, “Uh, Budapest? I don’t know,” which in some ways is even more impressive, because the correct answer is Bucharest, and so this is like an example of graceful degradation and sort of a meta-level analysis of its own uncertainty, which is extremely impressive from a machine learning context.
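The nearest-match approach described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Cleverbot or any real system is built: the class name `RetrievalBot`, the word-overlap (Jaccard) similarity, and the tiny corpus are all hypothetical choices made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class RetrievalBot:
    """Replies with whatever followed the most similar past utterance."""

    def __init__(self, transcript):
        # Pair every utterance with the utterance that came right after it.
        self.pairs = [
            (tokenize(previous), following)
            for previous, following in zip(transcript, transcript[1:])
        ]

    def reply(self, user_input):
        query = tokenize(user_input)
        # Nearest match: the stored utterance with the highest overlap.
        _, following = max(self.pairs, key=lambda pair: jaccard(query, pair[0]))
        return following


# Hypothetical corpus standing in for a large archive of human chats.
corpus = [
    "What is the capital of France?", "Paris",
    "Are you married?", "Yes, I'm happily married.",
    "Do you want to go on a date?", "Sure, I'm free on Friday.",
]

bot = RetrievalBot(corpus)
print(bot.reply("What is the capital of France?"))
print(bot.reply("Are you married?"))
print(bot.reply("Do you want to go on a date?"))
```

Each reply is chosen by looking only at the current input, so nothing ties one answer to the next. The bot keeps no record of what it has already said.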

Brian Christian: But the problem is that in a real human conversation, you are not only getting locally appropriate answers to each particular question, but you are building a model of the other person, you’re building a conversational history. That’s then going to influence the things that happen later. And so, for example, if you interacted with these programs and you said, “Are you married?” it would say, “Yes, I’m happily married.” If you then said, “Do you want to go on a date?” it would say, “Sure, I’m free on Friday.” They’ll sometimes spell the word “color” with the British “u”, sometimes with the American spelling. The real tell is not the sense that you aren’t interacting with a human; it’s the sense that you aren’t interacting with a single, consistent human.

Robert Wiblin: It has no memory.

Brian Christian: Exactly, yeah, and no long-term coherence. One element of my strategy, for example, was to go out of my way to very self-consciously tie all of my answers together into this broader narrative of who I was. So if they said, “Nice weather, huh?” Well, the Turing test took place in Britain, so it was actually really bad weather at the time, and I remember saying, “Yeah, this is pretty crappy, but I’m from Seattle, so that’s par for the course.” Later when we were talking about music, I would reference the grunge scene or something like that, just very explicitly flagging, I’m the same guy who answered that other question. Things like that, that I could do to kind of increase the long-term complexity of the interaction, I felt like put the truth on my side.

Robert Wiblin: You mentioned that it’s kind of a different era in chatbots now. What’s changed since 2010 or 2011?

Brian Christian: Yeah, I mean for me, I think one of the most fascinating things about Turing tests from a contemporary perspective is, it has essentially become woven into the fabric of everyday life. You get an email from someone that says, “Here are some exciting discounts on Viagra,” you’re not going to reply to your friend and say, “You might want to check with your doctor before using that.” You might rather say, “You better reset your password,” or something like this.

Brian Christian: In a way, communication in the 21st century is effectively a Turing test, and when you send a link to your friend, you now have to sort of performatively put in two or three sentences so they know it’s really you and not just some automated message. And I think really the climax of this is the 2016 election, where now in hindsight, we’re looking back and saying there was a huge amount of kind of automated, insincere activity happening in social media, and people couldn’t tell the difference. In some ways, that first book of mine, I now think about it from the perspective of the citizenry of a democracy in the 21st century needs training in order to navigate online discourse.

Brian Christian: I think it’s interesting to think about the idea that if social media discourse were at a higher level, or higher bandwidth, or more thoughtful, or more articulate, then the ruse wouldn’t have worked nearly as well. It’s partly an indictment of just the poverty of the actual medium and the way that language is used. Poets have always been interested in using the language as articulately and uniquely and expressively as possible, but I think increasingly this is also a question of national security, which to me is scary but fascinating.

Robert Wiblin: Do you think that’s a losing battle, trying to detect people who aren’t real on Twitter or Facebook? I guess, on the one hand, that the bots will be getting more and more sophisticated, but then also the technology for detecting them will also get more sophisticated. But I suppose at some point, surely they would approach humanity almost exactly, and then you just can’t tell the difference.

Brian Christian: Yeah, as I understand it there’s something of an open question just in the theoretical community, people that look at GANs and adversarial examples and so forth. Will we find that the long term fixed point is advantaged to the attacker or advantaged to the defender? I’ve heard arguments from people I respect on both sides of that, and my conclusion is we don’t really know. So, yeah, I think long term it’s a bit concerning, short term I do think there are relatively simple things that one can do, I mean, even just speaking or tweeting or writing in complete sentences, rather than in broken sentences, makes it easier to find out that someone is not a native English speaker who’s claiming to be. So, there are these little things that we can do to raise the level of discourse. Longer term, I don’t know, it’s a little bit spookier.

Robert Wiblin: Seems like Twitter hasn’t been trying that hard to get rid of these bots, so they could probably make quite a lot of progress if they just put some effort in, and actually were willing to pull some of them. I think, they’ve started doing it now, under a lot of heat.

Brian Christian: Yeah, I wouldn’t mind, I mean in the way that one has verified accounts for celebrities or something. You could imagine some Turing test required to get some badge on your account or something like this.

Robert Wiblin: Yeah, it’s interesting. I think there was some discussion of them having verified it in like you would have to send in like a scan of your passport to get an account, which people hated because that would prevent anonymous whistle blowing via Twitter and things like that. Right, but I guess the Turing test would-

Brian Christian: And then they would then-

Robert Wiblin: Yeah, having a conversation like that proves you are a coherent person would work as well. Although I guess you would have one person just doing that again and again and again for many accounts, strictly.

Brian Christian: That is true, although at least, you would limit them to the throughput of one guy working all the time. But I’ve seen this even just in online gaming, I’ve had the experience personally of being on some first person shooter game server. An admin shows up and literally forces you to just make small talk with them and if you don’t, then they’ll kick you out. So, we’re starting to enter this uncanny valley where … yeah, again, the Turing test … I think what would shock Alan Turing perhaps the most if he were in the 21st century is that this has become a sort of banal nuisance. It’s no longer a thought experiment, it’s just this annoying thing that we have to do time and time again in the course of a day.

Robert Wiblin: So, hopefully we’ll get to talk about machine learning again next year when your book comes out. Let’s talk about Algorithms to Live By for now. What is that book about in broad strokes?

Brian Christian: The basic idea is, there is a set of problems that all of us face in everyday life, whether it’s finding a place to live or deciding whether to commit to a partner or deciding where to go out for dinner or how to rearrange your messy office or how to schedule your time. These often emerge as the function of limited time, limited information. We tend to think of them as kind of uniquely and innately human problems. The message of the book is simply, they are not. In fact they correspond, really precisely in some cases, to some of the fundamental problems of computer science. So, I think this gives us an opportunity—having made that identification of the underlying computational structure of human life—to really learn something by studying the nature of those problems and their optimal solutions. I think, that gives us payouts, I would say at maybe three different scales. At one level, computer science can in some cases give you just very explicit advice. Do this, it will succeed this amount of the time. In other cases, a parallel may hold more loosely but it still gives you an understanding of the structure of the problem, the structure of what optimal solutions look like, and a vocabulary for understanding the parameters of that space.

Brian Christian: I think most broadly, it’s a way to think about the nature of human rationality itself. That the problems that the world poses to us are computational in nature and this makes computers not only our tools but in some sense, our comrades. We are confronting a lot of the same issues. And computer science paints, I think, a very different picture of what rational decision making looks like than you might find in, say, behavioral economics. Because one of the first things that any computer scientist takes into account is computational complexity. Once you incorporate the cost of thought itself, I think you end up with a picture of rational decision making, particularly in some of the hardest classes of problems, that looks a lot more familiar and a lot more human.

Brian Christian: So, I think it’s a more approachable and a more recognizable version—or vision, I should say—of what human rationality should be.

Robert Wiblin: How much do you think people can gain from understanding these issues in their day to day life? Do you think it’s really important that people know these different models and try to apply them?

Brian Christian: I think so. I mean, I think, perhaps the average person doesn’t need to go personally wading into the technical literature and looking at specific theorems and so forth, but I think that having a basic vocabulary for, “Oh, I’m in an optimal stopping problem. Oh, I’m in an explore/exploit tradeoff,” is very useful because these things come up all the time. I think … We can get into, in the course of our conversation, what some of the psychological studies show us about what people do by default. In many cases people’s defaults are reasonable, but I think understanding a little bit about the types of problems that we face and being able to recognize and identify when you are in that situation is a really good first step and for me having that vocabulary has been really invaluable. That there is a set of concepts that map to these things, and just literally having access to those words I think has been really useful.

Robert Wiblin: What do you think is most fun about the book?

Brian Christian: Gosh, I think there is a lot of things for me that are really fun. I mean, one thing that was really fun about writing it and researching it was getting to interview all of these different experts. The book covers a sufficiently wide swath of terrain over computer science, operations research, psychology, cognitive science, there is really no one person that’s an expert in all those things and so my co-author, Tom, and I went on an expedition to try to find the people in each of these domains that were the most well informed, the most expert in each case.

Brian Christian: It was really fun (a) getting to hear the stories behind how they discovered some of these different breakthroughs, and also getting to put the question to them of whether their research has impacted their own life and the way they think about things day to day. I would say, maybe 50% of people said, “Oh, that’s really interesting. I’ve never thought about that.” And the other 50% said, “Oh, of course. Absolutely, no question.” It was really satisfying just getting to hear those stories.

Robert Wiblin: One interesting thing is a lot of the original contributors are still alive because a lot of this was discovered quite recently.

Brian Christian: That’s right. Yeah, I know. It’s pretty incredible, I guess the original generation of founders of computer science, Von Neumann and Turing, none of them are still around. But I would say the generation after that, a lot of those guys are still alive and it’s really incredible. We interviewed Tony Hoare, who is the inventor or discoverer of Quicksort. We asked him, how did you come up with Quicksort, this incredible algorithm? And he was like, “Well. I just thought, how would I sort something? And that was the first idea that came to my mind.” I think it’s really incredible to look back on a time when the discipline was so young, that you could make this career defining discovery just by being like, “How should I sort something? Let’s try this. Look, hey, it works.”

Benjamin Todd: Wasn’t that how Gauss came up with how to sum an arithmetic series or something when he was a school kid?

Brian Christian: In some ways I think yeah, one feels envious for just the low hanging fruit that was around at that period of time.

Robert Wiblin: Perhaps not envious of the lifestyle they had given those discoveries they had made, but yeah.

Brian Christian: Yeah.

Benjamin Todd: And even now, this is partly why this is so interesting, because I think many of these ideas have been discovered so recently, they haven’t made it into our general consciousness, whereas maybe, say, the heuristics and biases literature has become pretty well known recently with, like, I think Thinking, Fast and Slow in the last decade or so. This is another wave of research on human decision making that I think is way less widely known than that. Even maybe more important in some cases.

Brian Christian: Yeah, I think that’s right. Part of our mission, I think, in the book to some degree was to try to speed that process up, or help it along. One case was, in the context of the explore/exploit tradeoff, there are a set of ideas that emerged in computer science that have become really interesting to people who think about medical ethics. We can get deeper into that question later, but watching the FDA start to come to an understanding of like, “Wow, computer scientists have had this best practice for like 40 years, it seems relevant to this domain in which human lives are on the line. Maybe we ought to think about evaluating some of those ideas and importing them into, say, clinical trials.”

Brian Christian: So, that was an area where I hadn’t expected to in a way put on an activist hat and really feel like, oh, I can use this book to try to actually nudge that adoption process forward and say like, “Yeah, you really should look into this.”

Robert Wiblin: One of my favorite blog posts ever looks into this question of why is it that so many of the intellectual greats seem to have been from hundreds or thousands of years ago, rather than today, even, despite the fact that there are so many more people around today, and so many more academics, so many more researchers. And there is lots of potential reasons for that. But probably the key one is that there was much more low hanging fruit 2,500 years ago. You could make enormous philosophical breakthroughs just by clarifying the most ordinary concepts. Actually sitting down and doing that, but today you have to, you have to spend 30 years training to get to the frontier, and then you find something like a slightly new idea that someone else hasn’t had.

Robert Wiblin: All right, so, just to signpost where we are going, I think mostly we are going to talk about three different models, which each have their own chapter in the book. One is explore versus exploit, the next one is optimal stopping and the third one is introducing randomness or simulated annealing. Each of them, they are related in different ways, they are all about tradeoffs that you have between trying out different things and getting information versus choosing the best that you’ve found so far. They can seem to blur into one another, but we are going to explain later, I guess, try to give clear criteria for which one you would want to use in different cases.

Robert Wiblin: I think the cases we should keep in mind as we are going through would be things like choosing which profession to go into as you advance in your career from undergrad to your early jobs, to your mid-career. Thinking about what specific jobs to accept when you are on a job search at a specific moment, perhaps deciding what city you are eventually going to spend the rest of your life in or who to date and whether to get married and things like that. Are there any other … yeah, archetypal models that you think people should have in mind as they are thinking about these models?

Brian Christian: I think some of them we may just bring up in the course of it. I mean, optimal stopping is famously applicable to being in a car, where it’s generally difficult to turn around. Part of what’s interesting is there are literally physical embodiments of some of these concepts. There are also conceptual embodiments. But I think here, it may be easier to draw that out in the context of-

Robert Wiblin: What’s between them.

Brian Christian: Yeah.

Benjamin Todd: Let’s say, in 80,000 Hours’ career guide, we cover a lot of key questions that we will face in that [crosstalk 00:21:25] such as like which problems to focus on, should you invest in yourself and gain more skills or try to have an impact right away, and one of the really big questions of the career decision is basically how much to explore versus just go with your best guess. So, the big decision you are going to have in mind is should I go down one path, become an academic, or should I try to work in government or should I work in nonprofits? That’s the key question we are addressing, and I think there’s a lot that many of these models might be able to say about that question. And we are going to basically try to attack it from a bunch of different angles.

Robert Wiblin: I think it’s fair to say that this question of whether people should explore more or whether they already explored too much is an uncertainty we’ve had since the beginning. We’re confident that people don’t consider enough options. They don’t put enough options down on the page when they’re just considering what could I possibly do with my life? But then whether they do too many internships or too few internships between the ages of 18 and 25 is a bit harder to say.

Benjamin Todd: And likewise, if you’re working in a job and it doesn’t work out, how quickly should you switch versus marching on? Should you try several jobs or just find the thing you think is best and go pretty hard into that?

Robert Wiblin: All right, explore/exploit. What’s the classic explore/exploit dilemma? Set the scene here, Brian.

Brian Christian: Right. So, first, I’ll make a linguistic note which is, in the explore/exploit tradeoff, this is the tension between spending your time and energy trying new things, gathering information versus spending your time and energy leveraging the information that you already have to get a pretty safe, good outcome. So, in English we’ve stacked the deck linguistically towards exploration because we think of exploitation as pejorative. But we have to think about these from the perspective of computer science and treat them as value-neutral terms.

Brian Christian: So, the canonical explore/exploit problem in computer science is what is called the multi-armed bandit problem. The basic idea is, you walk into a casino, there are n slot machines, some huge number of slot machines, and you are going to be in the casino for a while, let’s say an afternoon. And this is a bit of a strange casino because some of the machines pay off with different probabilities than others. You don’t know in advance, of course, which are which. So, the problem is quite simply, how do you make as much money as possible over the period of time that you are going to be there?

Brian Christian: Intuitively, we might imagine that there’s some combination of exploring—that is, trying different machines out, seeing which ones appear to be giving you higher payoffs on average than others—and exploiting—which is, biasing yourself towards of course cranking the handle of the machines that do in fact seem the most promising. But exactly what that balance should be, and what our strategy ought to be in that situation, has this wonderful and colorful history in the field, where for most of the 20th century it was considered an unsolvable problem and career suicide. In fact the Allied, the British mathematicians during World War II joked about dropping the multi-armed bandit problem over Germany as the ultimate intellectual sabotage, to waste the brain power of the Germans.

Robert Wiblin: So, when was this question first specified?

Brian Christian: That’s a good question. I think William Thompson was looking at a version of the multi-armed bandit problem in the 1930s. That literature ended up getting kind of buried and wasn’t rediscovered until much later. It came up again in the early 1950s. It had this reputation for, as I said, being this kind of brain teaser, but not being an actual thing that you could work on. The first paper on it came in, I think, 1952 by Herbert Robbins, where he was talking about a strategy that he came up with called Win-Stay, Lose-Shift. Which just means, if you pull the slot machine handle and it just paid out, pull it again. If it didn’t, try something else. He was able to prove that that strategy is better than pulling all the handles at random, which is such a modest result: “Here’s an algorithm that’s better than chance.” That was as much as could be said at the time. But in some ways that was that first handhold on the problem of maybe we can actually start getting somewhere on this.
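As a rough illustration of the rule just described, here is a small Python simulation comparing Win-Stay, Lose-Shift with pulling handles at random. It is a sketch under assumptions that are not from the conversation: each machine pays out 1 with a fixed hidden probability, "try something else" means moving to a different machine chosen uniformly at random, and the function names and payout probabilities are made up for the example.

```python
import random


def pull(probabilities, arm, rng):
    """Return 1 if the chosen machine pays out on this pull, else 0."""
    return 1 if rng.random() < probabilities[arm] else 0


def random_policy(probabilities, n_pulls, rng):
    """Baseline: pick a machine uniformly at random on every pull."""
    total = 0
    for _ in range(n_pulls):
        arm = rng.randrange(len(probabilities))
        total += pull(probabilities, arm, rng)
    return total / n_pulls


def win_stay_lose_shift(probabilities, n_pulls, rng):
    """Stay on a machine after a payout; move to another one after a miss."""
    n_arms = len(probabilities)  # assumes at least two machines
    arm = rng.randrange(n_arms)
    total = 0
    for _ in range(n_pulls):
        reward = pull(probabilities, arm, rng)
        total += reward
        if reward == 0:
            # Lose-shift: switch to a different machine, chosen at random.
            arm = rng.choice([a for a in range(n_arms) if a != arm])
    return total / n_pulls


# Hypothetical casino: two machines with hidden payout rates.
hidden_rates = [0.2, 0.8]
print("random:", random_policy(hidden_rates, 20_000, random.Random(0)))
print("win-stay, lose-shift:", win_stay_lose_shift(hidden_rates, 20_000, random.Random(0)))
```

With these two machines the random baseline should average about 0.5 per pull, while Win-Stay, Lose-Shift spends more of its time on the better machine and should average noticeably more. It still abandons a good machine after a single unlucky pull, which is part of why the result is so modest.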

Robert Wiblin: And if I remember correctly, it was then Bellman, who came up with the theoretically correct answer to this question, but it was not really computable, it was just too difficult to ever actually figure out what it was even if you had the formula.

Brian Christian: That’s right. So, Bellman, in 1957, comes up with his famous idea of dynamic programming, which involves working backwards from the end and saving or memoizing different solutions of these possible endings and then using them to work your way backwards towards where you are now. Which is quite ingenious and is this incredibly important technique even today. But in the context of the multi-armed bandit problem, it relies on a few assumptions, that make it not really ideal in practice.

Brian Christian: It does require a lot of computing, it requires that you know in advance exactly how many machines there are, how many times you are going to pull the handle total, things that may not be realistic or may not be useful in a practical real world situation.
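To make the "work backwards and memoize" idea concrete, here is a minimal Python sketch for a two-machine bandit with a known number of pulls. The assumptions are loud ones and are not from the conversation: payouts are 0 or 1, each machine starts with a uniform prior over its payout rate (so after w wins and l losses its estimated rate is (w + 1) / (w + l + 2)), and the function name is hypothetical. It uses memoized recursion, which caches the same end-state values that a backwards pass would fill in.

```python
from functools import lru_cache


def best_expected_total(horizon):
    """Expected total payout of the optimal strategy over `horizon` pulls."""

    @lru_cache(maxsize=None)
    def value(w1, l1, w2, l2):
        # The state is the win/loss record on each of the two machines.
        pulls_left = horizon - (w1 + l1 + w2 + l2)
        if pulls_left == 0:
            return 0.0  # the afternoon is over, nothing more to win

        # Estimated payout rates under the assumed uniform prior.
        p1 = (w1 + 1) / (w1 + l1 + 2)
        p2 = (w2 + 1) / (w2 + l2 + 2)

        # Expected value of pulling each machine now and acting optimally after.
        pull_1 = p1 * (1 + value(w1 + 1, l1, w2, l2)) + (1 - p1) * value(w1, l1 + 1, w2, l2)
        pull_2 = p2 * (1 + value(w1, l1, w2 + 1, l2)) + (1 - p2) * value(w1, l1, w2, l2 + 1)
        return max(pull_1, pull_2)

    return value(0, 0, 0, 0)


for n in (1, 2, 10, 20):
    print(n, "pulls ->", round(best_expected_total(n), 4))
```

Note that `horizon` has to be passed in up front, and the number of cached states grows quickly with it (roughly with the fourth power for two machines). That reflects the two practical drawbacks mentioned above: you need to know the total number of pulls, and the computation gets heavy.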

Brian Christian: So, it’s a funny history because in some sense you get the definitive solution to the problem in 1957; on the other hand, it leaves a lot open, and it’s sort of unsatisfying for all these different reasons.

Robert Wiblin: So, we got the first really practical solution from Gittins, I think in the 70s or 80s? Is that right?

Brian Christian: That’s right, yeah.

Robert Wiblin: Do you want to describe his approach?

Brian Christian: Yeah, so, there is this lovely story, I mean, I think, one of the things that I just love about the history of mathematics in general is sometimes people think they’re solving a very specific problem, and what they come up with has this level of generality that they don’t even anticipate. John Gittins, in the 1970s, he’s now a math professor at Oxford. At the time he was doing some consulting for the Unilever corporation. They wanted to know, basically, how they’ll allocate their money across different projects.

Brian Christian: So, you have pure research and development of new drugs, you also have marketing of profitable drugs, how much of our budget should we spend on one, how much on the other. Gittins immediately recognizes this as being kind of like a multi-armed bandit problem. Where you have these different levers you can pull, you don’t know in advance how well they’ll pay out. There’s a particular twist here, which I think is quite fascinating.

Brian Christian: Gittins is thinking about it from the perspective of the Unilever corporation, which wants to exist, theoretically, forever. They are not interested in maximizing their revenue over any particular time period, but indefinitely. At the same time, it’s better to have that money now than later.

Brian Christian: So, he approached the problem saying, instead of there being some finite sequence of rewards, what if there is an infinite sequence of geometrically discounted rewards? So, if a dollar tomorrow is worth as much as 99 cents today, and that extends all the way into the future, is there a way that we can think about the problem in this context?
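As a worked illustration of geometric discounting, write the per-step discount factor as β (a symbol introduced here, not used in the conversation). With a dollar tomorrow worth 99 cents today, β = 0.99, and a stream of rewards r_0, r_1, r_2, … has present value:

```latex
V = \sum_{t=0}^{\infty} \beta^{t} r_t , \qquad 0 < \beta < 1 .
% For a constant reward r_t = r this is a geometric series:
V = \frac{r}{1 - \beta} , \qquad \text{so } \beta = 0.99 \text{ gives } V = 100\,r .
```

So even though the sequence of rewards is infinite, its total value is finite, which is what makes the infinite-horizon version of the problem well defined.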

Brian Christian: It was really fascinating thinking about how he approached the problem, because he sighed and thought, well, unfortunately we all know that the multi-armed bandit problem is unsolvable, but let me at least think about what would give me a good approximate answer. And he comes up with this strategy that we now know as the Gittins index, which basically says, for each machine, imagine a guaranteed payout so good that you would never play that machine even one more time. For every machine there is some price that you would rather just take that reward again and again and again than even try the machine once.

Robert Wiblin: Try another machine once?

Brian Christian: Yeah.

Robert Wiblin: A different machine.

Brian Christian: Yeah. And he called this the dynamic allocation index; we now know it as the Gittins index. His thought was, well, you could just calculate that independently for each machine, it wouldn’t depend on which other machines existed. You could just play the machine with the highest Gittins index. He thought, “Well, this might be a reasonable approximation to the problem.” And then, to his own surprise, this is the solution to the problem. So, I think that’s this wonderful, again, I’m just … These mathematicians following their instincts and saying, humbly, “Well, here’s an idea, let’s try it.” And it turned out to be the answer.

Brian Christian: So, this is another case where the Gittins index is the gold standard for dealing with the multi-armed bandit problem, with infinite discounted rewards, geometrically discounted rewards. And yet there are still reasons that we may not want to use it in practice. For one, it relies, like I said, on geometric discounting, which there are a number of studies which suggest humans don’t do, although perhaps should. So, if you are doing hyperbolic discounting then you are in a different paradigm. It deals with this idea of infinite rewards, which may or may not be applicable to a particular situation. And lastly, it’s just non-trivial to compute the Gittins index for every given machine. It’s hard to do it in real time.
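The "guaranteed payout" description can be turned into a rough numerical sketch. The code below assumes a machine that pays 1 or 0, a uniform prior on its payout rate, and a discount factor of 0.9 rather than the 0.99 used in the example above, so that a short lookahead is accurate enough. It searches for the guaranteed per-pull payout at which you are indifferent between taking that payout forever and pulling the machine at least once more. The name `gittins_index`, the truncation depth, and the tolerance are all illustrative choices, not anything specified in the conversation.

```python
def _keep_playing(a, b, v_win, v_lose, discount):
    """Value of one more pull given posterior counts a (wins + 1) and b (losses + 1)."""
    p = a / (a + b)
    return p + discount * (p * v_win + (1 - p) * v_lose)


def gittins_index(wins, losses, discount=0.9, depth=100, tol=1e-4):
    """Approximate Gittins index of a win/lose machine (uniform prior assumed)."""
    a0, b0 = wins + 1, losses + 1
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = (lo + hi) / 2               # candidate guaranteed payout per pull
        retire = lam / (1 - discount)     # value of taking lam forever
        # Truncation: after `depth` more pulls, either retire or play on at the mean.
        values = [
            max(retire, (a0 + k) / (a0 + b0 + depth) / (1 - discount))
            for k in range(depth + 1)
        ]
        # Work backwards; at depth d, index k means k wins out of d further pulls.
        for d in range(depth - 1, 0, -1):
            values = [
                max(retire, _keep_playing(a0 + k, b0 + d - k,
                                          values[k + 1], values[k], discount))
                for k in range(d + 1)
            ]
        if _keep_playing(a0, b0, values[1], values[0], discount) > retire:
            lo = lam                      # machine still worth a pull: raise the offer
        else:
            hi = lam
    return (lo + hi) / 2
```

Note that the index for one machine is computed without reference to any other machine, as described above. Even in this simplified form each index needs a nested backward pass inside a search, which gives a feel for why it is awkward to compute in real time.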

Robert Wiblin: There’s an awesome table in your book.

Brian Christian: Yeah.

Benjamin Todd: So you point out, if you are at a restaurant and you get that out and try to say, “Well, we’ve had seven good meals here so far and two bad. Now, should we switch to another restaurant next time?”

Brian Christian: Right. Exactly.

Benjamin Todd: Your friends will probably stop listening long before that point.

Brian Christian: Yeah, we encourage you to cut out this table and carry it in your wallet, but of course you and your friends also have to agree on the discounting function.

Benjamin Todd: So, yeah, what are some more rules of thumb solutions to the multi-armed bandit problem that someone might be able to kind of bear in mind in a real life situation?

Brian Christian: Sure. I think there are a few big picture ideas here. One of the key ideas as I see it is, if you are dealing with the finite horizon case, then one of the things you see by looking at the exact solutions that dynamic programming offers you is that you should basically front load your exploration, and do the bulk of your exploitation at the end. This makes sense, I think, for three different reasons.

Brian Christian: The first is that the odds that a new machine you try is better than the best one you already know about can only go down as you get more information. An analogy that I like to use is, if you have taken a work transfer to Spain, and you are going to be there for a year, the first restaurant you go out to, the very first night you are in Spain, is guaranteed to be the best restaurant you’ve ever been to in Spain. The second restaurant you try has a 50% chance of being the best restaurant that you’ve ever been to in Spain. This of course goes down as a function of your experience.

Brian Christian: So, the chance that trying a new thing will yield something better than what you already know about can only go down. What’s more, the value of making that discovery can also only go down over time. So, if you find an incredible restaurant in your last week in Spain, that’s strictly worse than finding that restaurant on your first week in Spain. Both the chance of making a discovery, and the value of that discovery, can only go down over time. On the other hand, the value of just doing your favorite thing, or going with the best option, can only increase over time—again, as a function of your experience.

Brian Christian: For all of those reasons, it makes sense to think about ourselves as kind of on this trajectory from exploration to exploitation, as a function of where we perceive ourselves to be within this finite horizon. What I think is really interesting about that idea is that it offers us a way of thinking about the human lifespan at its broadest level. And we are seeing psychologists like, for example, Alison Gopnik at UC Berkeley drawing on the technical literature of the explore/exploit tradeoff to make an argument about infant cognition, saying, you know, there’s this huge body of evidence that suggests that infants are highly random, they have this huge novelty bias. They always want to look at an unfamiliar object. No matter how carefully you’ve chosen their Christmas gift they’re just relentlessly interested in the next thing and the next thing and the next thing and the next thing … And it can be tempting to view this as, kind of, just a failure of willpower, or attention span, or that kids are just this kind of defective version of adults. In fact, you can appeal to the explore/exploit tradeoff and make this argument that, no, these kids have just burst through the doors of life’s casino. There are machines everywhere. They’re going to be there for 80 years. They really should begin their process by just flailing around, pulling those levers at random, you know, putting every object in their mouth at least once.

Brian Christian: So, we can think about the stigma of child cognition as actually being the optimal strategy given where they are in that finite horizon.

Benjamin Todd: So, okay, as a general principle we want to explore more fully and then move more towards committing and using what we already know, which we’re calling exploit. Can we now get a little bit more quantitative about that? How much should you explore early versus switching to exploit? Like, when … Suppose you’re on a two week holiday, like, how many days might you explore and then exploit?

Brian Christian: Yeah. I would love to be able to give you a specific threshold. I feel like it probably depends on how many restaurants there are in that town and exactly what the distribution of their food quality is that you’re drawing from. So, I mean, this is one of the problems with dynamic programming is that we might have to actually crunch the numbers. But, I think more broadly there’s this idea that front loading your exploration strictly, you know, so that the first x number of nights you only try new things. And then after some point, you only do the best thing. That’s an algorithm that’s called epsilon first. It turns out that epsilon first has this particular downside, which is that it offers what’s called linear regret.

Brian Christian: This kind of takes us from the 70’s to the 80’s, and the next big breakthrough in studying the multi-armed bandit problem came from Herbert Robbins, again, 30 years after his initial discovery. He’s back to advance the plot again with one of his collaborators. And they were able to frame the multi-armed bandit problem in the context of what’s called regret minimization. In every human life we have this idea that we want to minimize the number of regrets we have in the future. In the context of the multi-armed bandit problem, that has this beautifully explicit form, which is, your regret is all of the money that you left on the table. All of the money that you could’ve made, if only you knew at the beginning, everything that you knew by the end.

Brian Christian: Robbins and Lai looked at this question of, if you’re following the optimal strategy, what’s the best you can do with regards to regret? What they found is that using the optimal strategy, your regret will grow logarithmically. So, this is kind of a good news/bad news thing. You know, the bad news is, even if you’re doing the optimal thing, you will continue to leave more and more money on the table. You’ll still be making mistakes. But, the frequency and intensity of those mistakes will flatten gradually over time. So, this gave theorists another tool in their toolbox for thinking about how to approach the multi-arm bandit problem. Which is to say, we know that the best case scenario is that we can have strategies that offer logarithmic regret.

Brian Christian: What are simpler strategies, than, for example, the Gittins Index, that still offer this really nice property? Earlier, we were talking about epsilon first, which is the strategy that you explore for a fixed period of time and then exploit every, you know, forever more after that. So, the reason that that strategy is linear in its regret is that, the amount of exploration you did gives you some fixed chance that you are wrong in identifying the best slot machine or the best restaurant. At the limit as n goes to infinity, there’s just a percentage chance that you make the wrong decision forever after that. That’s your linear regret.

Robert Wiblin: So, every round, your regret goes up by the difference in the average between the one that you chose to pull forever and the optimal one that you could have chosen to pull. So it just keeps on growing, then?

Brian Christian: It just keeps, yeah … every single pull is another small burden to bear, right. There’s been a lot of really exciting work starting in the mid 80’s and continuing through the 21st century of trying to identify simple intuitive strategies that offer this guarantee of logarithmic regret.
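The linear growth can be seen in a small simulation. This is a sketch under assumptions chosen purely for illustration: two machines that win 60% and 40% of the time, a fixed number of alternating exploration pulls, then a permanent commitment to whichever machine won more (ties broken at random). Regret is measured as the expected payout given up on each pull relative to always playing the better machine. The function name and parameters are hypothetical.

```python
import random


def epsilon_first_regret(horizon, explore_pulls, means=(0.6, 0.4), runs=2000, seed=0):
    """Average regret of explore-then-commit over `runs` simulated casinos.

    Assumes explore_pulls <= horizon.
    """
    rng = random.Random(seed)
    best = max(means)
    total = 0.0
    for _ in range(runs):
        wins = [0] * len(means)
        regret = 0.0
        # Exploration phase: alternate between the machines.
        for t in range(explore_pulls):
            arm = t % len(means)
            wins[arm] += rng.random() < means[arm]
            regret += best - means[arm]
        # Commit forever to the machine with more wins (random tie-break).
        committed = max(range(len(means)), key=lambda a: (wins[a], rng.random()))
        # Every remaining pull costs the gap if the commitment was wrong.
        regret += (horizon - explore_pulls) * (best - means[committed])
        total += regret
    return total / runs
```

With the exploration budget held fixed, for example `epsilon_first_regret(1000, 10)` against `epsilon_first_regret(2000, 10)`, the average regret grows by the same amount for every extra thousand pulls, because a fixed fraction of the simulated players committed to the wrong machine and keep paying the gap on every pull.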

Benjamin Todd: And, yeah … what’s the …

Brian Christian: What’s the … yeah, so what are they? One of them is called epsilon decreasing. So, if you have a certain fixed chance that you are going to try something random and explore, but that you slowly decrease that percentage according to some kind of schedule, then you can prove that this strategy offers logarithmic regret.

Benjamin Todd: The way the strategy would work is… suppose you have a process, which is like, 80 % of the time I’m gonna pull a thing that I think, pull the lever that I think’s best right now and 20 % of the time, which is the epsilon, you’re gonna just pull a random lever?

Brian Christian: Yeah.

Benjamin Todd: And then you slowly decrease that percentage as you go on?

Brian Christian: Yeah, that’s right. So, let’s see, there may be specific technical results about your pulling schedule in order to achieve that result. We can direct interested readers into the technical literature on that. The basic intuition is, yeah, if every day you start with some fixed chance of, let’s say 20 % that you’re gonna try something random. But, every day that fixed chance goes down, let’s say it’s multiplied by .99 or something.

Benjamin Todd: Mm-hmm (affirmative)…

Brian Christian: Then, this is the kind of strategy that avoids the pitfalls of epsilon first because there’s always some chance—now, granted it will dwindle, of course, in the long run—but there’s always some chance that if you’ve made a mistake in identifying which is the best machine, you’re still leaving the door open a crack to getting new information that could change that. But, of course, you are sort of tapering that down, in some ways appropriately, as you’re gathering more and more information which makes it less and less likely that you have made a mistake.
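A minimal sketch of the strategy as just described: start with a 20% chance of pulling a random lever and multiply that chance by 0.99 after every pull. The function name, the `pull` callback, and the rule of trying every machine once before trusting the averages are assumptions added for illustration. As noted in the conversation, the formal guarantee depends on the schedule; published analyses typically shrink epsilon roughly in proportion to 1/t, so the 0.99 multiplier here should be read as the intuition rather than a schedule with a proven bound.

```python
import random


def epsilon_decreasing(pull, n_arms, n_pulls, epsilon0=0.2, decay=0.99, seed=0):
    """Run an epsilon-decreasing policy; `pull(arm)` returns that pull's reward.

    Returns the list of arms chosen, in order.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    choices = []
    for t in range(n_pulls):
        epsilon = epsilon0 * decay ** t   # exploration chance shrinks every pull
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            arm = untried[0]              # assumption: sample each machine once first
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)   # explore: pull a lever at random
        else:
            # exploit: pull the lever with the best average payout so far
            arm = max(range(n_arms), key=lambda a: totals[a] / counts[a])
        reward = pull(arm)
        counts[arm] += 1
        totals[arm] += reward
        choices.append(arm)
    return choices
```

For example, `epsilon_decreasing(lambda arm: (0.2, 0.8)[arm], n_arms=2, n_pulls=500)` uses two hypothetical machines with fixed payouts; random pulls become rare late in the run but never reach exactly zero probability, which is the door left open a crack.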

Robert Wiblin: So, how does that compare with upper confidence bound algorithms, which you spend a fair bit of time on in this chapter?

Brian Christian: So, one of the other strategies that’s very simple and intuitive, but also offers this property of logarithmic regret, is what’s called upper confidence bound. The basic idea here is that you compute a … what’s called a one-sided confidence interval for each of these machines. For people with statistics background, you’re used to seeing the error bars above and below a quantity, you know, on a bar chart. What’s interesting about upper confidence bound is it says, we’re not actually interested in the expected value of the machine. And we’re not interested in the lower bound. We’re only interested in the upper bound in how good it could be. And so you just always play the machine with the highest upper bound. This is an idea which I think elegantly synthesizes exploration and exploitation because something that you have less information about is going to naturally have wider bounds. As you learn more and more about it, those bounds are going to tighten. I think it’s a really sort of beautiful way of synthesizing, both the idea that we want to optimize for quality, but we also want to optimize for information, and bringing that together into a single idea.

Brian Christian: I think it’s also, there’s just something kind of poetic about the idea that it’s essentially the rational case for optimism; that you are only interested in the reasonable best case scenario. In some ways, you almost don’t even care about what you expect will happen.

Robert Wiblin: Mm-hmm (affirmative)

Brian Christian: And I think there’s a principle there which I just kind of find encouraging. It’s one of these results that you feel, sort of, happy, knowing that that’s the case.

Benjamin Todd: And wait, so … yeah, if we zoom out a little bit, you can imagine, you’re about to pull one of the levers. Your best guess is that every time you pull a lever you get $10.00, say. So, that’s the expected value of the lever. But then, you’re saying, now you want to think about, what’s my upper 10% confidence interval? So, I think, maybe there’s a 10 % chance that I actually got $15 from this lever. If it turns out to be better than my best guess. Then, you want to do that for all the levers and go for the one where you think there’s your kind of, 10% level is actually highest, rather than what your best guess is highest.

Brian Christian: Yeah, that’s right.

Benjamin Todd: And, I mean, I guess there’s a tension in the literature on whether you should be kind of using, like, 10% confidence interval or 50%, or …

Brian Christian: Yeah, the paper where they, kind of, make the proof that this is regret minimizing uses what’s called the Chernoff-Hoeffding bound. So, I can give you the exact prescription, which is, you want to play the machine that maximizes your expected value plus the square root of two times the natural log of the total number of handles that have been pulled divided by the number of times you’ve pulled that handle.

Benjamin Todd: So, the number of times …

Robert Wiblin: I see why you did not put that in the core of the book.

Brian Christian: Yeah, some of this stuff ends up buried in endnotes for a reason.

Robert Wiblin: Yeah.

Benjamin Todd: This is the number of handles you pulled in the past so far?

Brian Christian: Yes. So, it’s the square root of a fraction … the top of the fraction is two times the natural log of the total number of pulls in the casino divided by the number of pulls of that specific machine.

Benjamin Todd: And that’s not the number of arms, it’s the number of past pulls you’ve done so far.

Brian Christian: That’s right.
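The prescription just spelled out can be written as a short function. This is a sketch assuming payouts scaled to lie between 0 and 1 and assuming each machine is tried once before the formula is applied, since the bonus is undefined for a machine with zero pulls. The name `ucb1_choice` and its argument layout are illustrative.

```python
import math


def ucb1_choice(counts, means):
    """Pick the machine with the highest upper confidence bound.

    counts[i]: number of times machine i has been pulled so far
    means[i]:  average payout observed from machine i
    Score:     means[i] + sqrt(2 * ln(total pulls) / counts[i])
    """
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # assumption: an untried machine is always pulled first
    total = sum(counts)  # total pulls across the whole casino, not the number of machines
    scores = [
        mean + math.sqrt(2 * math.log(total) / n)
        for mean, n in zip(means, counts)
    ]
    return max(range(len(counts)), key=lambda a: scores[a])
```

For instance, `ucb1_choice([100, 2], [0.6, 0.5])` picks the second machine even though its observed average is lower: after only two pulls its bonus term is large, so its plausible best case is still higher. As a machine's own pull count grows, its bonus shrinks and the choice is driven more and more by the observed averages.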

Benjamin Todd: Okay. And so generally that means that the confidence interval is, you’re using a narrower one over time?

Brian Christian: Yeah. And so that’s gonna go down …

Benjamin Todd: Which is, again, the same characteristic we just covered.

Brian Christian: Yeah, exactly. I mean, that is the specific bound that they used for their proof, but I think the intuition is …

Robert Wiblin: Is pretty clear.

Brian Christian: Yeah, and it withstands you using different statistical measures of upper confidence. I mean, I think it’s also just an intuitive idea of when you’re in a situation, what is a reasonable best-case scenario? You know, if I go out to dinner, the reasonable best-case scenario is not that my dinner companion gives me a million dollars, but it might be that they give me an idea for a book that I am going to write. So you know, I think some of these things cash out into more intuitive notions of what the upper confidence interval would be, but as an idea I think that it’s pretty robust and kind of suggestive across a much broader swathe.

Benjamin Todd: And like, thinking about careers again a little bit, the idea just on that intuitive level would be consider the career which might plausibly turn out to be best rather than your best guess on which one is better?

Brian Christian: Yeah and I …

Benjamin Todd: So, if you’ve got two, which you maybe roughly think are similar, but one you could see that could be this amazing scenario and the other one doesn’t have that amazing scenario thing, then you should probably try out the amazing scenario one first.

Brian Christian: Yes.

Benjamin Todd: And is the intuition behind this optimism kind of heuristic? That one way of seeing why that makes sense is that if you do the optimistic, kind of, the thing that’s plausibly best, that that turns out not to work, then you can just switch to something else.

Brian Christian: That is exactly right.

Benjamin Todd: Then, if it turns out to work then you’ve made this amazing discovery. You’re now on this really good path and you can just carry on with that. So, like, it pays to be optimistic earlier because it might let you play out this amazing thing that you would’ve missed otherwise.

Brian Christian: Yeah, that’s exactly right. Yeah, your point that the costs are limited is, I think, an important subtext here. So, in the classic version of the multi-armed bandit problem, the machines either pay out some fixed amount or they pay out zero. So your losses are bounded. You know, if you’re in a world where the machine might explode and kill you and then you can’t continue gambling on anything, then you probably do want to consider the lower end of the confidence interval.

Brian Christian: So, it’s partly a function of the nature of the canonical multi-armed bandit problem that if you put a dollar in the machine that your losses are bounded at $1.00. It’s also one of the assumptions of the problem that you can just effortlessly walk over to the next machine. So, in some ways, the maximum cost for trying something and concluding that it was a waste of time is $1.00. In reality, of course, it may take you much more than a single metaphorical pull of the lever to determine that a career isn’t for you, or it may be more difficult to switch back to what you were doing before, after you’ve left an organization or something. There are versions of the multi-armed bandit problem that include what are called switching costs that add friction to these things. We can include some links if people want to go into that literature, too. That’s one of the variations that the book considered.

Benjamin Todd: Yeah, that would be very interesting. I mean, also, the issue of lower confidence interval might maybe be important is maybe a bit separate. Because you say, all you’re losing is the money you could’ve gained on that lever, but in a real career decision, you can actually lose more than you put in. So, you go into a job and you turn out to hate it. You get depressed and burned out. Then, you’re actually in a worse position than where you started rather than just getting zero instead of some positive payoff.

Brian Christian: Yeah.

Robert Wiblin: You’ve gone backwards.

Benjamin Todd: Or maybe even we’re thinking more on a social impact. Whenever you can imagine … in some areas it’s easy to make things worse rather than better.

Robert Wiblin: Mm-hmm (affirmative) right.

Benjamin Todd: So like trying to do policy change, it’s very easy to have unintended consequences at that. You might actually again make this area of the problem you’re trying to work on worse rather than better. So, that was one of our questions. How would you factor in, kind of, you could get negative payouts rather than just 0 or 1 payouts.

Brian Christian: Yeah, I think that’s an important question. So, I tend to make the reverse argument. Just thinking about an individual employee trying to decide what career is best for them. One anecdote here is, a good friend of mine was an engineer at Google and he was trying to decide whether to leave Google and start a startup. His manager said, well, you know, you’re on this great trajectory, you’re making all this money, do you really wanna try something that will in all likelihood fail? Then where will you be? He said, well, come on, you and I both know that if I fail and I come back to you in 18 months time and want to rejoin your team, are you gonna say no out of spite? And the manager was forced to admit that, no, in fact, he would gladly take him back at his, you know … existing salary if not more, and so forth … I think that’s an example where people can get a little bit spooked and perhaps overrate the downside.

Brian Christian: So, stepping away from a job for a year, crashing and burning in the startup game and then getting right back in where you left off. I think that’s really an argument for being willing to take that risk. In fact, I counseled my friend, literally with the explore/exploit literature, and said, I think you should really pull that new lever. I think from an employee’s perspective it makes sense to be fairly optimistic. I think most people, in my experience, if anything, are not optimistic enough. I think from an organizational perspective, especially if you’re doing some kind of major intervention that could have some huge unintended consequence, you know, you go into some country and you give everyone free wheat, but then you destroy the local wheat economy or something like this. That’s certainly a case where you’re in something that probably doesn’t really represent the multi-armed bandit problem at that point. You’ve unintentionally imploded the casino. Or something like this … You’re probably closer to something that’s an MDP and that’s a whole other kettle of fish.

Benjamin Todd: What does that stand for?

Brian Christian: It’s Markov Decision Process.

Benjamin Todd: Okay.

Brian Christian: So, an environment where the actions you take change the state that you find yourself in. So, one of the nice things about the abstraction of the canonical multi-armed bandit problem is that your actions don’t really do anything to the environment. Like, you’re … you get x money or not, but then you’re right back where you found yourself. In something like a Markov Decision Process, you know, you take some action and … if you think about an Atari game as a Markov Decision Process, you use some power-up and now you don’t have it. So, you’ve changed the set of options that you have or the state that you’re in. That’s just an even more complicated domain. So, I think identifying … this goes back to this question of trying to identify the situation that you find yourself in and asking yourself if this feels like a multi-armed bandit problem, then, let me kind of painlessly and cheerfully explore a bit because the downside is capped.

Benjamin Todd: Yeah, I mean, I totally agree when you’re thinking about normal career decisions. People maybe don’t appreciate that the downside is relatively capped and it’s okay to explore more than people often do. But, yeah, I think it’s when you start to think about some more of these social impact issues. You could imagine often dealing with these cases where there could be like, really good upsides or significant downsides if you’re like, pushing through a major policy change or something like that.

Brian Christian: Yeah.

Benjamin Todd: It’s more like a multi-armed bandit where the machine could actually like, force you to pay. You pull it and it’s like, minus 10 and now you like, owe the machine money.

Brian Christian: Yeah, exactly.

Benjamin Todd: Or it could be like plus 20 or something.

Brian Christian: Yeah, I mean, the other thing that’s worth unpacking here is that the other assumption in the multi-arm bandit problem is that you get the feedback immediately. It’s not like you pull the handle and ten years later, a check comes in the mail, it’s for 10 cents or whatever. This is something that may or may not be true in a lot of situations, right? So, one of the reasons that tech companies really like multi-armed bandit algorithms is, for something like ad optimization, you show an ad, the user clicks on it or not, and you’ve gotten that feedback immediately. So, you really can model that as a multi-arm bandit problem. The feedback is instantaneous. So, you can really adapt and adjust your ad probabilities basically in real time.

Brian Christian: So, some of these ideas I mentioned earlier are making their way into the medical literature. In something like a clinical trial, the first clinical trial to use what’s called an adaptive method, which is basically, you’re changing the percentage of people that are receiving the experimental drug versus the conventional drug in real time, rather than waiting until the end of the trial. The first case in the medical literature that used this was for something called ECMO. This was back in the 1980’s. Infants that were going into pulmonary arrest and their lungs were stopping. The conventional treatment was really bad. It had only worked I think something like 60% of the time. So, someone got this idea, we want to try this new crazy experimental technique called ECMO. We think it could work considerably better. It could also be a total disaster. It has a risk of embolism and all of these things.

Brian Christian: One of the reasons that, just from a formal perspective, it made sense to use some of these multi-arm bandit algorithms, is if someone goes into pulmonary arrest, you either save their life within five minutes or they die within five minutes. You know, obviously it’s a tragic scenario to have to deal with, but in some sense there’s this mathematical silver lining, which is that, it makes it much easier to rapidly identify whether some new technique is better or worse than the status quo. You don’t have to administer it and then track these people longitudinally over the rest of their lives. So, this is another one of these parameters where you can sort of identify, am I getting immediate feedback? If so, then this is more like a multi-arm bandit problem. If I’m not, then I may want to adjust my strategy and not rely so heavily on that framework.

Benjamin Todd: Okay, so you’ve covered a couple of different complications. One is that you might sometimes have negative payouts. Now, we’ve just covered, you’ve gathered some kind of imperfect information, imperfect feedback, might take them several years to really figure out how a certain path unfolded. And you also mentioned just earlier you’ve got switching costs.

Brian Christian: Right.

Benjamin Todd: Where in practice and real life you can’t just switch between the arms, you have to like, do a whole job application process, which takes you months.

Brian Christian: Yeah.

Benjamin Todd: I mean, do you have any intuitions about how these might affect which strategies are best? I mean, my guess is it’s gonna generally mean you should do a bit less exploration because the costs of exploring are higher. You’re getting less information and you’re less able to use the information because you have to switch.

Brian Christian: Yeah, that’s exactly right. So, the intuitive answer is that’s exactly right. So, the higher the switching cost, the more reluctant you should be to abandon an option even if it seems like it’s not working. Or, the more reluctant you should be to try something frivolously because you’re gonna pay that switching cost twice. Once to go in and another time to get out. Another thing … another assumption that is kind of underneath this whole conversation is that the quality of the options that we’re evaluating is static. The restaurant doesn’t fire their chef and get a new guy who’s not as good. Or the company doesn’t lose its way or change their management or whatever.

Brian Christian: So, when the payout probabilities of these different machines can change, then you find yourself in what’s called the “restless bandit problem.” Which is NP-complete and it’s … there’s no effective solution that’s gonna get you there all the time. For me, there’s an interesting footnote here, which is that, people actually seem very good at dealing with restless bandit problems in practice. Here’s a case where the computer scientists are in fact turning to the cognitive scientists and saying, how are you guys modeling the human decision making process? Because it seems that people have a really good heuristic for dealing with this, like, known intractable problem. We’d love to know what it is because that’ll give us some insights that we can use in a purely computational context.

Benjamin Todd: So, if I remember from the book, when you give people multi-armed bandit problems in a lab, they actually explore too much?

Brian Christian: Yes.

Robert Wiblin: Mm-hmm (affirmative)

Benjamin Todd: Which I found very surprising because normally it seems the general theme in this kind of literature is like, people don’t really explore enough, they stick with the status quo, they have sunk cost fallacy … but actually here, they should’ve just carried on pulling the best guess lever and they kept switching.

Robert Wiblin: I’ve got major objections to that experiment.

Brian Christian: Oh, really? Okay, great.

Robert Wiblin: Yeah, but maybe set it up first.

Brian Christian: Okay, so yeah, let me tee it up.

Brian Christian: So, one of the canonical experiments in this area was done by Amos Tversky in the 1960s. The basic idea is that you have a box with two different lights on it. You have an option to press a button and either observe which of these two lights comes on; or make a bet on which of the two you thought was going to turn on, but you don’t get to observe it. So, you don’t know until the end of the study whether your bet paid off or not. And I think these lights, one of them lit up 60% of the time, the other went 40% of the time. I believe participants were told that, but I’m not 100% sure about that. So, the basic idea is, again, how do you want to maximize your total take, your total earnings over, I believe in this case, a thousand trials? It turns out that the optimal strategy is, observe the first 38 times, and then blindly make a series of 962 bets on whichever light happened to have come on more in those first 38 and then you’re done.

Brian Christian: Is that what human subjects did? No, not even close. People would observe for a while, bet for a while. Observe a little more again and then bet a little bit more again. I wanna say, on average, people observed 500 times instead …

Robert Wiblin: 505.

Brian Christian: 505?

Robert Wiblin: 505 out of 1,000. Yeah.

Brian Christian: Yeah. And so this is a case … I mean, my read on this is that the participants were told that these probabilities were fixed, but to be a bit more sympathetic or charitable towards the subjects: they knew they were in a psychology experiment. There’s a long, storied history of being lied to by experimenters in psychology studies. So, they didn’t necessarily want to take the experimenters’ word for it. So, they were effectively acting as if they were in a restless bandit problem where, let’s say, the payoff probabilities are on a random walk; they can go up and down.

Benjamin Todd: That’s what I was just thinking. Maybe people explore more because they think the probabilities might be changing.

Brian Christian: Yeah, so … that’s one way to model the data that they saw: people were establishing a certain level of confidence that enabled them to switch into this betting mode, but as time went by, their uncertainty started to grow. Once it hit a certain threshold, they gathered a bit more information.

Benjamin Todd: And it makes sense because real life is a restless bandit problem rather than a multi-armed bandit problem. Because especially in careers, like, the landscape is always changing. So, maybe our intuitions have evolved more to deal with that one rather than the more kind of artificial where everything’s stable situation.

Brian Christian: Yeah.

Robert Wiblin: So, this experiment is terrible. That’s one of the issues, I think. Yeah, they said it’s stable, but they might not … even if they believe them, all their intuitions about how much to explore and how much to exploit are based on life where things are changing all the time. So, it’s impossible for it to get through to their intuitions, even if on an explicit level they kind of believe it. But that does explain why they alternate between exploring and exploiting rather than just, like, doing all the exploring and then all the exploiting.

Robert Wiblin: But the much more severe issue in this experiment is that while you’re exploring you didn’t get any benefit. You couldn’t draw any benefit from the levers that you’re pulling. Which seems very artificial, not like a typical case at all. If you were able to derive the benefit, which was only between, like, 40% or 60% depending on the lever, that reduces the cost of exploration so much that it wouldn’t surprise me if exploring for, like, 500 out of a thousand pulls would be kind of reasonable. Again, like, people’s intuitions are all gonna be about cases where, while you’re exploring, you derive benefit.

Brian Christian: Yeah.

Robert Wiblin: Then, like, another issue is that they made the differences between the levers really small, so it’s like 40% versus 60%. Like, in real life, things are often more varied than that. So, people’s intuitions again are more in favor of exploration, because the difference in the odds is so limited.

Brian Christian: Right.

Robert Wiblin: Oh, and another objection is that …

Brian Christian: Keep going, keep going …

Robert Wiblin: Is that people maybe just enjoyed the novelty of trying the levers and exploring rather than pulling it and not seeing any response at all. Because in the exploit phase you don’t find out whether you’re benefiting. Whenever does that happen? It’s all so artificial. You’re … so you’re exploiting and you don’t even find out whether you’re winning? That’s like, very weird. I think this whole thing was … the deck was completely stacked to produce this surprising result. But I don’t know we can learn anything about the real world from this difference between the theoretical optimum and what people do.

Brian Christian: Yeah, I think that’s all right. I mean, you know, imagine a stock market in which as soon as you buy a stock you cease to know the value of that stock. I mean, it’s just very strange.

Robert Wiblin: Right, yeah.

Brian Christian: It’s … I would be hard pressed to even think of an analogy. I think also, the value of exploration, even in this context, carries over beyond the walls of the game itself. If you were expecting that you might be asked to do a different version with a different box, then any understanding that you gained in the first condition might be useful in subsequent conditions. So, I mean, in general, I think there’s a lot of evidence that humans and animals are designed to get pleasure from learning how things work. So, it makes sense that part of what you want to do is be like, okay, I’m in this new environment, this new contraption. I don’t really know how long I’m going to be in this situation or how many other similar situations I’ll be in, so let me just try and figure out what’s the deal. That seems totally reasonable.

Robert Wiblin: So, the ideal strategy is that the person should sit there and press a button without getting any feedback 962 times in a row. That sounds very boring and I think maybe…

Brian Christian: Yeah.

Robert Wiblin: Would you actually do it? You’re literally just pressing a button. Yeah, I dunno. So, this is an objection that people have to the biases literature more broadly. They set up these incredibly artificial scenarios where the deck is stacked towards people’s intuitions about the cases being bad and then they’re like, oh, people don’t do the theoretical optimum in this stupid game that I created to engineer that result. It’s not always like that. You could take that criticism too far.

Robert Wiblin: But many listeners know there’s this whole other school called the heuristics school. So, there’s the biases school and the heuristics school. The heuristics people say that actually people are incredibly good at answering these very complex questions in a good enough way, and that the people who are focusing on how we’re biased are, like, picking up edge cases, particularly unusual cases where people’s intuitions, the heuristics they’re using, don’t work. But those are the odd cases rather than the norm.

Brian Christian: I think that’s right and I would just add, too, that there’s this second argument that I think resonates with the heuristics school. So, there’s one argument you could make which is just yeah, evolution kind of tuned our parameters for a certain type of environment. Surprise, surprise, when we’re in a totally different environment we don’t do the right thing. So, yeah, that makes sense to me. In general, computer science has a lot of what’re called no-free-lunch theorems that basically say, if you optimize for a given environment, you will necessarily be worse on other environments that aren’t like that. There’s often no way to improve uniformly across all environments.

Brian Christian: There’s a separate argument, I think, that also goes in the same direction, which is simply, you are paying a cost to think; you’re paying a cost to deliberate, to hesitate. So, part of what we’re trying to do at the broadest level in this book is paint a more recognizable picture of rationality that takes computational constraints into account and says, once you start to think about information processing itself as a cost, you end up with a notion of optimality that does look a lot closer to some of these ideas that come out in a heuristic context.

Brian Christian: There’s this concept in experiment design called information leakage, which is basically, the subjects gleaned more than we strictly told them. It’s very difficult to actually grapple with that. We interviewed one researcher who studies optimal stopping problems in human subjects and he says, “Yeah, it turns out that our subjects were just getting bored. It’s not irrational to get bored, but it’s hard to model that rigorously.” I think, you know, in general, when there’s a conflict between our models of what decision making should be and what people actually do, we have a choice. We can say, oh, people are stupid or irrational or that they have these heuristics that are tuned for a different environment. Or, we can say we have the wrong model, we have an incorrect formal description of what it is that these people are doing or the problem that they think they’re solving.

Robert Wiblin: So, if I could just recap with the explore/exploit section. So, we talked about the Gittins index, then epsilon decreasing strategies, and then kind of a variant on that is the upper confidence bound algorithms, which is kind of appealing because it seems like it will be easier to apply in everyday life; to think about, you know, what would be a very good case here. Not the very best imaginable, but a very good case and then always go for the thing that has the highest, very good case. At least early on in your life, maybe later in life not so much. You need to be more reasonable, more realistic. Then, there’s various different issues that arise when thinking whether these models are good description of real life. So we’ve got the discount rate.

Robert Wiblin: I’m not too bothered by that, because I think at least early in life people probably should just have a geometric discount rate, then they would have to choose what that discount rate is. Discount rate’s one. Then you’ve got a question of not really knowing how long or like how many pulls of a lever you’re going to get in your life, like, yeah, how long do you have to spend at a job before it counts as like a pull of the lever and you’ve got the measurement, so there’s a bit of like arbitrariness there.

Robert Wiblin: Then you’ve got some more severe issues that nothing changes in these environments, and like we don’t have simple algorithms once things start changing, which is how the world is, and also you’re not changing the environment at all, which in some cases would be quite important in real life. Uh, you’ve got switching cost potentially, so changing job is difficult, whereas that’s–although that seems like you can modify the algorithms, it sounds like, to account for switching costs, but we’ll have to look those up …

Benjamin Todd: Explore a little bit less. Yes.

Robert Wiblin: Yeah. Explore a little bit less is the rule of thumb there. Then you’ve got, it seems like some of these describe cases where you pull a lever and you either get one or zero, or I guess like one or minus one, because you have to pay or something to pull a lever.

Brian Christian: Yeah, this is called the Bernoulli Bandit, by the way, if people are interested.

Robert Wiblin: Yeah.

Brian Christian: You either get zero or one. Yeah.

Robert Wiblin: But in real life we talked about how like there’s downsides, not only upside, but that doesn’t seem like it’s so severe because we just shift the distribution. Yes, you’ve got like, one or zero are the outcomes. But you could, imagine it could be a normal curve, a normal distribution of outcomes, or perhaps a log-normal distribution, so like much more spread out. Or it could be power-law distributed, so like very massively different outcomes depending on the option they choose. And all of those mean there’s more variance in the outcome, so you have to explore more, and also there’s like more risk of choosing one early that misses the top tail, misses the one, in fact has the highest expected value, but you didn’t realize that because you didn’t sample enough to pick up the upper-best tail, or potentially the lower-terrible-case tail.

Brian Christian: Right.

Robert Wiblin: So … And then. Another one is that all of these are kind of modeling you as coming in with no information, so perhaps you just have a uniform prior-belief about the possible different outcomes. That the levers have … Whereas in real life, almost everyone listening to this is going to be at least sixteen, and they have kind of a model of the world, of what’s the plausible distribution of outcomes of different actions that they can take.

Benjamin Todd: Yeah, but you’re factoring that in like, say with the upper confidence interval one, you’re using everything you know at that point to make your best guess at what the upper confidence interval is.

Robert Wiblin: Yeah. I agree in principle it’s incorporated here, but we haven’t really talked about what things would look like if you’re like seven-hundred through an eight-thousand draw. Like most of the tables describe, like after three pulls, like when you’ve got, say, two wins and one loss, whereas in real life it seems like we have much thicker information than that–we’re very rarely coming in blind, and so it might be better to model it as like a Bayesian issue where you have like a prior and you update based on each pull, which I think may well end up resembling solutions I’ve got in here anyway.
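To make the "prior plus updates" idea concrete, here is a minimal sketch assuming win-or-lose (Bernoulli) payoffs and a Beta prior, which is a common textbook choice rather than anything specified in the conversation. The function name `posterior_mean` and the example prior counts are hypothetical.

```python
def posterior_mean(prior_wins, prior_losses, wins, losses):
    """Estimated win rate after combining a Beta prior with observed pulls.

    The prior is written as pseudo-counts: Beta(prior_wins, prior_losses).
    A uniform prior (coming in blind) is Beta(1, 1).
    """
    alpha = prior_wins + wins
    beta = prior_losses + losses
    return alpha / (alpha + beta)


if __name__ == "__main__":
    # Coming in blind: two wins and one loss moves the estimate a lot.
    print(posterior_mean(1, 1, wins=2, losses=1))    # 3/5
    # Coming in with the equivalent of 100 earlier pulls (30 wins, 70 losses):
    # the same three pulls barely move the estimate.
    print(posterior_mean(30, 70, wins=2, losses=1))  # 32/103
```

The second call illustrates the point being made here: with thick prior information, three new pulls should change your mind far less than the "two wins and one loss" tables suggest.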

Brian Christian: Yeah, I mean of course, yeah in real life you’re … You’re making judgements not only about the machine you’re pulling, but also about the nature of the game itself that you’re playing, so if slot machine A pays out much less well than you thought it would, you might start to extrapolate and be like, “oh, maybe slot machines just aren’t as good of an investment as I thought they were.” And you see this in people who are very superstitious about gambling, where they’re sort of promiscuous in what they attach their success and failure to. They get some payout, and then they update their priors on the value of wearing their lucky baseball cap, but also the value of it being 12:04 PM with the sun at this angle, and being at this particular machine. And so, yeah, I think all of these things, needless to say, point towards the enormous complexity of what real-world decision making actually looks like in most cases.

Benjamin Todd: Just with the, the restless bandit problem which is where the payoffs are changing at the different arms. So could I just recap that you were saying that, actually, people are almost better at doing that than these simple algorithms we’ve developed? Or … ?

Brian Christian: Yeah. I mean the restless bandit problem is what’s called intractable, which means that there is no efficient solution to the problem—“efficient” has a technical definition which we can look into if you want but people seem in a way untroubled by the daunting formal complexity of the problem and they just do stuff. And the stuff they do seems to work. And this, I think, has created a certain amount of interest in the computer science community of trying to figure out “how can we characterize, you know, a computational model of what people are actually doing, and is there a rigorous way to analyze just how good their instincts actually are? Can that lead us to, ideally, some sort of algorithmic breakthroughs that we can then use in practice?”

Benjamin Todd: But are there any rules-of-thumb about ways we can modify some of the things with the algorithms we’ve seen earlier? That would still get you like a better-than-random payoff when doing restless bandit problems? I mean, it sounds like one thing again is like you should be a little bit more keen to explore.

Brian Christian: That’s certainly true. So the more, if you think about this at the, in the limit of a completely random environment, then you might as well just pull the handles at random …

Benjamin Todd: Yeah

Brian Christian: … If the payouts are just jumping all over the place. And so, yeah, in general it is true that the more volatile the environment is, the more restless you should be yourself, and not settling for something and not kind of continuing to act on stale information. So it makes sense … I mean, I might have to check the literature on this, but I would imagine that the Win-Stay, Lose-Shift principle is still reasonably better than chance even in the restless condition because if something paid out, you’d … you know, it’s a reasonable assumption that you should pull it at least one more time. So there are, I think, very basic heuristics that hold, but in general it is true that the more restless the environment, the more restless you should be too.
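As a minimal sketch of the Win-Stay, Lose-Shift rule mentioned here: keep pulling an arm after a win, and move to a different arm after a loss. The function name `next_arm` and the choice to pick the replacement arm uniformly at random are assumptions made for illustration.

```python
import random


def next_arm(current, won, n_arms, rng):
    """Win-Stay, Lose-Shift: repeat the arm after a win, switch after a loss."""
    if won:
        return current
    # After a loss, pick uniformly among the other arms (an assumption;
    # with two arms this simply means "try the other one").
    others = [arm for arm in range(n_arms) if arm != current]
    return rng.choice(others)


if __name__ == "__main__":
    rng = random.Random(0)
    print(next_arm(0, won=True, n_arms=2, rng=rng))   # stays on arm 0
    print(next_arm(0, won=False, n_arms=2, rng=rng))  # shifts to arm 1
```

The rule only ever looks at the most recent pull, which is why it keeps working tolerably when older information has gone stale.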

Robert Wiblin: You’re going to end up basically discounting old information. Old pulls get weighted less in measuring like the fraction that it succeeded, so you get some kind of moving average. But, yeah, I guess for some reason that ends up like being computationally intractable.
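The "old pulls get weighted less" idea can be sketched as an exponentially discounted success rate. This is only a heuristic for tracking one drifting arm, not a solution to the restless bandit problem; the function name and the decay value of 0.7 are illustrative assumptions.

```python
def discounted_success_rate(outcomes, decay=0.7):
    """Success fraction where each older pull counts `decay` times as much
    as the pull after it. `outcomes` is a list of 1s (win) and 0s (loss),
    oldest first. Returns None if there are no outcomes."""
    weighted_wins = 0.0
    total_weight = 0.0
    for outcome in outcomes:
        # Shrink the weight of everything seen so far, then add the new pull
        weighted_wins = decay * weighted_wins + outcome
        total_weight = decay * total_weight + 1.0
    if total_weight == 0.0:
        return None
    return weighted_wins / total_weight


if __name__ == "__main__":
    history = [1] * 20 + [0] * 5  # a good arm that has recently gone cold
    print(sum(history) / len(history))       # plain average: 0.8
    print(discounted_success_rate(history))  # much lower: recent losses dominate
```

A smaller `decay` forgets faster, which matches the advice that a more volatile environment calls for leaning harder on recent evidence.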

Brian Christian: Yeah. And I mean you also can consider like, do you know going in how restless the environment is? Or are you building your model of the noise in the environment based on your experience? Which is obviously even more complex.

Benjamin Todd: Maybe as a way of kind of summing up the discussion as well I’d be interested to talk more about trying to get very concrete about specific career decisions.

Brian Christian: Yeah.

Benjamin Todd: And you know like, it does seem like in a way you could think of, well you have all your different career options open to you, and one way of thinking about it is each career step takes like one to three years, which is kind of like, a job …

Brian Christian: Yeah.

Benjamin Todd: And then you have a forty-year career. So you’ve got, you know, ten or twenty pulls of the lever. And then the question is, you know, which one should you go for? And I was just wondering if you wanted to say like, how we might attack that based on some of the models we’ve covered?

Brian Christian: Yeah. I think, I mean it’s also, I can’t help thinking about this in the context of my own career. You know, I don’t know how illustrative that is, or how useful that is to listeners, you know from just this anecdotal perspective of how I became a writer, but I was very conscious of the idea that I would take a crack at writing as a profession and find out fairly quickly whether I would succeed or fail. And then just do something else.

Brian Christian: So, in my case, having a computer science background, it was … Well, I can always just roll up to some, you know, big corporation and get some job, and so I didn’t have to worry about becoming destitute if I failed in my writing ambition. And so, speaking personally, I felt very emboldened by that to do something very risky and try to write a book proposal and so forth.

Benjamin Todd: That was a little bit like the upper confidence bound sense, where you’re like, being a writer would be a real like kind of dream job for me; I’m not really sure if I could make it work, but, like, it’s worth giving it a go and I can always just switch back to the kind of like normal job path after.

Brian Christian: Yeah. And I remember having a conversation with my undergraduate writing mentor, and I was talking to him about should I go into graduate school and so forth, and his advice to me was “I highly recommend that you only go to graduate school if you can go to a program that’s funded, because part of what you are trying to do is make a life as a writer, and if you graduate from even a really good program with, you know, fifty-thousand dollars or a hundred-thousand dollars of debt, then that is going to rapidly put pressure on you to either immediately professionalize as a writer or immediately abandon that path because you’ve got to make your loan payments.”

Brian Christian: And I thought that was really kind of astute advice, and that’s not the kind of advice that fits on the axis of, you know, “go for your dreams, or not.” I thought it was very pragmatic, and sort of had this eye to the option-value of being in a position to make slightly risky moves as an adult. So that’s something that I think people can think about from the perspective of making those really early decisions about whether to get–you know for example if you get a law degree or a medical degree, those degrees are so expensive that it is very hard to do anything other than law or medicine, in part because you need to pay off your law and medicine training and those are generally lucrative ways to do that.

Benjamin Todd: That’s a good example of a multi-armed bandit that kind of has a negative payoff, because if you go and try and qualify as a lawyer and then you realize you hate this, you’ve actually invested a ton of money, so you’re worse off than when you started.

Brian Christian: Yeah. Yeah. And so, I mean, I think you guys and the 80,000 Hours community surely have thought more about this than I have explicitly, but I think, on the whole, people probably spend less time testing those waters than they should. Particularly because they come with these big switching costs.

Benjamin Todd: Well, yeah, so the advice in our career guide that’s currently up is, kind of like, if you are pretty confident that this path seems best, then like probably figure out how to go for that, but obviously have a back up plan, but you know go for your main-line option. But if you feel you’re uncertain, which many people are, then we encourage people to make a plan to try out several things over a few years, and one way you can do that is, before graduate school you often have like a couple of year period and then you can kind of do something a bit different and then you can go to graduate school, and …

Brian Christian: Mm-hmm (affirmative).

Benjamin Todd: That’s a way of kind of ordering things that lets you try out some things. But then when I actually read the book and thought about upper confidence intervals, I wondered if the advice of kind of “go and try out a couple of things” is not actually quite the right advice. Instead, you should think “which things seem plausibly best,” and then just do that straight away. [Laughter] And like switch later.

Benjamin Todd: I mean, that’s obviously ignoring many of the complications we’ve covered, but it made me, made me pause for thought that maybe the advice should be along the lines of “do the kind of plausibly best thing, rather than kind of plan to try out lots of things.”

Brian Christian: Yeah. Well, in a way, you’re describing the tension between epsilon-decreasing and upper confidence bound …

Benjamin Todd: Yeah.

Brian Christian: … And you can be buoyed by the fact that they’re, that they both offer you, you know, asymptotically logarithmic regret. [Laughter]

Brian Christian: They’re both part of …

Robert Wiblin: You can sleep soundly in your bed knowing that.
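For readers who want to see the "go for the highest very good case" rule in code, here is a minimal sketch of one standard upper confidence bound variant, usually called UCB1, on win-or-lose arms. The conversation does not say which variant is meant, so treating it as UCB1 is an assumption, as are the function names and the example payout rates.

```python
import math
import random


def ucb1_index(mean, pulls, t):
    """Optimistic estimate for an arm: observed mean plus a bonus that is
    large for rarely tried arms and shrinks as the arm is pulled more."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb1(true_rates, total_pulls, seed=0):
    """Simulate UCB1 and return how many times each arm was pulled."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    wins = [0] * n_arms
    for t in range(1, total_pulls + 1):
        if t <= n_arms:
            arm = t - 1  # try every arm once before comparing indices
        else:
            arm = max(
                range(n_arms),
                key=lambda a: ucb1_index(wins[a] / pulls[a], pulls[a], t),
            )
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
    return pulls


if __name__ == "__main__":
    print(run_ucb1([0.2, 0.8], total_pulls=2000))
```

The exploration bonus is what makes the rule "optimistic": an option you have barely tried gets the benefit of the doubt until the evidence rules it out.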

Brian Christian: Yeah. Yeah, actually I’ll mention one, a third algorithm that’s in that same family which I think is intuitive, which is called Thompson sampling, and that is “do something with the percentage of your energy or time or money that is the likelihood of it being the best thing.” So if you’re ninety-nine percent sure that you want to be a doctor, then spend ninety-nine percent of your time being a doctor. If you’re fifty-percent sure then spend half your time. And I think it’s just this wonderfully intuitive idea. And it fits perfectly within a Bayesian framework.

Benjamin Todd: Fifty-percent of your time over what time horizon? Like … ?

Brian Christian: Well, again, this is in the multi-armed bandit problem, so it’s just your next pull.

Benjamin Todd: Okay. Yeah. Okay.

Brian Christian: With probability point-five, you pull that, and then you get feedback and then you re-evaluate. So it’s a little bit different again in a sort of slower feedback mode. It does seem though, I mean in the context of advising someone that’s really young, to think about, you know, information gathering for its own sake. So if you’re someone who’s twenty-four, you’ve been doing, you know you’re in your first job out of college, and you really like it. In some ways there’s this argument, at least from, you know, epsilon-decreasing, that says, “I don’t care how good it is; try something else anyway.”
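A minimal sketch of Thompson sampling for win-or-lose arms, assuming a uniform Beta(1, 1) prior on each arm's payout rate (the prior, the function names, and the example rates are illustrative assumptions). Drawing one sample per arm from its posterior and pulling the arm with the highest sample means each arm gets chosen with the probability that it is currently believed to be the best.

```python
import random


def thompson_choose(wins, losses, rng):
    """Pick an arm by sampling a plausible payout rate for each arm from its
    Beta posterior and taking the arm whose sample is highest."""
    samples = [
        rng.betavariate(w + 1, l + 1)  # Beta(1, 1) prior plus observed counts
        for w, l in zip(wins, losses)
    ]
    return max(range(len(samples)), key=lambda arm: samples[arm])


def simulate(true_rates, pulls, seed=0):
    """Run Thompson sampling and return how often each arm was pulled."""
    rng = random.Random(seed)
    wins = [0] * len(true_rates)
    losses = [0] * len(true_rates)
    for _ in range(pulls):
        arm = thompson_choose(wins, losses, rng)
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [w + l for w, l in zip(wins, losses)]


if __name__ == "__main__":
    print(simulate([0.3, 0.7], pulls=2000))
```

Because the choice is re-drawn on every pull, the share of effort going to each option shifts automatically as feedback comes in.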

Brian Christian: You’re at that period of time where that’s what you need to do, is just try stuff.

Benjamin Todd: Yes. That’s why I was going to zoom back, like, if we had, say, if you’ve got ten or twenty pulls of jobs over your career, I mean the epsilon-decreasing advice is, like, you know for my next career step I should almost, I should basically flip a ten-sided coin, and if it’s like twenty-percent of the time I should go and do some like random other option, and otherwise I should carry on with the thing that I think is best, which, you know … You sometimes see people following advice like that where they’re like “Well I’ve been in this thing for a while; I’m not really sure it’s doing something for me, so I’m just going to go and like do this like pretty unusual different thing and see where it takes me.” But it, on the other hand, feels like very counter-intuitive advice just to like do a randomly-chosen different job, like some fraction of the time.
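The coin-flip rule described here can be sketched as follows. The schedule `c / t` for the exploration probability is one common choice and is an assumption, as are the function names; the conversation does not pin down a specific schedule.

```python
import random


def exploration_rate(t, c=1.0):
    """Chance of exploring on pull number t (t starts at 1); shrinks over time."""
    return min(1.0, c / t)


def choose_arm(estimates, t, rng, c=1.0):
    """Epsilon-decreasing choice: with a shrinking probability try a random
    option, otherwise stick with the option that currently looks best."""
    if rng.random() < exploration_rate(t, c):
        return rng.randrange(len(estimates))  # explore: any option at random
    return max(range(len(estimates)), key=lambda arm: estimates[arm])  # exploit


if __name__ == "__main__":
    rng = random.Random(0)
    job_estimates = [0.4, 0.7, 0.5]  # hypothetical "how good does this look" scores
    # On the tenth career step with c=2, there is a 20% chance of a random jump.
    print(exploration_rate(10, c=2.0))
    print(choose_arm(job_estimates, t=10, rng=rng, c=2.0))
```

With `c=2.0`, the tenth pull explores twenty percent of the time, which lines up with the example given here, and the chance keeps falling for later pulls.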

Robert Wiblin: I think a case where that doesn’t work too well are industries where it’s kind of winner-takes-all. So in order to get anything you have to be the best. And I guess writing is actually a little bit like this.

Brian Christian: Yeah.

Robert Wiblin: Music, academia to some extent, if you’re doing your Ph.D. In those cases, exploring too much, or trying out lots of things, is basically conceding failure: you’re not going to become the best musician if you, like, only spend a third of your time doing music, because the competition’s so harsh …

Brian Christian: Mm-hmm (affirmative).

Robert Wiblin: … And people only want the best, whereas there’s other cases where exploration works okay because you just get kind of linear returns to like being better.

Brian Christian: That’s right. So I mean this is sort of a case where the machine, the payout on the machine grows with the number of times you’ve pulled that handle. If you pull the, like playing-the-violin handle, the first time you just get, you know, nothing. But the ten-thousandth time … You get an angry phone call from your neighbors.

Brian Christian: Yeah, it’s a, I mean that’s yet another way in which you know the multi-armed bandit framework is sort of an imperfect lens for thinking about some of these things. And I think in general I also find that this is something, I mean, yeah not to preach too much, but I mean younger people don’t quite appreciate the degree to which career paths put you by default on a trajectory where doing more of that thing becomes increasingly attractive, and doing other things you know less-so. Most corporate jobs are structured in this way very, I think, cannily so.

Robert Wiblin: They put you in “golden handcuffs”, I think is the expression.

Brian Christian: Exactly.

Robert Wiblin: Or like you’re always waiting the next six months to get the bonus from the previous year.

Brian Christian: Yeah. I mean even in the structure of the way that jobs are set up, so I mean just, this is an anecdotal example, but, being an author, you have this funny kind of life-rhythm where, you know, a draft will come back from your editor or your proofreader or something like this and you’ll have to work fourteen hours a day seven days a week for two weeks, and then you hand it back in and then you have nothing to do for two weeks. And this goes on a few times. And then the book goes to production and you have, you know, six months of relative peace, and then this huge publicity tour.

Brian Christian: It is not the kind of thing that you can do while smoothly segueing into your other job … You know, let’s say you want to switch careers so you take a full-time position, but you say to them “I have this publicity tour where I’m going to need to be away for like six weeks, you know, three months from now.” That’s not going to go over particularly well. And so, I mean that’s just an anecdotal example, but you start to notice that this particular window of time is exactly the right amount of time to start researching a new book proposal, and then do the publicity and then go back to researching the next book.

Brian Christian: It’s not as conducive to getting, you know, a full-time job at the corporation. So I think a lot of careers, they’ll have their own version of this, where you find yourself, you know, you open this door …

Robert Wiblin: It’s path dependency.

Brian Christian: That’s exactly right.

Robert Wiblin: So, yeah, podcasting is a bit like this, because, like each episode you gain more subscribers, and so each episode is more valuable than the last because more people hear it. So you usually end up in a situation where I never should have started this …

Robert Wiblin: … So you could easily end up thinking, “Well, I never should have started, but now that I’m here I should definitely continue.”

Brian Christian: Yeah. That’s right.

Benjamin Todd: Okay. So we’ve started to talk about careers where you kind of have to commit to them and once you get off it’s hard to get back on. And so maybe these might be better modeled as optimal stopping problems, which is another really fascinating chapter you have in the book. And so maybe we can start by just quickly saying how it’s different, and then …

Brian Christian: Mm-hmm (affirmative).

Benjamin Todd: … Then some of the approximate solutions to those as well and how they might apply.

Brian Christian: Yeah. Great. So there’s a second genre of problems that are called optimal stopping problems, and this has to do with being presented with a sequence of opportunities, one after another, and at each point in the sequence you either commit to that particular option, in which case the game’s over, or you decline and continue to progress through the sequence but, critically, you can’t change your mind and go back. And so the canonical optimal stopping problem is what’s called the secretary problem, and the basic idea here is, you imagine you’re hiring a secretary, you field n different candidates, they show up in a random order, and then you evaluate them, you interview them one after another. And because of whatever constraints, you either have to hire that person on the spot and dismiss everybody else, or you send them away, in which case you lose the ability to change your mind and hire them later.

Brian Christian: And so the problem here is how do you attempt to hire the very best candidate in the pool, given that you are establishing a baseline essentially as you go? And so there’s a risk of course that you stop too soon. There’s a risk that you establish too high of a standard and then no one after that point exceeds it. And this is another one of these math problems with this kind of wonderfully colorful history through the mid-twentieth century, and it also has this wonderfully elegant solution, which is that you should spend exactly one over e, or approximately thirty-seven percent of your search, just establishing a baseline. So interview thirty-seven percent of the candidates without an intention of hiring any of them, no matter how promising they seem, and then, after that point, be willing to immediately hire the next person who’s better than everyone you saw in that first thirty-seven percent.

Brian Christian: And this is due to a fascinating mathematical symmetry–your odds of success in this scenario are also one over e, or thirty-seven percent. And that in itself is kind of an intriguing detail, which is that following the optimal strategy you still fail sixty-three percent of the time. It just turns out to be a hard problem. But the optimal strategy and the odds of success are identical regardless of the size of the pool. So as n goes to infinity you still want to follow this thirty-seven percent rule, and incredibly you still have a thirty-seven percent chance of success. Even if the pool is, like, a million people, which seems crazy. You know, given random chance you would only have one-in-a-million shot of identifying the single best candidate out of all one million.
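The thirty-seven percent figure can be checked numerically. The sketch below uses the standard formula for the chance of picking the single best candidate when you pass on the first `r` of `n` candidates and then hire the first one who beats everyone seen so far; the function names are illustrative.

```python
import math


def success_probability(n, r):
    """Chance of hiring the very best of n candidates when the first r are
    only used to set a baseline and the next best-so-far candidate is hired."""
    if r == 0:
        return 1.0 / n  # no baseline: you simply hire the first candidate
    # The best candidate sits at position i with probability 1/n, and is
    # reached only if the best of the first i-1 fell inside the first r.
    return (r / n) * sum(1.0 / (i - 1) for i in range(r + 1, n + 1))


def best_cutoff(n):
    """Baseline length that maximizes the chance of success."""
    return max(range(n), key=lambda r: success_probability(n, r))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        r = best_cutoff(n)
        print(n, r, round(success_probability(n, r), 4), round(1 / math.e, 4))
```

For larger pools the best cutoff settles near `n / e` and the success probability near `1 / e`, which is the symmetry described above.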

Robert Wiblin: I suppose it’s cancelled out by the fact that you get even more time to collect evidence or something like that? So …

Brian Christian: Yeah.

Benjamin Todd: I mean you have a really nice explanation of how the derivation works in the book, so really encourage people to check that out.

Brian Christian: For people who really want to like go down the calculus wormhole, yeah.
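
[Editor's illustration] The thirty-seven percent rule described above can be checked with a quick simulation rather than the calculus. The following is a minimal sketch, not anything from the book or the episode: the function name `simulate_look_then_leap` and its parameters are hypothetical, candidates are modelled as a random ordering of distinct ranks, and "success" means ending up with the single best candidate.

```python
import random


def simulate_look_then_leap(n, cutoff_fraction, trials, seed=0):
    """Estimate how often the look-then-leap rule picks the single best candidate.

    n: size of the candidate pool (assumed known in advance).
    cutoff_fraction: share of the pool used only to establish a baseline.
    trials: number of random pools to simulate.
    """
    rng = random.Random(seed)
    cutoff = int(n * cutoff_fraction)
    successes = 0
    for _ in range(trials):
        # Higher number = better candidate; n - 1 is the best in the pool.
        ranks = list(range(n))
        rng.shuffle(ranks)

        # Look phase: interview, but hire no one.
        baseline = max(ranks[:cutoff], default=-1)

        # Leap phase: hire the first candidate who beats the baseline.
        # If nobody does, we are stuck with the last candidate.
        chosen = ranks[-1]
        for candidate in ranks[cutoff:]:
            if candidate > baseline:
                chosen = candidate
                break

        if chosen == n - 1:
            successes += 1
    return successes / trials


if __name__ == "__main__":
    for fraction in (0.10, 0.37, 0.60):
        rate = simulate_look_then_leap(n=100, cutoff_fraction=fraction, trials=20000)
        print(f"cutoff {fraction:.0%}: success rate about {rate:.3f}")
```

Under these assumptions, the thirty-seven percent cutoff should come out with a success rate close to thirty-seven percent, with noticeably lower rates for cutoffs that stop much earlier or later.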

Benjamin Todd: So how this might apply to career decisions is, you can kind of imagine, you can either keep trying out jobs or you can commit to one and then you kind of run with it for a while. And that kind of approach makes more sense in these careers where you kind of can’t easily jump in and out of them, but you have to kind of just commit, and maybe you want to think about that more as an optimal stopping problem rather than a multi-armed bandit, where with a multi-armed bandit you can always just switch to a different lever.

Brian Christian: Yeah.

Robert Wiblin: Perhaps the best match might be relationships, because it’s particularly hard to dump someone and then get back together with them after you’ve like tried someone else. People wouldn’t take too kindly to that.

Brian Christian: Yeah.

Robert Wiblin: It’s probably easier to go back to your previous job than it is to do that.

Brian Christian: That’s true. And this is actually something that people who study optimal stopping model explicitly. In the literature it’s called recall, which is the ability to return to a previous candidate or a previous opportunity. And in fact in the book we tell these amusing mini-biographies of mathematicians and scientists applying, in some cases explicitly applying, a thirty-seven percent rule to their dating life, with mixed success.

Brian Christian: But someone who embodies this idea of trying to return to a previous option is Johannes Kepler. So, after the death of Kepler’s first wife, he embarks on this kind of epically arduous series of courtships to try to find the perfect second wife to help him raise his kids and so forth. And he’s very frank about this in his letters, talking about, you know, he really liked the fourth woman that he was courting for her tall build and athletic body–just strange to hear this famous astronomer speaking this way. The fifth woman got along with the children; she was even better than number four, but he still persisted, and ultimately after he spends several years courting a total of eleven different women, he realizes “oh, no, no, no, it really was number five all along,” and he goes back to her and, you know, musters his best apology and says “I’m sorry for the half-dozen other people I’ve been dating in the meantime, but if you’re not spoken for and, you know, you can find it in your heart to forgive me, I’d love to get back together.”

Brian Christian: And, fortunately for Kepler, she agrees, and according to his biographers the rest of their lives is quite happy indeed.

Benjamin Todd: This is actually one of the bits I found more interesting in the book, because I’d kind of heard of the secretary problem before and the thirty-seven percent solution, but then you point out that you can add in the complication that you can try to go back to a previous option, which you often can in real life. But say there’s only a fifty percent chance of that working, then how does that change the percentage? And if I remember correctly it means you should actually try out more like half the pool.

Brian Christian: Sixty-one percent.

Benjamin Todd: Sixty-one.

Brian Christian: … If you have a fifty percent chance of your apology being accepted.

Benjamin Todd: So it means, but it means you should explore …

Brian Christian: That’s right.

Benjamin Todd: … which is intuitive.
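
[Editor's illustration] The sixty-one percent figure can be reproduced with a short calculation. This is a sketch under stated assumptions, not the derivation from the book: it assumes a very large pool, that an immediate offer is always accepted, that you only go back to the best earlier candidate if you reach the end without hiring anyone, and that this belated offer is accepted with probability `q`. Under those assumptions the chance of ending up with the best candidate when you look for a fraction `r` of the pool is r·ln(1/r) + q·r, which is maximised at r = e^(q − 1). The function names below are hypothetical.

```python
import math


def success_probability(r, q):
    """Large-pool chance of ending up with the best candidate.

    r: fraction of the pool spent only looking (0 < r <= 1).
    q: chance that a belated offer to an earlier candidate is accepted.
    """
    if r <= 0:
        return 0.0
    # r * ln(1/r): a leap after the cutoff lands on the overall best.
    # q * r: the best was in the look phase and accepts when we go back.
    return r * math.log(1 / r) + q * r


def best_cutoff(q, steps=10_000):
    """Grid-search the look fraction that maximises success_probability."""
    grid = [i / steps for i in range(1, steps + 1)]
    return max(grid, key=lambda r: success_probability(r, q))


if __name__ == "__main__":
    for q in (0.0, 0.5):
        r = best_cutoff(q)
        print(f"q = {q}: look for about {r:.1%} of the pool "
              f"(closed form {math.exp(q - 1):.1%})")
```

With `q = 0` the model collapses back to the classic thirty-seven percent rule, and with `q = 0.5` the best cutoff moves out to roughly sixty-one percent, matching the direction of the intuition in the conversation: a chance of recall makes more exploring worthwhile.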

Brian Christian: That’s right. And so it’s interesting, you know, in his diary Kepler bemoans what he calls his “restlessness and doubtfulness,” of, you know, he’s kind of beating himself up,