0:33 Intro. [Recording date: August 26, 2016.] Russ Roberts: One of the great titles of all time, Weapons of Math Destruction. What are they? Cathy O'Neil: They are algorithms that I think are problematic. And I can define them for you. They have three properties. The first is that they are widespread--which is to say they are being deployed on many, many people to make very important decisions about those people's lives. So it could be how long they go to jail, whether they get a job or not, whether they get a loan. Things that matter to people. That's the first characteristic. The second is that they are secret in some sense: either there's a secret formula, that the people who get scored by these algorithms--usually a scoring system--it's either a secret formula that they don't really understand, or sometimes even a secret algorithm that they don't even know that they're being scored by. And then finally, they are destructive in some way: they have a destructive effect on the people who get badly scored or they sometimes even create feedback loops--pernicious feedback loops--that are overall destructive to society as a whole. Russ Roberts: Let's talk about those feedback loops, because you give some examples in the book of where I would call it a misunderstanding of a false correlation--or not a false correlation: a correlation that's not causative--is misinterpreted and it feeds back on itself. So, can you give us an example of that? Cathy O'Neil: Sure. Pretty much every chapter in my book has an example of one of these problematic algorithms. But I guess one of the ones I worry about the most, if we want to jump in, is a family of models, actually, called 'recidivism risk scores,' that judges all across the country-- Russ Roberts: That's 'recidivism,' right? Cathy O'Neil: Recidivism risk, yeah. Russ Roberts: The risk of getting back on the bad side of the law and ending up in jail, for example. Cathy O'Neil: Right. 
So they are basically--they are scored for people who are entering jail or prison. And 97% of people eventually leave. So the question is: How likely is this person to return? And so these algorithms measure the likelihood for a given criminal defendant to return. And they are given, like, basically--there are categories: either it's low risk, medium risk, or high risk. And that score is given to the judge in sentencing. Or, sometimes in paroling, or even in setting bail. But I'll focus on the sentencing. So, it might not be obvious, and it's actually not obvious. We can talk about it. But if you are a higher risk of recidivism, then the judge tends to sentence you for longer. And so we can get into what I think is problematic about the scoring systems themselves. But let me just discuss the feedback loop. The feedback loop here, which I consider extremely pernicious, is that when you are put in jail for longer, then by the time you get out of jail, you typically have fewer resources and fewer job prospects, and you are more of an outsider--more isolated from your community, you have fewer community ties. And you end up back in jail. So it's a kind of--it creates its own reality. By being labeled high risk, you become high risk. If that makes sense. Russ Roberts: Yeah. So, that's a theory--right?--the idea that prison is not much of a rehabilitation experience and that in fact it could be opposite. Right? It could be an opportunity if you spend more time with people who, instead of making you a more productive person in legal ways make you a more productive person in illegal ways when you do get out. Do we know anything about whether that's true? It's a hard question to answer. Cathy O'Neil: There certainly have been studies to this effect. And, by the way, I'm not claiming that this is inherently true. 
I mean, it's theoretically possible for prisons to be wonderful places where people have resources and they learn--you know, they go to college and they end up, because they spent a full 4 years there instead of 3, they end up with a college degree. And it actually improves their life after prison. But the studies that we know about don't point to that.

5:32 Russ Roberts: Okay. So carry on. But that's a fact of--that's an issue of how, whether prison sentences should be structured the way they are and whether prisons should be, what the experience should be like of being in prison. Some would argue it could be a deterrent effect; maybe it's not in practice. But how does the data part of this interact--the riskiness and the length of the sentence, to have a feedback loop that's pernicious? Cathy O'Neil: Right. So, the scores themselves are calculated in problematic ways. So the first thing to understand about these scoring systems is that they basically--there are two types of data that go into the recidivism risk scores. The first is interactions with the police. And the second is kind of questionnaires that most of these scoring systems have. And then they use all of this information--the kind of police record with the answers to the questions--and they have a logistic model that they train to figure out the risk of coming back to jail. Russ Roberts: A logistic model is just a technical style of--an attempt to isolate the impact of the individual variables in this kind of 1-0 setting: Come back or not come back. Cathy O'Neil: Right. Well, it's actually a probability, but you have a threshold. If it's above, like, 65% or something, you'll say it's likely to come back. I don't know the exact thresholds they set. Nor do I actually have a problem with using a logistic regression. I don't even have a problem with calculating this probability. What I have a problem with is sort of interpreting the score itself. So, to be clear, we have to take a step back and understand how data and the justice system work, and what kind of data we are talking about here. And so, you know, everybody who has been alive for the last few years, has seen, has looked around and seen all these, you know, Black Lives Matter movement issues. 
A lot of--the Ferguson Report, the recent Baltimore Report, reported in the Chicago Police Department Commission Report--all point to police practices which, at the very least we can all agree upon are uneven. So there's much more scrutiny of poor and minority neighborhoods. There's just many, many more police interactions in those communities. Um, which leads to an actually biased data set coming out of that practice. So, I already have a problem with that kind of data, going into these recidivism risk scores. If--and I just want to be forward, I want to object. I want to make the point that if we were only taking into consideration violent crimes, I would have less of a problem. But we're not. We're taking into consideration a lot of things that we consider broken-windows, policing type interactions with the police. Russ Roberts: Explain what that is. Cathy O'Neil: That's the stuff like nuisance crimes. Like, having a joint in your pocket. Peeing on the sidewalk. Things that are associated with poverty, more or less. And things for which poor people are much more likely to get in trouble with the police than richer people or whiter people. So, that's one of the problems: it is that the data coming in from the police interactions is biased. The other thing is that often the questions that are asked in the corresponding questionnaire are actually proxies for race and class as well. So, there's a very widespread version of this recidivism risk score called the LSI-R (Level of Service Inventory-Revised). One of the questions on the LSI-R is, you know, 'Did you come from a high-crime neighborhood?' So, it's a very direct proxy. The answer to that question is a very direct proxy for class. There is another question which is, 'Did your, do family members, in your family, have they historically had interactions with the police?' This is obviously again--it goes back to if you are a poor, black person, then the chance of your saying yes to that are much higher. 
I would also point out that that's a question that would be considered probably unconstitutional if we were asked in an open court--if a lawyer said, 'Oh, this person's father was in jail, Judge, so please sentence this person for longer.' That would not fly. But because it's embedded in this scoring system, it somehow gets through. And the reason it gets through is because it's mathematical. People think that because it's algorithmic and because it's mathematical-- Russ Roberts: It's science-- Cathy O'Neil: It's scientific, yes. That they think it's objective and fair by construction. And so, the biggest point of my book is to push back against that idea.
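
The logistic scoring described in this segment can be sketched roughly as follows. This is a minimal illustration only: the weights, the intercept, the feature names, and the medium-risk cutoff are all hypothetical, and the 65% cutoff is just the ballpark figure mentioned in the conversation. Real recidivism models are proprietary, and this is not a reconstruction of the LSI-R or of any other actual system.

```python
import math

# Hypothetical weights, invented purely for illustration. Real models'
# coefficients are not public.
WEIGHTS = {
    "prior_police_contacts": 0.35,
    "high_crime_neighborhood": 0.80,  # questionnaire item; acts as a proxy for class
    "family_police_history": 0.60,    # questionnaire item; acts as a proxy for race and class
}
INTERCEPT = -2.0  # hypothetical baseline


def recidivism_probability(features):
    """Logistic model: squash a weighted sum of the inputs into a 0-1 probability."""
    z = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def risk_category(probability, high=0.65, medium=0.35):
    """Turn the probability into the low/medium/high label a judge would see.

    The cutoffs are assumptions; the actual thresholds are not known.
    """
    if probability >= high:
        return "high"
    if probability >= medium:
        return "medium"
    return "low"


# Two defendants with the same police record who differ only in their
# answers to the two questionnaire items.
person_a = {"prior_police_contacts": 1, "high_crime_neighborhood": 0, "family_police_history": 0}
person_b = {"prior_police_contacts": 1, "high_crime_neighborhood": 1, "family_police_history": 1}

for label, person in (("A", person_a), ("B", person_b)):
    p = recidivism_probability(person)
    print(label, round(p, 2), risk_category(p))
```

With these made-up numbers, the second defendant lands in a higher category purely because of where he grew up and who his relatives are, which is the proxy problem the conversation describes.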

10:20 Russ Roberts: And that's where you and I have tremendous common ground. Right? So, in many ways--we'll turn to some other examples in a minute--but in many ways a lot of the examples that you give, are just, to me, really bad social science run amok. Which becomes more possible when there's more data. Which is the world we're increasingly living in-- Cathy O'Neil: Yeah. I would make sure that--right up front that I'm not against using data. Russ Roberts: I know. Cathy O'Neil: But I'm not-- Russ Roberts: That's good to say. I know you're not, but it's good to say. Cathy O'Neil: I'm a data scientist. And I promote good uses of data. What I'm seeing more and more, and the reason I wrote the book, is very unthoughtful uses of data being used in very high impact situations. Unfairly. And so we might agree completely. I don't know if we have disagreement, Russ. But I'm sure you'll find it if we do. Russ Roberts: We'll dig up some. But it's an interesting example. You are a data scientist. I'm an economist. And of course we're in favor of using data and evidence and facts, but using them well. And using them wisely. It's an interesting challenge, how to react to that: if it becomes increasingly difficult to do that. So, to come to a narrative that you write about as well in the book, which is financial issues: I have friends who argue, 'Well, of course we have to use technical, mathematical measures of risk, because that's the best we can do.' And that's certainly true: That's the best that we can do in most cases. Sometimes. But what if, by putting the risk into this mathematical formulation, you become insensitive to it? You start to think you have it under control? 
That, psychologically, even though you know it's a flawed measure, and you know when you could list all the assumptions that went into it that you know were not accurate about, say, the distribution of the error function or the likelihood of a black swan--even though you are totally aware of that day after day, of looking at the data and your model and saying, 'Everything's fine today,' you get lulled into a false sense of security. In which case maybe this is a weapon of math destruction. And it's very difficult for technically trained, rational, left-brained people to say, 'Yeah, I shouldn't overuse that because I'm prone to use it badly.' Cathy O'Neil: Yeah. You bring up a really important point. I don't have a simple answer to it. But the truth is, it's really difficult even for trained professionals to understand uncertainty on a daily basis. With a lot of these things, the uncertainty is extreme. It's not the same thing as, say, the Value at Risk measure, which can be deceiving, even for people who kind of understand its failings. If that's an example you had in mind. Russ Roberts: That is what I had in mind. Cathy O'Neil: I mean, let's just go there. Value at Risk--I was a researcher at RiskMetrics, which kind of developed and marketed and sold Value at Risk. It was clearly flawed. Of course, it was easy for me to say--I actually got there in 2009. But I feel like, if somebody had been in charge of being worried about Value at Risk being misinterpreted, they wouldn't have had to go too far to find the way people were--and I'll use shorthand here--the way people were stuffing risk into the tail in order to game the 95% VaR risk measure. And I don't want to get too wonky here. But the point being that we had a sort of industry standard of worrying about 95% VaR. Sometimes 99%. What that meant was that we never looked further afield than that kind of risk. Russ Roberts: Right. That's a perfect example. I assume by 95 or 99 you mean 1 in 20 or 1 in 100 chance. 
Cathy O'Neil: One in 20. Exactly. The worst return in 20 days. Russ Roberts: So, when you have a 99 and that's your standard and it never gets close to it, after a while you start to think everything's great. And of course that's not true. Let's go back to the prison example. You are a consulting firm--I assume; this is a privately designed, for money, for profit measure that some Department of Justice grant has funded or is paying for. And who wants to say that, 'Oh [?] I'm not sure we should really use this because it's got all these proxies that might not be accurate for what we're trying to measure. So, I would just use it as a crude rule of thumb. But I wouldn't rely on it.' But that's not really a very good career move. It's not a very good move for a person at the Department of Justice, let alone the consulting firm. So isn't that part of the problem here, is the temptation to soft-pedal the problems in these kind of models when you are being paid, on either end, as the buyer or seller? Cathy O'Neil: I mean, great point. I would even emphasize that in the case of the justice system, what we're dealing with currently is a very, very problematic situation, where judges are probably less reliable than these terrible models. So, in other words, I wouldn't say, 'Hey, let's go to the old days,' when we just relied on judges who were often more racist than the models I'm worried about. What I am worried about--and yes, so that's one thing. The next thing is, 'Yes, I built a model but it's not very good.' Right? No one wants to say that. Russ Roberts: 'But it's still a bargain. You got a good deal, trust me. It's great for what it is.' Cathy O'Neil: That actually is the context for the--they could probably honestly say, 'I built a model and it's better than what you have.' Right? Yeah. And there's another thing going on, by the way. I interviewed somebody, like, you know, on background, who is a person who models, who builds recidivism risk models. 
And I asked him what the rules were around his models. And in particular I said, 'Well, would you use race directly as an attribute in this logistic regression?' Russ Roberts: Let me guess. Cathy O'Neil: And he said, 'Oh, no, no, I would never use race--' Russ Roberts: Of course not-- Cathy O'Neil: 'because that would be--that would cause racial disparities in the results, in the scoring.' And I said, 'Well, would you ever use zip code?' And he said, 'Yeah, maybe.' Well, that's a proxy for race. In a segregated country like ours, what's really the difference? And he said, 'Yeah, no you're right, but it's so much more accurate when you do that.' It is more accurate. But what does that mean? When you think about it, what that means is, well, police really do profile people. So, yes, it is really more accurate. In other words, this doesn't--we want mathematical algorithms and scoring systems to simplify our lives. And some of them do. Like, I'll tell you one of my favorite scoring systems. If you've visited New York City, it's the restaurant grades. You know, there's a big sign, a big piece of paper in every restaurant window saying, you know, what their score was, last time they got the Sanitation Department came and checked out their kitchen. And you know not to go to a restaurant that doesn't have an A grade. Right? Why does that work so well? Because it simplifies a relatively thorny and opaque question, which is: Is this a hygienic restaurant? And we don't know if it's a perfect system. But it does really have this magic bullet feel to it, which is: That's all I need to know. Thank you. Russ Roberts: Well, we know it's not a perfect system because on the night you ate there maybe the people didn't wash their hands that day; and it was three weeks after the inspector and everybody's falling back into [?] behavior-- Cathy O'Neil: Of course, of course. Absolutely. 
Russ Roberts: You raise an important issue throughout your book, which is: These kind of simple indices, like, what's the probability of recidivism--which is a big, complicated thing, obviously, that's very person-dependent but we're going to simplify it as a function of 8 variables. Or the same thing is true from the grade from the Department of Health. The problem with a lot of these is of course that they can be gamed by the people to achieve a high score that doesn't represent high quality. Cathy O'Neil: So, it can be. And actually there was an interesting blog post about the prevalence of restaurant scores--so they started out as numbers, I guess, and then they turned into grades--that are just above the cutoff. So, there is clearly something slightly unstatistical about that. But at the same time, you know--and we also don't really know what we need in a clean restaurant. But it is, crudely put, a good way for us as consumers. Russ Roberts: There's some information there. That's what I would say. Cathy O'Neil: There's some information there. The problem with recidivism scores is what we've done is we've basically given the power to a class of scientists, data scientists, who focus on accuracy only. And when, again, when I talked to the person I interviewed, I said, 'You know, is accuracy--is the only thing we care about accuracy?' I would care more about causality, right? And you mentioned the word 'causal.' Like, the question should not be, 'Is this person poor?' or 'Are they a poor minority person?' The question should be 'Is this person going to commit another crime that we can prevent?' And, like--in other words, they can't do anything about having grown up in a poor neighborhood. For that fact to be used against them doesn't seem right.
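
The point made earlier in this segment about stuffing risk into the tail to game a 95% Value at Risk number can be illustrated with a small sketch. It assumes a simple historical-simulation definition of VaR (the loss exceeded on roughly 1 day in 20), which is only one of several ways VaR is computed in practice, and the two return series are invented for the example.

```python
def historical_var(returns, confidence=0.95):
    """Historical VaR: the loss exceeded on roughly (1 - confidence) of the days.

    Returns are daily fractions (0.01 means +1%); the result is a loss, so a
    positive number means money lost and a negative number means a gain.
    """
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    tail_count = int(len(losses) * (1 - confidence))      # days allowed to be worse
    return losses[tail_count]


# Hypothetical 100-day histories, invented for illustration.
# Portfolio A loses 2% on 10 days out of 100.
portfolio_a = [0.001] * 90 + [-0.02] * 10
# Portfolio B loses 30% on only 4 days out of 100: all of its risk sits
# beyond the 1-in-20 cutoff, where a 95% VaR never looks.
portfolio_b = [0.003] * 96 + [-0.30] * 4

for name, returns in (("A", portfolio_a), ("B", portfolio_b)):
    print(name, "95% VaR:", historical_var(returns), "worst day:", -min(returns))
```

By the 95% measure, portfolio B looks safer than portfolio A even though its worst days are fifteen times as bad, which is the sense in which a standard that stops at 95% or 99% never looks further afield.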

20:28 Russ Roberts: I want to dig into this a little deeper, because if things go as planned this episode will air shortly after a conversation with Susan Athey, who is a machine learning econometrician, who makes a distinction in our interview between prediction and causation. And that's what you're talking about, I think--we should clarify this and go a little deeper. When you say 'accurate,' it very well may be the case that people from this particular zip code or people with these characteristics have a higher chance of committing a crime when they come back out of jail. And therefore ending back in jail. And that would be the "prediction" part: it fits the data well. These characteristics "predict"--they may not predict for this person very well but they do predict with these classes of people--these groups--according to the variables that you've actually measured. And that is not necessarily what we care about in a justice system; because, I think your argument--correct me if I'm wrong--your argument is that if we observe in these neighborhoods a lot more police presence we may actually see more types of police interaction and even arrests and sometimes crimes of smaller versus larger amounts that will confirm the model in the sense that it's "predictive," but it's not really describing the fact that these people are more likely to necessarily be bad people, but they are just more likely to get swept up in a police problem. Is that kind of what you're getting at? Cathy O'Neil: Yeah. That's a really good description. Let me just reframe that a little bit, which is: I would look at the system as a whole. And it's not just police. It's also the way our jobs work for poor people, or don't work. The way our economy offers opportunities to [?] or doesn't. 
But I guess the simplest way to put it is that when you give someone a score this way and then you hold them accountable in a certain sense--by which I mean judges actually sentence people to longer if they have higher scores--in a very direct sense you are punishing them for that score. And so you are laying the blame on them. You are pointing a finger at them; you are saying, 'You have a bad score; I'm holding you responsible for that.' And the question is, of course, 'Why do you have a bad score?' Is that because of what you've done? Russ Roberts: And who you are. Cathy O'Neil: Or is it because of the police system you live in? Is it because of the economic opportunities you are given or not given because of who you are, how you were born, how you were raised? And the point is that that's a very hard question which I'm not equipped to answer by myself. But I am equipped to say that as a data scientist it should not be my job to decide this. Russ Roberts: Yeah. I just want to clarify what I said before, because I think it might be somewhat confusing. If I fit the data on what's the probability of somebody coming back into prison, I may have variables in there that correlate with that probability, but they are not causal. It just happens to be the case that people from these neighborhoods because of a police presence at certain times or different allocations of resources or whatever it is--school quality--it may turn out to be true. It doesn't imply that this person in particular, when they go back into that neighborhood, will have that experience. Because there could be a correlation that's not causal. And I think that's the distinction that machine learning is unable to make--even though "it fit the data really well," it's really good for predicting what happened in the past, it may not be good for predicting what happens in the future because those correlations may not be sustained. Cathy O'Neil: And we hope they aren't, in that situation. 
Let me give you another example; and you said it very well. It's a thought experiment that your listeners might enjoy. I'm imagining that there's a tech company and they want to hire engineers. That happens a lot, actually. And they decide to--they are having trouble finding good engineers, so they want to use a machine learning algorithm to help them sort through resumes. And of course they have their own history of hiring people, and those people either succeeded or they didn't succeed in their company. But they have to define success for this model to sort through the historical data and look for people who look like they have succeeded. That's basically what--when you want to build a model you have to define your data set; you have to say what success looks like; and [?] to feed the algorithm--you should choose an algorithm--but once you've chosen the algorithm you have to tell it, 'Look for this; look for patterns of people that look like this success story.' Now imagine that they define success as someone who has been there for 3 years and has been promoted at least twice. Now imagine that they run this machine algorithm; it gets trained on their historical hiring practices; and they set it on the new data set, which is new applications for engineering jobs. And they find that, like, no women get through the filter: that the algorithm literally rejects all the women applicants. What would that mean? Russ Roberts: It obviously means women aren't good at being engineers. Cathy O'Neil: I've set it up, an extreme case; probably not happening. Russ Roberts: Playing straight-person to your-- Cathy O'Neil: Right, right. Thank you: Straight man. I set it up to be extreme, but the point being like the algorithm would not say, 'Hey, you guys should check to make sure your culture is welcoming to women.' Right? It would instead just say, like, 'Women do not succeed at this company; throw them out.' 
Russ Roberts: Or it could be that the applicants--there aren't very many women in the data set because you have a poor history in the past and there's a lot of noise in the data, so women are just not matched to those characteristics that you found. But certainly the culture example would be more dramatic, right? If you have a sexist culture, women are going to look like they can't get those promotions, and as a result you are going to be encouraged not to hire them in the future by the machine learning. And then you'll see how smart you were--you'll think you're really smart. Cathy O'Neil: If you don't like that example-- Russ Roberts: I like that example. Cathy O'Neil: Well, I'm just going to say, think about Fox News and women anchors. It's not that they don't have any women. It's that the women that they have are pushed out. Right? Russ Roberts: I don't know if that's true. Cathy O'Neil: I'm not saying that this is actually happening in a given engineering firm. I'm just making the point that a machine learning algorithm is dumb. It doesn't understand the 'why.' It only understands the 'what happened.' Russ Roberts: I think that's important to emphasize. There are patterns; sometimes patterns are very dramatic. But that doesn't mean they'll be sustained in the future or that they should be sustained. Right? Cathy O'Neil: Exactly.
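
The hiring thought experiment in this segment can be sketched in a few lines. Everything here is hypothetical: the hiring history is invented, and the "model" is deliberately the simplest one possible, a per-group success rate with a cutoff. A real resume filter would rarely use gender directly and would more likely pick up proxies for it, but the mechanism is the same.

```python
from collections import defaultdict

# Hypothetical historical hires: (group, succeeded). "Succeeded" stands for
# the definition in the thought experiment: stayed 3 years and was promoted
# at least twice.
HISTORY = [
    ("man", True), ("man", True), ("man", False), ("man", True),
    ("woman", False), ("woman", False), ("woman", False),
]


def train(history):
    """Learn each group's historical success rate, and nothing else."""
    counts = defaultdict(lambda: [0, 0])  # group -> [successes, total]
    for group, succeeded in history:
        counts[group][0] += int(succeeded)
        counts[group][1] += 1
    return {group: wins / total for group, (wins, total) in counts.items()}


def passes_filter(model, group, cutoff=0.5):
    """Accept an applicant only if people like them succeeded often enough before."""
    return model.get(group, 0.0) >= cutoff


model = train(HISTORY)
print(model)
print("man passes:", passes_filter(model, "man"))
print("woman passes:", passes_filter(model, "woman"))
```

The filter records what happened, not why. Nothing in it can distinguish "women are not good engineers" from "this company's culture pushed women out," so it simply carries the past pattern forward.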

27:20 Russ Roberts: A friend of mine worked at a company and said he noticed that everyone there--he was an intern--he said he noticed that everyone there who had a permanent job, had only gone to 3 different universities. I don't think that was a coincidence to start with for their resumes. And it's not a bad place to start. Obviously there are good universities; I'm not going to name them; I don't remember them, actually. But they were good universities; but that's not necessarily--that's one way to reduce the cost of sifting through a lot of resumes. It's a very crude and perhaps not a terrible way to save time and cost. But as you get to these more sophisticated methods, as you point out, you get this opportunity to make false conclusions. Right? It's pretty straightforward. Cathy O'Neil: I mean, it's interesting. Because, you know, it's kind of obvious once you say it. But these algorithms, you know, as sophisticated as they are--and they sometimes are: they are deep learning, they are neural network algorithms--I wouldn't call it 'sophisticated' but they are certainly unintelligible. Russ Roberts: They are fancy. Cathy O'Neil: They don't make moral decisions. They literally only pick up patterns that already exist. So, it would be great--and sort of the Big Data promise is that you throw data against a wall and truth falls out. The Big Data promise is that somehow the truth is embedded in historical practices. But that's only true if historical practices are perfect. So, as soon as we have a firm that has--an engineering firm that has like really mastered what it means to find good engineers--as soon as we have that then we should make a machine learning algorithm to mimic that. But I don't think we have that yet.

29:20 Russ Roberts: And I think the other point you make which I think is important--I'm not sure I agree with it in all the cases you give: there's not always a mechanism for making the model better. So, in the case of the engineers, you'd consistently hire men. You slowly would weed out the women in that case, or you wouldn't hire them to start with. And you'd have a model that you'd be foolishly thinking had worked pretty well, but in fact you've made a mistake. Now, I would argue that firms that do that have an incentive to at least think about whether they are making a mistake: whether their big data models are serving them well. And I think we are in early days. So, one argument would be, against your pessimism about these models, would be, 'Well, we're just starting. Sure, they make some mistakes now but we're going to get better.' In fact, the evangelists would say, 'It's just going to get better and better. Of course they're imperfect.' What are your thoughts on that optimism and pessimism? Cathy O'Neil: I'm actually one of those people. I know we're going to get better. What I'm trying to point out is that we can't assume we're already good. What I'm objecting to are high-stakes decisions being made when there's no actual check or monitor on the fairness or the actual meaningfulness of the scores themselves. And I say, 'meaningfulness,' because I'm thinking about the teacher-value-added model-- Russ Roberts: I was just going to ask you about that. Cathy O'Neil: Yeah. I don't think the problem there is discrimination, per se. Like, actually a lot of the teachers are women. It's a very diverse field. There might be some discrimination issues around it. But the biggest problem is that it's not very meaningful. We have these scores that are typically between 0 and 100. And some work has been done to see just how consistent the scores are. And it's abysmal. Russ Roberts: Let's back up. 
Put the uses of the value-added model in context, because listeners won't know what it is. This is an attempt to evaluate teacher quality and use that evaluation to either--typically to fire the worst teachers under various mandates. Right? Cathy O'Neil: Yeah. It goes back a couple of decades and a few Presidencies. The idea is: Fix education by getting rid of the bad teachers. And we have this myth of these terrible teachers that are ruining education. And I'm not saying there aren't-- Russ Roberts: Yeah; I wouldn't call that a total myth. I think there are some lousy teachers. Cathy O'Neil: There absolutely are bad teachers; and there are bad schools. But, I'm just claiming--and I'll repeat myself--that, you know, there might be a problem but if you have a solution that doesn't actually solve the problem then you are getting nowhere. And I think the value-added model for teachers is an example of that. So, what they've done, the first generation of teacher assessment tools, was pretty crude and obviously flawed. And that was to sort of just count the number of students in a given teacher's class who, like, were proficient in their subject by the end of the year. And the reason that was super-crude was that essentially performance on standardized tests is highly correlated to poverty. Across the nation. And across the world, in fact. And when you just counted the number of students in a given class that attained proficiency and then punished the teachers who had very few of those students, then you are punishing basically teachers of poor students. And it was pretty clear that that wasn't good enough. Like, that wasn't--it wasn't discerning enough as a way of finding bad teachers. Or another way of thinking about it was, 'These kids weren't proficient in Third Grade. Why would they suddenly be proficient in Fourth Grade?' Russ Roberts: Yeah. You are not controlling for the initial quality of the students that the teachers had to deal with. So that's clearly wrong. 
Cathy O'Neil: Exactly. Right. So that's clearly wrong. So, they wanted to do exactly what you just said: they wanted to control for the students, themselves. So, what they've developed is this, what I call a 'derivative model.' So, it depends on another model, which is in the background, which estimates what a student, a given student, should get at the end of their fourth grade year. Let's say. And is based on what they got at the end of third grade--reasonably enough--as well as a few other attributes like what school district they are in, like whether they qualify for school lunches--which is a proxy for poverty. Various things. So, now, just imagine: Everybody in your class--you are a teacher, a fourth grade teacher--everybody in your class has an expected score at the end of the year. What is your score ending up? What's your Value Added score? It's going to be essentially the difference between--the collection of differences because you have a bunch of students--the differences between what your students actually get versus what they were expected to get. So, if you are a student-- Russ Roberts: Which is a good idea, on paper. Right? Cathy O'Neil: It is. It's absolutely a good idea. Russ Roberts: That's exactly what you want to try to measure. Cathy O'Neil: Right. So, if Tommy was expected to get an 80 but Tommy got an 88, then that's +8 points. That's good for you. If Sarah got a 60 when she was supposed to get a 65, that's not good for you. So, you kind of--again, the idea is--and this is kind of reminiscent of what we were talking about with the recidivism risk scores--you are held accountable for all these differences between what your students were supposed to get versus what they actually got. And I'm simplifying it because there's all sorts of complicated, sophisticated mathematics going on as well. But let's put that aside. This is more or less the idea. 
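
The arithmetic just described can be written out as a short sketch. It uses the two students from the example (Tommy and Sarah) and assumes the simplest possible aggregation, a plain average of actual minus expected. The real value-added models involve further adjustments and are reported on a 0-to-100 scale, none of which is modeled here.

```python
from statistics import mean


def value_added(students):
    """Average gap between what students actually scored and what the
    background model expected them to score.

    `students` is a list of (expected, actual) pairs. A plain average is an
    assumption made for illustration; the real scoring formula is not public.
    """
    return mean(actual - expected for expected, actual in students)


# The two students from the example: Tommy was expected to get 80 and got 88;
# Sarah was expected to get 65 and got 60.
classroom = [(80, 88), (65, 60)]

score = value_added(classroom)
print(score)  # (+8 and -5) averaged
```

Every input to this score is a gap between a student's result and another model's prediction, so the teacher's number can only be as reliable as that background prediction.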
The problem, statistically speaking, with this, is that the original model is just not very accurate. Russ Roberts: Yeah. A lot of noise. Cathy O'Neil: And when you are dealing with the differences between actual and expected, that's called something: It's called the error term, in a bad model. So, as a teacher you are being held accountable essentially for the average error term of a bad model. Which, by the way, is also called 'noise.' For a reason. And it's just simply a bad scoring system. It's not consistent enough. I interviewed someone named Tim Clifford, who is a middle school English teacher in the New York City public schools. He's been teaching for 26 years. He has a bunch of awards, etc. He got a 6 out of 100-- Russ Roberts: That's a low score-- Cathy O'Neil: the first time he got a value added [?] model. Terrible score. He got a 96 the next year. Russ Roberts: He must have gotten smarter in the meantime. He took some classes on how to teach well. Cathy O'Neil: So, one of the things--I characterize 'weapons of math destruction' by saying they are widespread. So, this is all over the country. Most states now use some kind of version of this. And that they are secret. So, this is what really gets to me about this. There's actually been quite a bit of uproar around these teacher assessment scores. And the New York Post actually filed a Freedom of Information Act (FOIA) request, and got the names and the scores of all the teachers in New York City--first year, I believe it was the first year it came out. And they published them. It was kind of like a public shaming of the teachers. I tried FOIA--I tried to get the--I filed a Freedom of Information Act request to get the source code for that same scoring system, under the assumption that if you can get the scores, public access, probably I can get the system, the scoring system itself. I was denied the actual code. 
And moreover I found that under the licensing agreement that this company, this big data company, had written with the City, New York City, nobody in the Department of Education could see the source code, either. So literally nobody actually understood how these scores were being built. So, a final word is that I kind of gave up and I didn't know what to do after that. But this really smart guy, who is actually a high school teacher at Stuyvesant High School, a math teacher, what he did was he took the stuff that the New York Post had published, he took that same data, and he found some teachers that were actually listed twice. Quite a few, actually. Hundreds of teachers were listed twice. They had maybe taught 7th grade math and 8th grade math, so they'd gotten scores for both classes. And he just graphed them. He just looked at how consistently these teachers were scored. Russ Roberts: That's pretty good. Cathy O'Neil: And he found very wide discrepancies. If you plotted it on a scatter plot, it looks almost like a uniform distribution. Whereas what you'd expect is a line, y=x, just right down the middle. It's nothing like that. Russ Roberts: Although it is possible, of course, that a teacher has a particularly annoying class or a particularly challenging class--some classes will get more time and effort and energy from a teacher. They don't spread their time equally. And they probably don't do the same job in each class. But you'd expect some correlation. So the fact that it's virtually zero would be disturbing. Cathy O'Neil: It's not 0; to be clear, it's actually 24%. But that's like, for a teacher with themself. Russ Roberts: It's not so good. Cathy O'Neil: I'm not saying there's no information in that at all. What I'm saying is: It's not very good information. It's really not. And at the same time, it's being used for high-stakes questions. So, for tenure decisions. I interviewed a woman named Sarah Wysocki who was a Washington, D.C. area teacher. 
She got fired because she had a bad growth score, a value-added model score. She actually got fired over this. She had plenty of reason to believe that her score was actually caused by a previous teacher cheating on their students' tests because there was a bonus involved. It's complicated. But the point being that these scores are simply not accurate enough to fire people, to have large decisions based on them.
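To see how a score dominated by noise produces the weak same-teacher consistency described above, here is a small simulation. It is a sketch under invented assumptions, not the analysis from the conversation and not real teacher data: each hypothetical teacher has a fixed "skill," each of two classes adds independent noise, and the noise variance is set to three times the skill variance.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n_teachers=5000, skill_sd=1.0, noise_sd=3 ** 0.5, seed=0):
    """Score each simulated teacher twice: same skill, independent noise."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_teachers):
        skill = rng.gauss(0, skill_sd)                   # assumed stable quality
        first.append(skill + rng.gauss(0, noise_sd))     # score for one class
        second.append(skill + rng.gauss(0, noise_sd))    # score for another class
    return pearson(first, second)

r = simulate()
print(round(r, 2))  # lands near 0.25 under these assumed variances
```

Under these assumptions the expected correlation is skill variance divided by total variance, 1 / (1 + 3) = 0.25. Read that way, a same-teacher correlation in the mid-20s would mean most of the variation in a score is something other than the teacher.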

39:32 Russ Roberts: Yeah, so--the Wysocki example is tremendous because, it's just a phenomenal example of how if the incoming class's scores are artificially inflated the year before by cheating, or by some teacher who is really good at teaching to the test, and you are not as good at teaching to the test but you are a great teacher--you can get a lower score and seemingly be worthy of being fired. I think it's important to add that: It's a horrific system. It's a horrific system, the public school system. And, you know, we could take turns--and I found myself taking turns as I read your book--feeling bad for the teachers or the students. So, it's true: That's very unfair to a teacher; and I think that's a crude and a very lousy way to evaluate teachers. And I'd also add that it's masquerading as objective when it's not. But the truth also is that these students get awful teachers who can't be fired. And so you have to have--you don't have to--but in the current system, because it's so entrenched, there's no way to get rid of bad teachers. And I think that's the tragedy, to me, and I come from a different ideological place than you do, but I think--you know, I don't know anything about this Value Added Model--it sounds awful to me for lots of reasons. I think it's incredibly difficult to predict expected scores. But the idea that somehow there's a good alternative--there isn't a good alternative in the current system, it seems to me. Cathy O'Neil: Yeah. I mean, listen: I'm glad I've convinced you of my main point, which was that this is not a solution. And we could talk about political solutions to bad teachers, which I agree are a problem. And if you wanted to know my personal opinion, like, 'Let's pay them much better, and remove tenure and get rid of bad ones in a thoughtful way.' I also think that data has a place in education. 
But I think that in education, the way data and algorithms and models should work has to be intrinsically a feedback loop between the teachers and the test scores. Right? And, you know, the teachers have to not just get a score, but like feedback about what they should do better. What--you know, 'Hey, we did this interesting test. The test actually measured the students' understanding of these various dimensions; and we see that your students were lacking in this dimension; and this is how you teach that.' In other words, feedback that the teachers can--that good teachers--can actually reliably use to improve their teaching. Which is not what we have here. Russ Roberts: I agree with that. The problem is that we are stuck because of the nature of the public school system. I think. We are stuck with objective, un-messaroundable things like test scores. Test scores are a terrible way to measure teacher quality. On so many dimensions. My wife's the head of a math department in a high school, and if I told her, 'Okay, what I want you to do is evaluate your teachers based on how their kids do on test scores,' she would be so offended. She spends hours in the classrooms of her teachers. She wants people to be in her classroom when she teaches. And what makes a good teacher is a subtle--and there are a lot of dimensions to it. And certainly not only how somebody does on a test score--even if it's a huge improvement. Which is a good thing. I'm not denying--I don't think test scores are irrelevant. But I think it's bizarro that we assess teacher quality based on a score. And the reason we do that is it can be defended. It's sort of--to me, it's sort of a meta-version of what's wrong with the more complicated systems that you are talking about. It's not the way anybody would do it if they had to design it from scratch. Cathy O'Neil: I could not agree more. 
And I think that the philosophical question that is raised by your venting just now, which I completely agree with, is: When do we see these magic bullet algorithms be used? When do people say, 'I'm going to solve this very thorny, complex, related, complicated societal-wide problem' with this stupid algorithmic scoring system? Which doesn't answer the original question and leads to all these unintended consequences. And I think the answer is: The more complicated and societal, and, you know, taboo, a topic is, the more likely you are to come up with, to see something emerge along these lines.

44:18 Russ Roberts: Yeah. But--but--you've given a lot of examples from the private sector that are not as societal, that are different. And I want to turn to a couple of those because they are very interesting to me. And I want to defend--my only criticism, serious criticism of your book is that you don't spend much time talking about any of the benefits. So, it's--you emphasize the costs. Which perhaps is the right way to start, at least to get people's attention. But one example you use that came to mind was you talked about U.S. News and World Report and their attempt to measure university quality. Which is absurd. Obviously it can't be done. And they end up doing it. You know, just mindless-- Cathy O'Neil: [?] could do it. Russ Roberts: And they rank universities; they rank MBA (Master of Business Administration) programs; they rank graduate schools. And it's--we all understand that it's to sell magazines. It is to start arguments. And it is very effective at both of those things. But it also changes lives in all kinds of unexpected and not-so-attractive ways. And it creates what you call an arms race among universities trying to pad their scores, because they know it goes into the index. Having said that, it also forced a lot of schools that had great reputations to actually serve their students better than they had before. In my opinion. So, do you want to expand on the bad part? And do you want to accept my good part? Or do you want to disagree with that? Cathy O'Neil: Maybe I--if you have good evidence that the U.S. News and World[?] Report arms race among college administrators has actually had positive effects on student learning, I haven't seen that. Russ Roberts: Well, I wouldn't suggest it has much for student learning. 
I think what it did in places like in MBA programs, which is where I saw it way too up close and personal as a former faculty member in a business school that was really desperate to get in the Top 20 and stay there: there were some pernicious things that you talk about--that people did things to make the scores look better when in fact they weren't any better. But there was an enormous revolution among business school programs to make their degrees, I think, more useful to students. And I think that was a good thing. The rest of it, at the college level, you might be right; or I'm sure there's a lot of truth to what you say: which is, a lot of all it did was change the way people gamed the ranking system in lots of silly ways, and it's not what people should be spending their time thinking about--rather than they should be thinking about how to make the university better. Rather than trying to-- Cathy O'Neil: Yeah, I mean--okay. I'd be happy to look at what you are saying. And, you know, I'm not claiming that I've spent that much time on the MBA level of this stuff. I think my biggest criticism is that, if you are going to make a score of quality for colleges, especially if it's going to be aimed at parents of high school kids--and I'm one of them: my son is entering junior year of high school, which is like, critical moment, start worrying about college. Right? It is abominable to me, and I'm sure to you, that you do not--that you actually create a model that is blind to cost. As if we're a bunch of Rockefellers who can send our kids to whatever school is the best--you know, ignoring cost. Of course cost is a major factor. And the consequence of their ignoring cost, back in 1983--which they had many, many years to resolve, which they have not--the consequence of that is that tuitions have risen in direct relationship to how much these colleges are fighting each other to outrank them on this one list[?]--

48:14 Russ Roberts: Explain that. Because that was really interesting. I'm not sure I agree with it, but it's really interesting. So, talk about that connection. Cathy O'Neil: I'm not the only person making this case. But everybody knows that the number of administrators at these colleges has ballooned. And partly that's due to all sorts of things that they now have to--regulations that they now have to make sure that they are following. But a lot of that is directly due to the fact that many people's job in universities is to keep an eye on their ranking, and to make sure that, you know, they're competitive for incoming freshmen. Which means that they sort of like, the colleges at a given tier are all fighting for the best students that they can hope to get for that tier. And what that often means is they want to get these student athletes. So they have to build these new stadiums. They want to get really nice dorms. They have dorms that have, like, water parks embedded-- Russ Roberts: Yeah. It's unbelievable. Cathy O'Neil: in the dorms. It's like--forget about--I mean, I'm sure you have your story, too. I went to U.C. Berkeley in 1990. We had to find our own housing. It was very bare bones. We got a great education. It was very, very affordable. Especially for in state. We didn't get coddled: we were grown-ups. And I just feel like--it of course is part of a larger societal issue of like when do kids actually get to be called a grown-up in this day and age. Russ Roberts: For sure. Cathy O'Neil: But it is completely outrageous, and way too expensive. It's something that I as a parent would never agree to. But it's being--this money is being spent. And then charged to me, because of the fight for the U.S. News and World Report ranking. Russ Roberts: Yeah. That's an interesting question. That's a great example. I don't know if it's true. I think some of it's true. 
Because, the idea here is, you want the highest SAT (Scholastic Aptitude Test) students; you want to be selective so you want lots of applicants and you want to reject a bunch of them. Because that makes you look like a better school because you are more selective. None of which--you know, it doesn't make you a better school, obviously. It just makes you look like you are a better school. Cathy O'Neil: And moreover I would argue we are just all fighting--and when I say 'we' I mean colleges--are just all fighting for the same group of kids. It's not like the kids change. You know, it's just the same group of kids. We're just sorting them slightly differently because of all this ranking situation. And we're putting them in very fancy dorms. Russ Roberts: Yeah, with really nice food and athletic facilities to play in. And maybe not always a water park, but lots of--it's a resort-like experience. Now, the question is: Is that because of these rankings? Or is it because we are a really rich country and rich people send their kids to these schools and they want their kids to have a pleasant experience? They don't want them to have your Berkeley experience or the experiences I had that are much more bare bones. Because you look at the high schools that these kids come from. They also look kind of unusually fancy. Cathy O'Neil: Well, listen. I mean it depends on who you ask. Obviously. I think that the kids that go to fancy high schools enjoy their fancy colleges. I think if you talk to a bunch of Millennials right now about their student debt, and ask them, 'Would you trade in your student debt for fewer perks in your college dorm?' they would trade it in a second. I also don't think this all is completely deliberate. It is not somebody's plan. I don't think there was anybody who--like, I don't even think the U.S. 
News and World Reports were like, 'Oh, we're going to screw the lower classes, and the middle class; and they're going to have huge amounts of college debt in the next 20 years.' That wasn't--it wasn't like that. I wanted to give an example of what feedback loops can really do. And it's a natural. It happens, it arises naturally, because of the trust that we put into these rankings. We have actually endowed--as parents, we have endowed these rankings with power way beyond what they deserved. Russ Roberts: Yeah; I don't [?] that, but I know a lot of my friends do. Because I think, having taught at 5 universities, having been in the kitchen, I'm much less concerned about the grade that the Department of Health gives. And a little bit more maybe about what's actually going on, and therefore, you know, when it sounds like I've got to get my kid in this kind of school, I'm thinking it's not really worth it. But it is--it's an interesting question. I think it's a question of magnitude of these effects. I don't know how much of it is driven by the U.S. News coming into existence. And I say that because one of the things I do know the data on, if you look at the amount of government subsidies to education over the last 10 years, 15 years, it's rather extraordinary. And the number of students going to college has increased. I think--it's a shocking number. I think who graduated--it's either go or graduated--was up, I want to say 50% between 2000 and 2010. It's a huge increase over a very short period of time. You could argue maybe that's a response by the political process to the increased demand. I don't know. But there's a lot going on there. That's all I'd say. Cathy O'Neil: Yeah. There is a lot going on. I'm not saying this is the only factor. 
I also--I think the Federal Aid system is a factor--like, it's made things, it's made it easier for people to borrow money to go to school-- Russ Roberts: Which pushes up the demand-- Cathy O'Neil: which obviously is a very good incentive for schools to raise their tuition. Russ Roberts: Correct. It pushes the demand up. Cathy O'Neil: So, I absolutely don't claim this to be the only factor, but I do think that it is an important one. And I get that from my research, from listening to what administrators say when they install fancy stadiums. Russ Roberts: By the way, a separate issue--it's not obvious. I know administrators like to say that fancy stadiums and good sports teams encourage applications and improve rankings. There's a debate on that. And that may be an example where correlation isn't causation, either. You use the example of the Flutie effect, where Doug Flutie threw a miracle pass at the end of a U. of Miami game; put Boston College on the map; and their admissions went up 30% over 2 years. But there are other things going on. It's not obvious that it was just due to that. But I think administrators like to invoke that as an excuse for fancy sports teams. Cathy O'Neil: Yeah. Again, it's not the only thing going on. I do think that alumni giving is one of the factors that the U.S. News and World Reports counts as a sign of quality. And I think that people who used to be on the football team are more likely to give money. But, again, I don't want to quibble. It's an example of a very, very influential algorithm. And it's an old example. So I just wanted to say: The algorithms have power. And we have a bunch of new algorithms that we are just blindly trusting and empowering, and we have to be careful.

55:06 Russ Roberts: You talk a bit about an issue that's come up recently on the program, which is A/B testing at tech firms like Google or Facebook or Quora, where, I interviewed Adam D'Angelo recently on that--of all the experiments that they are running daily. And Google is famous for that. This is a really cool thing; it's a really cool thing for data scientists. They have this incredible laboratory where they can change the color and change the font. And those are kind of harmless. Some of them are not so harmless, though, you suggest. So, talk about what worries you about, inside these tech firms, with proprietary experimentation going on. Cathy O'Neil: Are you talking about the predatory advertising? Russ Roberts: Anything you want. The bright side is, 'Oh, it's great. Everybody gets what they want. They make it work to customize for you.' And it sounds good. I think a lot of it is good. They show me books that I want to see. They show me things I want to buy, rather than things I don't want to buy. And on average that's good. But it's more than that. Cathy O'Neil: Yeah. I actually worked at an ad tech firm after leaving finance. And there's a story I tell in the book about a venture capitalist who was considering investing in our Series B funding round. And he talked to the whole company, which was, I think, at the time 50 people or so. And he talked about this sort of glorious future which he was imagining, where he would only see offers to vacations to Aruba and jet skis, and he would never again have to endure a University of Phoenix ad. Because those aren't for people like him. And when he said that, people laughed. And I was like, 'Wait a second. What?' We hear, the ad-tech guys are always talking about the opportunities and how tailored ads are a feature, almost like people should be grateful for them: 'Oh, thank you; I was thinking of buying that lamp. I'm so glad you showed it to me.' Russ Roberts: Right. 'You knew. You knew.' 
Cathy O'Neil: And in some sense they are right. There are often opportunities. Sometimes there are coupons. There may be nuisances or distractions when we are trying to get some work done. But in the worst case scenario, they are actually predatory. The worst case scenario, going back to the Federal aid program, is for-profit colleges, which specifically target people who are vulnerable to this kind of really hard-core recruiting, and are eligible for the financial aid that goes straight from the government to the for-profit college. So, you know, and that's one example. There's another example: payday lenders. And the reason I think it's so important to understand that the worst case scenario is that it's quite predatory is that--I've been in Finance; I've been in Data Science. In Finance, when we had a weapon of math destruction, which was the Triple-A ratings on mortgage-backed securities, when that model failed, it failed spectacularly; and everyone in the world noticed, because in the financial crisis, the financial system was at risk. But what I fear about data science algorithms that fail, or that create pernicious feedback loops like the one I just described with for-profit colleges with debt and cycles of poverty, is that they are absolutely failing the people they are targeting, but we will not see it. It's exactly what the venture capitalist visiting my company said: He doesn't want to see it. He wants to be siloed and segregated and put into a position where he's like treated like the first class citizen that he is. And he wants other people, who are being preyed upon, to be separated and away from view. And that's the thing that bothered me the most. Actually, that was the moment when I decided to write the book. Russ Roberts: Yeah. I don't have any opinion on for-profit colleges. It is--I think it's a--I don't know how predatory they actually are. They don't come across very well in your book. Which is maybe justified. 
The question is: What is to be done about that? Should we warn people that they are bad places? Let's start with the assumption, again I'm agnostic on it, I don't know anything about it--that they don't serve their clientele well and that there's a scam element, that things are being foisted on them that are not productive. Suppose that's true. Does that mean we should warn people about it? We should stop letting people borrow money for those uses? Or does it mean--which is what you focus on--that we should be wary of algorithms inside, say, Google or Facebook or elsewhere, that push certain types of people toward certain types of approaches that "aren't good for them"? Which is really what you are saying, I think. Cathy O'Neil: Yeah; I mean, at the very least I want people to stop promoting tailored advertisement as a purely benign, if not a positive force. It really is a segregating force. And for those of us who have money in our pockets and are well-educated it serves as an opportunity. And for other people it doesn't. And as far as the for-profit colleges go, I don't want to only single out for-profit colleges, because, the truth be told, like some of them are probably fine; some students probably have good [?] experiences. And then some other colleges are probably not fine. I think the answer to that--and if I were in charge of the world, which I'm not--would be: Yes, to cut off Federal aid. Because they are essentially leeches on the Federal aid system of loans. Which I think, between you and me should be completely changed; and we should just have free--like, very rare 4-year[?] State Schools and maybe forget about Federal aid. But that's just my opinion. Russ Roberts: I agree with half of that. Cathy O'Neil: Yeah. No. We don't have to agree on everything. And I'm not trying to say, 'Hey, yeah, everyone: Agree with me.' What I'm trying to say is like, 'This is happening.' 
Like, advertising online, these guys know a lot about you; they can target you; they know if you are poor; they know if you are a single mom; they know if you are desperate. They find you. And they say, 'I'm going to solve all your problems. Just sign here and you're going to get online education,' and, you know, at the end of 4 years you'll be saddled with a lot of debt and you'll have a diploma that is often not worth more than a high school diploma.

1:01:58 Russ Roberts: So, there are going to be examples like that. But I want to disagree at least--at your response--to this idea that it's okay for me and you to get tailored ads but not poor people. So, I was thinking of buying a watch, and I did some Google searches on watches. And all of a sudden, watches started showing up in my searches as the ads. And I bought a watch. And they keep showing up. They are going to keep showing up. And that--to use this example for it: It's not that smart. It takes a while. Or maybe they are hoping I'll buy another one. Which I'm not. But, so I'm glad I saw them. Some of them. And I rejected some. I clicked on some, maybe. Don't remember. But I like in general that the ads are tailored for me because otherwise it's just clutter. But why is it that a poor person--aren't there things that a poor person would like to know about to buy, that are good for them and that they would profit from having? And shouldn't they be free? And wouldn't it be better for them to get those products that they are desperately eager to get a good deal on rather than the jet skis? And isn't it okay for them to get tailored ads, too? Cathy O'Neil: If they are looking to buy something and they don't have a lot of money, I of course want them to find a good deal. The problem is that they are worth, in this situation, they are worth way more to a potential payday lender or a potential college, [?] of a college, that can get just tons of money through the Federal Aid system than they are worth to a purveyor of cheap whatever--you know, the actual products that these people can afford. I don't know if you know the way that Google auctions work, but essentially, a given advertising space goes to the highest bidder. So, you know, the different companies that are vying for space in front of that person, they each value that person in different ways. 
And right now, for poor people, it's not so surprising to hear: the predatory industries value them the most, because they can make the most profit off of those people. Russ Roberts: But they don't have to be predatory. Cathy O'Neil: They don't have to be predatory. No. I'm not saying they are. Russ Roberts: Even payday lenders may not be predatory. Because these people that we are talking about maybe don't have a bank. Maybe they don't have access to capital. They can't--it's good for them. They want those things. Should we not let them have them? Should they be banned? Cathy O'Neil: I mean, I think they should. Depending--look, if it's a true payday lender. I mean, we don't, we're going to go into the weeds here. Let me just make the one point, which is if you had two lenders that are vying for the space on a webpage that a poor person is looking at, and one of them is predatory and charges enormous fees and makes enormous profits, and the other one is much more reasonable as a lender, the lender that makes more money is going to be able to offer more money in the auction. And they are going to win. Do you see what I mean? Russ Roberts: I do. But it doesn't have to be the person, the most predatory. It could be, if there's competition between them, that there are costs-- Cathy O'Neil: There is competition between them. There is competition. But if I opened a bank today and I promised myself, I'd made it my mission to make really reasonable loans and I would make much less profit off those loans, I wouldn't have that much money to pay for tailored advertising. So I wouldn't win those Google auctions. Or those, whatever-the-advertising auctions. Russ Roberts: But that's because the audience that you are trying to attract is evidently more expensive. So, you could choose, as a matter of charity--you could raise money to create an NGO (Non-Governmental Organization) or a nonprofit that would outbid and offer lower rates of interest. 
But evidently that's not the market rate. But you are right: we are way off in the weeds here.
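The two-lender argument above can be sketched as a toy highest-bid auction. This follows only the simplification used in the conversation, where the ad space goes to whoever bids most; real ad auctions weigh more than the bid. The class name, the profit figures, and the rule that each advertiser bids a fixed share of its expected profit are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Advertiser:
    name: str
    profit_per_customer: float  # hypothetical expected profit, in dollars
    bid_fraction: float = 0.5   # hypothetical share of that profit offered as a bid

    def bid(self) -> float:
        """What this advertiser offers for one ad slot in front of the user."""
        return self.profit_per_customer * self.bid_fraction

def run_auction(advertisers):
    """Toy rule: the slot goes to the single highest bid."""
    return max(advertisers, key=lambda a: a.bid())

high_fee = Advertiser("high-fee lender", profit_per_customer=400.0)
low_fee = Advertiser("reasonable lender", profit_per_customer=50.0)

winner = run_auction([high_fee, low_fee])
print(winner.name)  # high-fee lender
```

In this toy setup the reasonable lender can win only by bidding more than its margin supports, or by being subsidized from outside, which is the nonprofit scenario raised at the end of the exchange.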