0:33 Intro. [Recording date: December 15, 2014.] Russ: Joshua Angrist is the author, with Steve Pischke, of the book Mastering 'Metrics: The Path from Cause to Effect and also with Pischke, the author of "The Credibility Revolution in Empirical Economics: How Better Research Design is Taking the Con Out of Econometrics," which was published in the Journal of Economic Perspectives (JEP) in the spring of 2010. That article and the book are our topic for today's conversation, and I want to thank David Beckworth and Adam Ozemik [?] for suggesting professor Angrist. Josh, welcome to EconTalk. Guest: Thanks, Russ. It's a pleasure to be talking to you this morning. Russ: So, the world's a complex place, and the goal of econometrics is usually to try to assess the impact of one variable on another. What are some of the techniques that the field uses to do that? Guest: Economics, or applied economics is evolving, and there are many different ways to look at the causal relationships, the effect of something on something else. I have my favorites; and those are outlined in the book and the article in the JEP you mentioned, then in our other book, the other book I wrote with Steve, Mostly Harmless Econometrics, which is focused on graduate students. We take as an ideal the kind of randomized trial or field trial that's often used in medicine to determine cause and effect or to gauge cause and effect and that's increasingly popular in empirical work in economics and in other social sciences. An important theme of my work and the book, and the new book in particular, is that even when we can't do a real randomized trial in the sense of going out and dividing people up into comparable treatment and control groups as if by a coin toss, there are methods that we can use, econometric methods that we hope will approximate that. Russ: And how do they do that? Guest: Well, different ways. Different methods, different ways. Different sorts of assumptions. 
Everything of course is built on assumptions, and we're always alert to the foundation of our work and the need to probe it and see whether it's solid and whether it supports the conclusions that we are trying to draw. The simplest empirical strategy--we identify 5 core methods in the new book. The first one is the randomized trial. And that's both a method and an ideal or a model, where people are actually divided on the basis of random assignment, and we have two very important examples where that was done in social science, both related to health care. The first one is the RAND Health Insurance experiment from the 1970s, which is really a landmark in our field. And the second one is a much more recent work by my colleague, Amy Finkelstein and a team of co-authors looking at random assignment of health insurance in Oregon. So that's both showing how it can be done and explaining why it's valuable. The other alternative, that is, the non-experimental approximations of random assignment involve various sorts of strategies. The first of these and the most common is just regression, which I imagine many of your listeners will be familiar with. Just a way to control for things, to try to hold the characteristics of groups that you're trying to compare fixed. So, there's an example in the new book where the question is the economic returns to going to a more selective college or a private college. This is based on empirical work by Alan Krueger and Stacy Dale. And the idea is that we can produce a well-controlled comparison by knowing the schools to which you applied and where you were admitted. And this produces a very striking finding, which is if we compare people who went to, say, private colleges--think about perhaps Boston U.--versus U. Mass. (University of Massachusetts), or even Harvard or MIT (The Massachusetts Institute of Technology) versus U. Mass., naively you'll see that the people who went to the private colleges earn a lot more. 
But conditional on where people were admitted, they do about equally well. There's no advantage. And that suggests that most of the observed difference, perhaps all of the observed difference in earnings between people who went to private and public universities is due to the fact that the people who went to the private universities were destined to do better anyway; they were on average people who were either more ambitious or had higher test scores. But those characteristics are reflected in their application decisions and their admissions results. Conditional on where they applied and where they got in, there doesn't seem to be any earnings advantage. So that's the example we use to illustrate regression. Of course, it isn't really a randomized trial, but we can tell that it looks very controlled because we can see that after appropriate conditioning--and in this case we think that 'appropriate' means holding fixed an individual's own assessment of how qualified they are for different sorts of schools and of course we're also holding the admissions' offices issues constant--how the admissions office gauged the applicants. Conditional on that, it looks like a good experiment in the sense that people who went to different sorts of schools have similar family backgrounds and they have similar measures of ability, like SAT (Scholastic Aptitude Test) scores. Russ: So this is not a finding that you wave around too much in front of your administration, presumably. Guest: In fact, it's a little awkward. I work at a very selective school and I'm friendly with our admissions officer, Head of Admissions here. And we discuss these results fairly often. There may be reasons why you'd like to come to MIT besides the earnings advantage it's likely to give you. Russ: Absolutely. Guest: But certainly on economic grounds alone--I'm not speaking specifically about MIT, but the difference between Penn (U. of Pennsylvania) and Penn State is not apparent in the data. 
So that's a very striking finding, and it shows the power of regression to produce a better, a more well-controlled comparison, if not a slam dunk; and in particular to eliminate some of the obvious sources of selection bias that are likely to be misleading. Let me just say parenthetically, 'selection bias' is an econometric term for differences observed between groups that are not in fact causal effects. So, for example, we observe that people who have health insurance are healthier than people who don't. That's mostly selection bias. The people who can afford or have access to health insurance tend to be healthier people, without regard to the fact that they have the insurance. And we know that, actually, from the results in the RAND study and from Amy Finkelstein's work. Russ: I want to come back to that in a second. First I want to say something nice about your book. There's something that is very special about the book. It's a real rarity in economics writing, at least in my experience, which is that it's mainly about the intuition and less about the formal results. The formal results are there; but they are in the Appendix. Usually it's the other way around: we put the formal results in the book and then in a footnote or two we say, 'Oh, by the way, you should take this into account,' or 'that's what this is trying to accomplish.' But what I love about the book is that it's really an extended conversation about the nuance, art, and craft of econometrics, which is something I think is extraordinarily missing from both the literature and the instruction. When I taught econometrics or statistical analysis to Master's students, I wanted to teach them how to think like an econometrician, not what the formal results are; and it's remarkable how difficult it is to find material to help people to do that. The easy thing, of course, is just to give people tests on various formal results; and they're easy to grade. 
But to teach people and to grade them on craft is really, I think, the gold standard. And your book is really a step in that direction. Guest: Well, it's wonderful that you see the value in that. Steve and I both, of course--we're researchers, but we're also teachers. And we were well aware of the enormous gulf between the way econometrics is taught and the way it's done. And we see our job in this book and also in Mostly Harmless, our earlier book for graduate students, to try to bridge the gap between econometric practice and the econometric syllabus. And I hope that we're successful in that. That's really what we're trying to do.
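
A minimal simulation sketch of the selection-bias point in the private-college example above. Every number here is a hypothetical assumption chosen for illustration, not a figure from the Dale and Krueger study: unobserved ability drives both admission and earnings, attending a private college has zero causal effect by construction, and admitted students pick private or public by a coin toss.

```python
import random
from statistics import mean

def simulate(n=20000, seed=0):
    """Return (admitted, private, earnings) rows; all parameters are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ability = rng.gauss(0, 1)                       # unobserved by the naive analyst
        admitted = ability + rng.gauss(0, 1) > 0        # admission depends on ability
        private = admitted and rng.random() < 0.5       # coin toss among the admitted
        earnings = 50 + 10 * ability + rng.gauss(0, 5)  # no causal effect of private
        rows.append((admitted, private, earnings))
    return rows

def gap(rows):
    """Mean earnings of private attendees minus mean earnings of everyone else."""
    private = [e for _, p, e in rows if p]
    public = [e for _, p, e in rows if not p]
    return mean(private) - mean(public)

rows = simulate()
naive_gap = gap(rows)                                # compares all private vs all public
conditional_gap = gap([r for r in rows if r[0]])     # holds admission status fixed

print(f"naive gap:       {naive_gap:.2f}")
print(f"conditional gap: {conditional_gap:.2f}")
```

Under these assumptions the naive gap is large even though the true effect is zero, and it shrinks toward zero once the comparison is restricted to people with the same admission outcome. That is the logic of conditioning on where people applied and were admitted, in a deliberately simplified form.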

10:24 Russ: Now, having said that, I have some disagreements with it. So let's turn to some of those. Guest: Okay. I didn't finish all the other--I don't know if you want me to go through those. Russ: Oh, go ahead. Please do. Go ahead. Guest: So, we start with random assignment. We talk about regression next, not because it's the best method but because it's a natural starting place. And I can't imagine seeing an empirical paper about cause and effect which doesn't at least show me the author's best effort at some kind of regression estimates where they control for the observed differences between groups. That may not be the last word, but it ought to be the first word. The other methods are Instrumental Variables (IV), regression discontinuity designs, and differences in differences. Each of these is an attempt to generate some kind of apples to apples comparison out of observational data--that is, data that were not generated by some sort of purposeful random assignment on the part of researchers. Instrumental variables is a strategy for leveraging naturally occurring random assignment, or something that looks like naturally occurring random assignment. So, the example we start with there is--well, let me add also that sometimes instrumental variables is a method for leveraging experimental random assignment in complicated experiments, where the treatment itself cannot be manipulated but there is an element of manipulation in the treatment--there's a kind of a partial manipulation. The first example in the instrumental variables chapter is a study of charter schools; and there we're interested in whether kids who go to charter schools--charter schools are essentially publicly funded private schools, an important part of education reform that's growing in many states, including Massachusetts where I live, but also elsewhere, like New Orleans is now an all-charter district in the Recovery School District in New Orleans. 
So, there's a big public controversy about the sort of semi-privatization of public schools, at least insofar as their operation goes; and a big debate about whether the charter schools are actually doing better than the public schools that they serve alongside with, or even replace in some cases. So, to answer that question, we use the fact that oversubscribed charter schools pick their students by lottery. That is, when they have more applicants than seats, they use a lottery to allocate the seats. And that creates an instrumental variables situation where we compare kids who are and are not offered seats at a charter school and then we adjust for the difference in the likelihood of attending the charter school that that tool generates, that that manipulation generates. And that's a great, simple example of IV estimation of causal effects. We also have an example from a randomized trial where the intervention is the arrest of suspected batterers in the cases of domestic abuse in the city of Minneapolis. This is a real randomized trial; it's a very famous criminological study from the 1980s. In that study, police officers who were called to the scene in cases where there was a presumption of assault--ordinarily a policeman has to make a decision about how to handle it. In this case the policeman was encouraged by virtue of random assignment to different strategies to either arrest the suspected batterer or to simply to separate the parties or refer them to counseling. And this is an IV situation because you can't actually tell the police what to do. They have to be free to make their own calls both in the interest of their own safety and in the safety, interest of the safety of the victims on the scene. So, there is an element of random assignment, but there's deviation from random assignment. 
It turns out that instrumental variables is the ideal tool to analyze that sort of scenario, which is quite common in field trials that involve people and the messiness of social policy. So, those are two out of three of the IV examples. The next chapter discusses regression discontinuity designs, which are growing in importance. Regression Discontinuity (RD) Designs are research designs, non-experimental research designs, that tend to mimic an experiment by using the rules that determine allocation to treatment states. So, an example there is somewhat--one of the examples there is very much along the lines of the regression study I mentioned. Instead of the elite colleges, one of the applications in the RD chapter is to the study of the elite high schools. And that's based on some work that my colleagues, Atila Abdulkadiroglu, Parag A. Pathak, and I did on the legendary elite High Schools, like the Boston Latin School and New York's Stuyvesant. And we used the fact that those schools admit kids on the basis of a cut-off. So, you have a test score. It isn't exactly a test score; it's a kind of an index; it's based on your GPA (Grade Point Average) and your tests, your admissions tests. And they admit you according to whether you fall above or below a threshold. And the idea there is that very small changes in test scores are arbitrary, so that if I look at kids who have scores just above and just below the cut-off, they are likely to be quite similar in terms of their family background, motivation, and so on. And so that's something like a randomized trial, the question of whether a kid is slightly above the cut-off or slightly below. There's a serendipitous. And so we can compare the achievement of kids across that threshold and gauge the value of education in an elite high school. And just as in the analysis of elite colleges, the RD study of elite high schools shows no--in this case, no achievement advantage for kids who go to these more elite schools. 
In spite of the fact that their peers are much better. So we are also relating them to the age-old question of social science of peer effects, whether there are benefits from studying or working with more productive or more talented colleagues, co-workers, and classmates. The RD is particularly interesting because it's relatively new in economics. When I was in graduate school I did not learn about RD and really didn't hear about RD until I had been working as an Assistant Professor for a few years. But now RD is one of our core methods and probably one of our most convincing non-experimental methods. So, Steve and I are especially pleased to kind of bring that in to the undergraduate curriculum. It's not commonly found in the mainline textbooks. Russ: That's Regression Discontinuity --RD. Guest: Right. RD is Regression Discontinuity.
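
The charter school lottery logic described above (compare students who were and were not offered a seat, then adjust for the difference in attendance that the offer generates) can be sketched as a simple ratio, often called the Wald estimator. This is a toy simulation under stated assumptions, not the authors' analysis: the true effect of 3.0, the motivation variable, and the rule for who attends regardless of the lottery are all hypothetical.

```python
import random
from statistics import mean

TRUE_EFFECT = 3.0  # hypothetical causal effect of attending, used only to generate data

def simulate(n=20000, seed=2):
    """Return (offer, attends, score) rows for a hypothetical oversubscribed lottery."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        motivation = rng.gauss(0, 1)     # unobserved; raises scores directly
        offer = rng.random() < 0.5       # the lottery: a fair coin toss
        if motivation > 1:
            attends = True               # finds a seat with or without an offer
        elif motivation < -1:
            attends = False              # declines even when offered
        else:
            attends = offer              # attends only if offered
        score = 2 * motivation + TRUE_EFFECT * attends + rng.gauss(0, 1)
        rows.append((offer, attends, score))
    return rows

def diff_by(rows, key, value):
    """Mean of column `value` where column `key` is True, minus the mean where it is False."""
    on = [float(r[value]) for r in rows if r[key]]
    off = [float(r[value]) for r in rows if not r[key]]
    return mean(on) - mean(off)

rows = simulate()
naive = diff_by(rows, key=1, value=2)         # attenders vs non-attenders
reduced_form = diff_by(rows, key=0, value=2)  # effect of the offer on scores
first_stage = diff_by(rows, key=0, value=1)   # effect of the offer on attendance
iv_estimate = reduced_form / first_stage      # Wald / IV estimate

print(f"naive: {naive:.2f}  first stage: {first_stage:.2f}  IV: {iv_estimate:.2f}")
```

In this setup the naive comparison overstates the effect because motivated students are more likely to attend, while the ratio of the two offer-based differences recovers something close to the assumed effect. The same arithmetic applies to the Minneapolis example, with the randomized instruction to police playing the role of the offer.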

17:31 Russ: So, I want to come to what I think is the heart of the matter, which is what I think is the convincing part. Since I'm kind of a skeptic. And I want to be on the couch; and you can counsel me and give me some cheer. So, when I look at these results, I have two issues. One is a theoretical point, which Leamer and Sims bring up in their response to your 2010 article. So, your article, your title, is playing on the 1983 paper by Ed Leamer, which is "Let's Take the Con Out of Econometrics"-- Guest: Right, wonderful paper. I read it with great pleasure in graduate school. Russ: So, Ed's been a guest on this program before, a number of times; and we've talked specifically about that article. That article was worried about the fact that most of us don't get to go into the kitchen and see the enormous range of possible models that an economist might try. And Leamer claims that, as a result of that, the classical statistical significance tests really go out the window. We are kind of at the mercy of the researcher, because we don't know the range of stuff that was tried and not tried. And I have to mention George Stigler, who once told me that when he was in graduate school, since it took such an immense effort to run a regression, you picked the one or two that you thought were the best ideas. And you ran 'em. And it took a long, long time to make the calculations. Basically they were done by hand, with giant calculators. And then you hoped you found something. And that was it. And of course in today's world, you just hit Return. You can do lots and lots of data mining. And Leamer was worried about that. And one of your points, before we get to this issue of convincing specifically, one of your points is that perhaps ironically, you make the argument [that things have improved] since Leamer wrote that article--but not based on his remedy. So, talk about what his remedy was, and why you think that has not been a route that people have taken. 
Guest: Well, I think the question has been whether what Leamer was complaining about was the most important problem that Applied Econometricians face. Leamer was essentially saying that there's a lot of specification search and there's selective reporting. And-- Russ: And his solution was very radical. Right? His suggestion was an immensely honest sensitivity analysis: so basically saying: If you combine all the possible variations of these variables we have, how big a range do we have for the variable we care about? And the answer is usually: Not very much. Guest: He's a fairly committed Bayesian, at least in his writing, if not in person. And he was proposing a fairly conventional, I thought, Bayesian approach where you would state your priors and you would then show how that maps. And he also had the idea that we should show many variations. Let me say at the outset that Leamer had a huge impact on me, and I think on empirical work. All to the good. What he was complaining about, the kind of arbitrariness of what I report, filtered into empirical practice in the form of robustness checks, in the sense that researchers today are expected to report plausible variations on what they've done. A great example of that is from my own work. This is in the new book. In the chapter on Differences in Differences, where you compare changes instead of levels, it's essentially a panel data method. The idea is that treatment and control groups move in parallel in the absence of treatment. And that's a testable hypothesis. And a very simple check on that is to allow some departure from parallelism into your models. And the easiest way to do that is to introduce--if it's a state-based panel the easiest way to do that is some kind of state-specific trend. And many panels do not survive that, in the sense that the treatment effect of interest either just disappears or becomes not very well identified, not very precisely estimated when you do that. 
And Mostly Harmless had an example of that, and the new book has an example from my own work where we are trying to use compulsory attendance laws at the beginning of the 20th century by state and year of birth. And that's the source of variation in schooling we want to exploit. And when you put in a state-specific trend, it disappears. So that kind of idea that you owe it to your readers to both understand and explain and probe the fundamental assumptions that drive your results, well taken. And I think we have to credit Leamer's article for highlighting that and bringing that into modern empirical practice. An extreme version of that which is also emerging among my contemporaries is that when I do a randomized trial I might actually precommit to the analyses. And that's also a good development. Russ: Yeah. Shout it out. Guest: That's a sign of maturity, that we're willing to do that. I have mixed feelings about it, because I don't do a lot of randomized trials and I think the idea of precommitment becomes very difficult in some of the research designs that I use where you really need to see the data before you can decide how to analyze them. You're not sure what's going to work. That said, when you can precommit, that's a wonderful thing, and it produces especially convincing findings. The idea that I should show the world a mapping of all possible models and that that's the key to all good empirical work: I did disagree with that at the time and I still do. And that's reflected in the article with Steve in the JEP. The reason that most empirical work was not convincing in the age of, say, Stigler and until more recently was not because there was inadequate specification testing, but because the research designs were lousy. The example that Steve and I gave, is from work by Isaac Ehrlich, very influential papers on the effects of capital punishment. Russ: Yep. Part of my youth. Guest: Yeah. 
That's a great question and I don't want to single Ehrlich out for doing a particularly sloppy job or anything like that. But, I'm not too interested in how sensitive his findings are to the sort of variation Leamer is describing because I didn't find any of it convincing. He really did not lay out a clear case for his research design. A core concept in my work, in my writing with Steve and in the research methods I think are most effective is the notion of design. The notion of design, in an experiment, of course, is how you set up the experiment; who got allocated; what you were conditioning on; what the strata are; and so on. In an observational study, design is about how you are mimicking that trial. So when I talk about RD and I'm using RD, regression discontinuity, methods to estimate the effects of going to an exam school, you know the design there is that I'm comparing people above and below the test score cutoff. And if that design is convincing, it'll satisfy certain criteria, which I then owe my reader. But I certainly don't owe my readers an account of all possible strategies. I really do build it from my proposed design.
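
The exam-school design just described, comparing applicants just above and just below the admissions cutoff, can be sketched as follows. This is a toy illustration under assumptions that are not from the source: the index is uniform and centered at a cutoff of 0, achievement rises linearly with the index, the true effect of admission is set to zero to mirror the reported finding, and the bandwidth of 0.2 is arbitrary.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def simulate(n=40000, seed=1):
    """Return (index, admitted, achievement) rows; all parameters are hypothetical."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        index = rng.uniform(-1, 1)       # admissions index, centered at the cutoff
        admitted = index >= 0            # sharp admission rule
        true_effect = 0.0                # assumed: no achievement gain from admission
        achievement = 5 * index + true_effect * admitted + rng.gauss(0, 1)
        data.append((index, admitted, achievement))
    return data

def rd_estimate(data, bandwidth=0.2):
    """Gap between the two fitted lines at the cutoff, using points near it."""
    left = [(s, y) for s, _, y in data if -bandwidth <= s < 0]
    right = [(s, y) for s, _, y in data if 0 <= s <= bandwidth]
    a_left, _ = fit_line(*zip(*left))    # intercept = predicted value at the cutoff
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left

data = simulate()
naive_gap = (mean(y for _, a, y in data if a)
             - mean(y for _, a, y in data if not a))
rd_gap = rd_estimate(data)

print(f"naive gap: {naive_gap:.2f}  RD estimate at cutoff: {rd_gap:.2f}")
```

Comparing all admitted students with all rejected ones shows a large gap that only reflects the index itself, while the comparison at the threshold is close to zero. A real application would also owe the reader the checks mentioned in the conversation, such as showing that background characteristics do not jump at the cutoff.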

25:45 Russ: Let me react to that. So, I remember very vividly when the Ehrlich study came out. And at the time, I was a proponent of the death penalty. I couldn't exactly tell you why: that would be an unanswerable question. But when it came out--I was very naive and very young--I thought, well, see, it's proved. Of course, it wasn't. And of course, I think if you were not a proponent--and we don't need to go into your personal views on this, because I think it's a general issue: 'Oh, yeah, it was a terrible study; it didn't control for this; it didn't control for that.' People who were more sympathetic to the outcome, the findings of the study I think were more likely to believe that it was a good study. And if he had been more thorough, I suspect those of us who were biased toward the finding might have been a little more embarrassed to wave it around. I wasn't in any position to wave it around, so that isn't exactly my point. Guest: Well, Ehrlich's problem is not thoroughness. That's what I'm saying. Ehrlich's problem was the lack of a design. And, I mean, it's probably not that important--you know, Ehrlich's work was based on small samples and pre-dates most of the methods, except for basic regression methods-- Russ: Yeah, that's true-- Guest: that were highlighted in the book. At a minimum, we'd like to study capital punishment, we would use a state panel, for example. And we'd take out state effects--that is, we would use, basically we would use the Differences in Differences method. And that's been done, and there are references in the article that Steve and I wrote. You know, Ehrlich's work is important because it was intellectually important at the time. It's not of any empirical significance. I don't think any social scientist of my generation would look at Ehrlich's regressions and say they are worth reacting to. Russ: No, of course not; I understand. 
Guest: But there are other papers in the article about capital punishment; if you want I can look at it quickly, though I don't think it's to our-- Russ: No. I want to stick-- Guest: [?] much better job. Russ: No. I want to stick with the more general [?] Guest: Yeah. But you know, somebody, for example who proposes to study capital punishment, because, you know, the state of New York decides not to use it or outlaws it, you know, that person potentially has a good design. And I can tell that person, that researcher, exactly what he needs to do to convince me of that finding. And it won't be what Leamer suggested, which is a sort of all-hands-on-deck, all-specifications-are-created-equal specification search. Sorry--specification sensitivity analysis. But rather, I know what Differences in Differences depends on; and again, this is a theme of both of my books with Steve. We know what that method turns on. It turns on parallel trends. We always say that. It lives or dies with parallel trends. And to some extent, not 100%, but to a large extent, that kind of assumption can be tested. And the evidence that emerges from that test may or may not be very strong. But if it is strong, and if it's strongly favorable, then I have to be prepared to accept the results from that person's work. Russ: So, that's my question-- Guest: Somebody who is interested in the evidence.
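
The point that Differences in Differences "lives or dies with parallel trends" can be made concrete with a noise-free toy panel. All values below are hypothetical assumptions: two states, six periods, a policy change in the treated state from period 3 onward, and a true effect of 2.0. The pre-period check here is a simple comparison of pre-treatment slopes, a simplified stand-in for the state-specific trends mentioned in the conversation.

```python
from statistics import mean

PERIODS = range(6)
POLICY_START = 3     # hypothetical first treated period
TRUE_EFFECT = 2.0    # hypothetical effect used to build the panel

def make_panel(extra_trend):
    """Outcome by (state, period); `extra_trend` tilts the treated state's own trend."""
    panel = {}
    for t in PERIODS:
        panel[("control", t)] = 5.0 + 1.0 * t
        treated_now = TRUE_EFFECT if t >= POLICY_START else 0.0
        panel[("treated", t)] = 10.0 + (1.0 + extra_trend) * t + treated_now
    return panel

def change(panel, state):
    """Post-period mean minus pre-period mean for one state."""
    pre = mean(panel[(state, t)] for t in PERIODS if t < POLICY_START)
    post = mean(panel[(state, t)] for t in PERIODS if t >= POLICY_START)
    return post - pre

def did(panel):
    """Differences in differences: treated change minus control change."""
    return change(panel, "treated") - change(panel, "control")

def pretrend_gap(panel):
    """Difference in per-period slopes before the policy; 0 means parallel pre-trends."""
    last = POLICY_START - 1
    def slope(state):
        return (panel[(state, last)] - panel[(state, 0)]) / last
    return slope("treated") - slope("control")

parallel = make_panel(extra_trend=0.0)
diverging = make_panel(extra_trend=0.5)

print(did(parallel), pretrend_gap(parallel))    # 2.0 0.0
print(did(diverging), pretrend_gap(diverging))  # 3.5 0.5
```

When the two states trend in parallel, the estimate equals the assumed effect. When the treated state was already on a steeper path, the same calculation overstates the effect, and the nonzero pre-period gap is the warning sign a skeptical reader would ask to see.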

29:05 Russ: Yeah, that's my question. So, let's go--I'll take a micro, a couple of micro issues, one of which you've mentioned; and I'll throw in a couple more that you referred to in your book or article--or that you don't, but they are prominent examples. And then I'll go up to macro. So, I'm going to go micro to macro. On micro, I'm going to mention the effect of the minimum wage on employment; the effect of class size on educational attainment; the effect of health insurance on health outcomes. Those are three incredibly contentious policy issues in microeconomics. At the macro level, I'll pick the Stimulus Package of 2009. So here are four issues that we as economists are expected--whether we actually can speak to them is a different question. But we are expected to speak to these issues. And so we roll out tremendous econometric artillery, along the lines that you've mentioned. And you talk about these, some of them, most of them, in your books and your article. Guest: Yeah, I wouldn't describe it as 'tremendous econometric artillery'. The methods in my book are simple and accessible to any reasonably quantitatively sophisticated undergraduate. Russ: But they take a lot of time and effort to do correctly with the data, and to do the kind of careful research design-- Guest: As any work worth doing does. Russ: Right. And my question is-- Guest: I don't see that we're sort of over the top here in how hard the econometric work is. Russ: No, okay. That's fine. But the question, then, is: What have we learned in those four areas that you think stands the test of time and that is replicable? There have been some fine studies. There have been some--and I'll throw in the effect of immigration on wages, because you refer to the classic Mariel Boatlift study of David Card-- Guest: Yeah. Russ: How-- Guest: So, some of the evidence in these areas is stronger and weaker. But there is a lot of interesting evidence here that's worth discussing. That's my standard. 
Russ: Has anybody been convinced-- Guest: [?] would be-- Russ: Has anybody on the other side-- Guest: I've been convinced about many things. If you mention health insurance, for example, Americans are not very healthy compared to other OECD (Organisation for Economic Co-operation and Development) countries. Russ: Correct. Guest: The evidence overwhelmingly suggests that it has nothing to do with health insurance. And we see that in two randomized trials, extremely well done, very convincing. Russ: Well, convincing to you. Guest: That's an area [?] where the evidence is very strong. Russ: Convincing to you. Most--I happen to agree with you. I don't think it's a convincing case that health insurance-- Guest: I'm not too interested in taking a poll. The evidence is clear. I'm not sure who is not convinced. Russ: How about the people-- Guest: But anybody who believes otherwise has to explain away the RAND and the Oregon findings. Russ: They can. Can't they? Guest: I haven't heard a convincing explanation, I don't know what it is. Russ: Well, not to you. I mean, I don't want to take a poll either, except to make the point that economists are typically unconvinced by so-called 'scientific experiments' using first-rate research design. It's very easy for them to say, 'Oh, the RAND study--it didn't look at a long enough distance, the Oregon study didn't have enough power. They didn't have a big enough sample. There were problems of selectivity.' Guest: Well, all I can say is the RAND study followed people for up to 5 years and the Oregon study certainly the standard errors are small enough. I mean, you know, there's informed critiques and there's uninformed critiques. There are people who have a position. I'm not sure what your standard is, Russ. I don't really care if I convince, say, Paul Krugman. Russ: No, I understand. There are people with an axe to grind, there are partisans, there's--let's move to the-- Guest: Yeah. 
I think that the people who work on health insurance in the scholarly community have been enormously influenced by those findings. And, you know, the people who wrote those papers probably did not expect to find what they found. So, I don't think they are representing the work dishonestly. Russ: I agree with that. Guest: And it has to be taken seriously. Now, I'm not sure what the standard is. There are certainly people who have an axe to grind. So I don't--you know, we can say the same thing about charter schools, which is something I work on. There are people who are very hostile to charter schools; and there are people who love charter schools. Russ: Yep. Guest: Okay. And you know, there are people who believe in market-based solutions and there are people who don't-- Russ: who are skeptical-- Guest: who are hostile to market-based solutions. And many of the people who comment on that sort of thing are very committed. I doubt that my work moves them. I think, for example, Diane Ravitch--I know that she's aware of what I do, what our group does--we have something called the School Effectiveness and Inequality Initiative. I don't know what I need to do about that. I don't really see that as my problem. People who study schools, and in my academic community pay attention to what we do. Now, you might say, 'Who cares about that?' Russ: No, no; I care. Guest: When it comes time to make policy, there are people who skip over the advocates. And they do look at what the academics say. When our governor, for example, was thinking--and in Massachusetts, the number of charter schools is capped. I don't have a position on that. I don't care, personally, deeply, what Massachusetts does as far as its charter school policy. I just want my work to be noticed when that issue is debated. And when that issue was debated in 2010, our work was noticed and I was gratified by that. 
The work was noticed, not just because economists were saying, 'This is worth attending to,' but people found the design convincing. We were able to represent it in a way that was convincing to policy makers as well as to other scholars. And more so, I think, than a lot of the work that had gone before. Russ: I want to come back to your example of Paul Krugman. He does have a Nobel Prize in economics. But I'll take your point-- Guest: Yeah, I don't want to discuss individuals in any--I used him as an example-- Russ: I understand-- Guest: of somebody who is identified with a set of positions. Russ: Agreed. Guest: And what he says is not the measure of my success. Russ: Of course. Guest: The measure of my success is what my peers think. But somewhat indirectly I think what my peers think matters. And when policy-makers--we're lucky to live in the United States where social science does actually matter for policy; and better social science probably matters more. Russ: Well, I'm agnostic on that. I think we like to believe that. I think we also perhaps read that evidence a little more cheerily than it perhaps deserves to be read. I think we're sometimes used by politicians rather than changing their opinions. But let's put that to the side. And I understand your point about Diane Ravitch; certainly partisans who--I'm not talking about political partisanship, I'm talking about people who have a staked-out position on a policy issue are going to be hard to change their mind.

37:08 Russ: Let's just stick, then, with two issues for now, which are: the health insurance case and the minimum wage. Do you think the majority of health economists oppose universal health insurance, based on the empirical evidence that it's not related to health outcomes and it's just a waste of money? Guest: I don't think that's relevant. Again, I'm not taking a poll. I think that many economists, again there's people who follow this and care about it. I think there's an understanding that if you want to improve public health, which of course many of us do, that insurance is not the key. There may be other good reasons to support insurance, and I'm not really interested in debating that. Russ: Yeah, I understand. How about the minimum wage? Do you think we have any scientific understanding of the impact of an increase of the minimum wage on employment, based on the research design? Guest: Yeah. Yeah, there's been a lot of good work on the minimum wage. Of course, it's not as good as the work in health insurance, in the sense of there isn't a randomized trial of the minimum wage. But I would say that the burden of proof has shifted towards people who think that the minimum wage has large dis-employment effects. Because it's been hard to find those. I'm not saying it's been impossible. But, you know, I'm a labor economist by trade. I do econometrics as kind of a hobby. And a lot of my teaching is in labor. And it's clear that the scholarly work on the minimum wage today is in a very different place than it was before Card and Krueger. Russ: Oh, I agree. Guest: I'm not saying everybody is convinced. Russ: That's true. Guest: But the evidence is relevant and worth attending to, and it tends to fail to find large dis-employment effects, and anybody who discusses the minimum wage has to contend with that. And I would say here there's a difference between what, say, Ehrlich did, which I don't, for the most part--and again, I'm not picking on him. 
I don't think Stigler is remembered for his empirical work, either. You mentioned him early in our discussion. There are studies that are remembered for their findings. You may disagree with the findings, or you may have reasons to discount the findings. But the findings are worth discussing and thinking about and they have to be confronted. Okay, that's my standard. Russ: Absolutely. Guest: You may disagree with my results on charter schools, but they are worth worrying about. Russ: Totally agree. What I find depressing is a couple of things--although I agree with you that sometimes people are surprised by the results they discover in their empirical work when they do a research design along the lines you are talking about, very often they will just dig harder. Other times they will not publish those results. And unfortunately sometimes when those results do get published, they don't hold up. So the biggest problem I have, really, is--there is a theoretical argument, which is-- Guest: Well, science is done by human beings. I think if you come at it with a very idealistic view, you are bound to be disappointed. People make mistakes. I'm not sure economists--we were having this discussion, I was at a conference last week at Stanford about causal inference in business school fields, and one of the speakers, John Rust, gave an interesting talk and he highlighted all the mistakes that economists have made in their empirical work--well known examples of mistaken analyses, I guess the most recent one is the Reinhart and Rogoff thing. Well, we all make mistakes. Science is a human endeavor and I'm not sure that we're worse than other fields-- Russ: I'm not talking about-- Guest: One of the [?] at this conference was talking about that. Russ: But I'm not talking about a spreadsheet error or Excel got the wrong number put in and they overstated some effect. 
And no one suggests that-- Guest: There's a spectrum of mistakes; some of it has to do with specification searches and that sort of thing. I agree. But you know, don't let the perfect be the enemy of the good. Are we always right? Are the findings always clear? Do the politicians always listen? I'm sure the answer to every one of those questions is 'No.' Are things generally improving? Are they better in the United States than elsewhere? Can you point to a situation or a period in time where the quality of social science and the impact that it has on public policy has been better than it is now? I'm not aware of a strong case for that. Russ: I don't find that necessarily a good thing. I mean, it's good for us. I'm not sure it's good for public policy. The question is whether the precision and accuracy of what we've discovered with the kind of techniques you are talking about, whether they have improved public policy or not--they've certainly given it a more scientific gloss. But the question is whether we have gotten better. Certainly we have more data; we have different kinds of data. But it's not obvious to me that we've gotten better at distinguishing causal impacts from correlations that may not be causal. And yet, you are right--we are the high priests of public policy; we get listened to a lot. I look at [?], my own bias, which is that skepticism. So, I'm willing to concede that I may be overly skeptical. When I look at the single most important macroeconomic event of our lifetime and I see the lack of precision--not just precision but different really smart people say that the effects are not just a different size but have different signs, it makes me wonder whether we are helping the debate or not. And I don't see those differences being narrowed over time. Do you think I'm wrong on macro? Guest: Well, you know, I'm a microeconomist, so I tend to pay less attention to macro. Steve and I wrote about this: I wish that macro was more empirical. 
And that macroeconomists were more like me in the sense that they look for good experiments and try to produce good designs. I think that's coming. It's been a long time coming. And Steve and I wrote about some of the younger scholars who seem to be bringing that message. It's certainly been resistant in macro. Here I'm talking about sort of on the intellectual side there seems to be a preference for models and theory among people who are trained in macro and see macro as their field. I can't really explain that. I think we'll get better evidence. But, if you draw back and say, where is social science in macro--again, by what standard? One of the most influential documents in the history of social science is Friedman and Schwartz. And it's hard to point to another field where, at least in social science, where anything has been so influential. Russ: I agree with that. I've talked about it many times in here; and it's not a sophisticated statistical analysis. It's just a post--before and after kind of look, what they call a natural experiment. It's very clever. Guest: Well, it's an effort to get at the causes of the Depression. I think that [?]-- Russ: And inflation, generally. Guest: [?] Friedman and Schwartz. And inflation. We can do better than Friedman and Schwartz with the kind of tools that are around today, but Friedman and Schwartz is a benchmark, and a worthy benchmark, and something to the credit of our discipline. Russ: But I have to mention that in 1945 there was a remarkable natural experiment, that WWII ended; and many macroeconomists said that it would create a horrible downturn. It did not. It didn't change 'em. I've gone back and read the AER (American Economic Review) and JPE (Journal of Political Economy) from those times; they then had an explanation for why it didn't conform to their expectations and then they didn't really need to revise them so much. 
I think it's very hard in a complicated world--and macro is one of the more complicated parts of it--for people to concede that their pet theory--and this is on both sides; I'll pick on my own views, which are very Friedman-and-Schwartz influenced. Certainly many people of my ilk said that we'd have massive inflation by now because of the activities of the Fed increasing its balance sheet. And I acted accordingly; I bought Inflation-Protected Securities with the Treasury. And they did okay, actually. But I was wrong. And a lot of people on my side-- Guest: I don't react to short term current events. When I was growing up--at least, I try not to in my work, in thinking about econometrics--when I was growing up, at least in my intellectual youth when I was in college, inflation was a central, was the central macroeconomic problem. And that problem in developing countries seems to have been solved. Well before the Great Depression. So that's certainly-- Russ: You mean well before the Great Recession? Guest: Right. I see that as a feather in the cap of applied macro. Russ: Oh, I totally agree. I think that's one of the few things that economists can point to, where they have, through empirical work, improved our understanding of something that wouldn't have otherwise been obvious to the general public or to policy makers. Showing that class size has an impact on education, I wouldn't put in the same category. And I'm worried that we're making a mistake when we conclude that minimum wage increases don't affect employment very much in the current range. Guest: Well, you know--I don't know--class size, I would say, is part of a larger literature on human capital. And again, I would credit economists with the prominence of human capital in policy discourse today. And certainly the credit here has to go to Gary Becker. His contribution was not fundamentally empirical. But also to Jacob Mincer; his contributions were fundamentally empirical. 
And that work began in the 1960s and 1970s and produced a stream of compelling empirical studies that really cemented the foundations that Becker and Mincer laid. So if you asked me for the largest macroeconomic victory for economic policy relevant to empirical work, I would say it's Friedman and Schwartz, and inflation; especially on the micro side, I would say the general importance of human capital as a causal determinant of earnings. And also something that the government can potentially influence. At the same time, labor economists have been good at showing that other things might not matter very much, like training programs that the government puts a lot of stock in don't seem to help people very much. Some do, but most don't.

49:14 Russ: Let me ask you a question, though, about randomized trials. We had Brian Nosek, the psychologist, on the program-- Guest: Yeah, I know Brian and his center. Russ: So, they are part of a larger agenda to worry about the replicability and credibility of experimental results in psychology. There's been a huge interest in the last 10 years over similar randomized trials in poor countries, trying to find out what works and doesn't work. And again I worry that they appear to have a scientific basis akin to a medical trial that's controlled and a "real" experiment. But we do have the problem of limited sample size. And there's a serious question of whether the findings scale: whether they are not specific to particular experiments rather than general lessons about behavior. Am I right? Guest: [?] let me react to that. First of all, no, I don't think you are right. The first problem is, limited sample size--if that's all you are telling me, the answer to that is in the statistics, in other words, the machinery of statistics tells you whether your sample size is large enough. The answer to that question is in the standard errors. If you think your sample size is too small or too big in a sort of moral sense, I can't help you. But if you want to know-- Russ: No, I'm not talking about that. Guest: whether the results are statistically precise, I have a precise answer for that. If you want to know whether the findings generalize, that's a harder question to address. There are certainly strategies for that, and we don't have to invent them. When somebody produces an important finding in medicine, other people try to replicate it. So, you are seeing that happen now, for example, in microfinance. There is enormous enthusiasm for microfinance in developing countries as a tool to lift people out of poverty. And certainly a priori it's not crazy to think that that might be useful. And we're getting a lot of evidence that it's probably not that effective. 
And not just from one study. So there's a body of work building up. And that's the J-PAL (Abdul Latif Jameel Poverty Action Lab) Agenda, the Poverty Action Lab folks. Some of whom were my colleagues; and Esther Duflo who is one of the co-leaders of that effort was my student. She's not answering all questions all the time and she's not providing the most general answer at any one time. But she's promoting the idea that we can, through a series of experiments, learn a lot that's useful. And in particular we can come up with evidence that helps us direct resources in directions that are most likely to be useful. One of the things that's important to remember--this came up at the conference I was at last week--is: one of the big roles of a social scientist is to point out what's not likely to work. Russ: Very valuable. Guest: And particularly in the world that you are describing, which is full of interested parties and advocates. In some cases it's ideological, but often it's commercial or it's based on some sort of faith in particular strategies. So, in the education world there's no end of approaches to schools that people are strongly committed to, not based on the evidence but based on a belief about how students learn or perhaps they even have a product to sell--we see that in the case of computer-aided instruction. In the developing country world, you have many actors, philanthropists, governments, non-governmental agencies, who have an idea to sell. Maybe it's smaller family size; maybe it's a particular kind of social organization. Maybe it's a particular technology. And it's very useful for an outside party to come in and say, 'let's take a look at this.' A great example recently is the surge in enthusiasm for computers in early education in developing countries. 
Many, many people became convinced--and I'm talking about politicians and policy makers and scientists--that it would be extraordinarily beneficial to put laptops or iPads in the hands of young kids in, say, Peru or Thailand or someplace like that. And others came and looked at that. In some cases, the idea that we should look at it was resisted. But we have good experimental evidence that that's probably not going to improve outcomes in those settings.

54:23 Russ: But in so many cases--this is tragedy, this is to be warned[?], not celebrated--a particular experiment which has statistical significance--when I say--my worry about sample size, it's not a moral issue. It's the question of whether you've sufficiently randomized across the unobservable variable that you can't control for; and therefore it's always possible that what you have measured is not really there. A lot of times, those studies don't replicate when they go try to find the results. Now, agreed, it's nice to open a question and it's nice to look at it. But I find it fascinating how often those results don't replicate. And that's a problem of development in randomized trials in poor countries. It's an enormous problem in epidemiology, where they often have enormous samples but they still have results that cannot be replicated on different samples or across different types of people or different cultures. And yet, the results that were established initially become waved around. An example was recently written about in the New Republic, the enthusiasm for deworming in Africa that seems perhaps, based on a followup study--and maybe it's not a good study--they suggest that many of those studies do not get repeated. There's not benefits from deworming, for student performance in education. So, that's--I'm not suggesting we shouldn't do empirical work. I'm suggesting that we should be much more humble about its reliability. Guest: I'm all for humble. I think it's important not to throw the baby out with the bathwater. The idea that findings can be misleading--you know, I'm the first to say that. And I'm known for being a harsh critic on other people's empirical work, and I try to apply the same standards to my own work. I don't agree with the sort of nihilistic proposition that nothing is ever learned, that it's all for naught. Russ: It's depressing, isn't it? Guest: No. I'm not depressed. Russ: No, it would be, if it were true. If it were true. 
Guest: I think there are a lot of people who are sort of retreating into that. I'm not sure why. Again, don't let the perfect be the enemy of the good. And try to keep some perspective. I was at a conference that the Center for Open Science sponsored. And most of the studies that seemed to generate the majority of the handwringing that we saw at that conference came from psychology, where there would be a small sample and there would be kind of a quirky finding. And I would have said, why did you pay any attention to that, anyway? And you know, you are probably right that the Atlantic likes that sort of thing-- Russ: Yup; New York Times. They make the front page-- Guest: Somebody does a little study about men and women do this or that--women are actually more competitive than men-- Russ: Better investors, whatever it is-- Guest: Under the right circumstances, men will eat their children. Or some wacky psychological thing. It doesn't concern me too much. I'm not sure that there's any policy that's reacting to that. I think in some sense that's just kind of a consumption good. It's lots of fun. I like to read it by myself. Russ: Find what's wrong with it. Yeah. Point out what's wrong with it. I understand. Guest: I would worry if everything we do turns out to be wrong, perhaps because the researchers are dishonest or manipulating results. That's not my impression, though. Russ: No; I think the bigger worry is that they are honest, and either they are fooling themselves or they are unintentionally fooling others about the reliability of the work. It's a lot more important, I think, to understand what happens when you spend $780-$805 or whatever it turned out to be, billion dollars on stimulus or whether you have helped or hurt the lowest skilled people with an increase in the minimum wage. There's a lot more at stake. Guest: Right. But there are plenty of examples where there's a body of work emerging. 
So, you know, in labor, it's certainly been hard in repeated good efforts to find dis-employment effects of the minimum wage. I'm not saying that's the end of the story. It's been hard in repeated efforts, mostly based on random assignment, to find training programs that are very likely to support the lower tail of the income distribution in any substantial way. It's been relatively easy in repeated efforts to find strong evidence that schooling boosts earnings. There's quite a few findings out there that are worth paying attention to and worth taking account of when it comes time to make policy. Russ: Well, I think learning boosts earnings. I don't think we've been very good at proving that schooling does. I think that's a big challenge, especially in poor countries, and Lant Pritchett's work I think is very alarming and probably true. Guest: Well, you need to read Chapter 6 of Mastering 'Metrics. Which is all about the relationship between schooling and earnings. And we trace the history of that question. And we go through the evidence and we explain why the picture that emerges there is reasonably convincing. Russ: Well, sometimes knowledge is correlated with schooling. I don't deny it. Russ: Let's close-- Guest: No, but I'm talking about the effect of schooling on earnings specifically. Measured schooling and earnings. That's what Chapter 6 in Mastering 'Metrics is about. Russ: Right but a huge part of that-- Guest: And we use that as a question to walk the reader through our application of our serious 5 econometric techniques. And, not every study is equally well done. But there's a body of evidence there that's worth taking a serious look at. Russ: Oh, I totally agree with you. But again, I'm not blaming you for this; the fact that it has led to billions--to say billions of dollars being spent on schooling in poor countries with no impact is tragic. 
And that's not your fault; it's not the fault of that literature; it's not the fault that that literature doesn't apply to certain countries and settings. And the fact that, say, schooling and education are not always correlated. But I agree with you: when they are, there's no doubt it has an impact. I think people, even without economics degrees, believe it, and believed it before we quantified it.