Do you believe that every time a prisoner is executed in the United States, eight future murders are deterred? Do you believe that a 1% increase in the number of citizens licensed to carry concealed weapons causes a 3.3% decrease in the state's murder rate? Do you believe that 10 to 20% of the decline in crime in the 1990s was caused by an increase in abortions in the 1970s? Or that the murder rate would have increased by 250% since 1974 if the United States had not built so many new prisons?

If you were misled by any of these studies, you may have fallen for a pernicious form of junk science: the use of mathematical models with no demonstrated predictive capability to draw policy conclusions. These studies are superficially impressive. Written by reputable social scientists from prestigious institutions, they often appear in peer-reviewed scientific journals. Filled with complex statistical calculations, they give precise numerical "facts" that can be used as debaters’ points in policy arguments. But these "facts" are will-o'-the-wisps. Before the ink is dry on one study, another appears with completely different "facts." Despite their scientific appearance, these models do not meet the fundamental criterion for a useful mathematical model: the ability to make predictions that are better than random chance.

Although economists are the leading practitioners of this arcane art, sociologists, criminologists and other social scientists have versions of it as well. It is known by various names, including "econometric modeling," "structural equation modeling," and "path analysis." All of these are ways of using the correlations between variables to make causal inferences. The problem with this, as anyone who has had a course in statistics knows, is that correlation is not causation. Correlations between two variables are often "spurious" because they are caused by some third variable. Econometric modelers try to overcome this problem by including all the relevant variables in their analyses, using a statistical technique called "multiple regression." If one had perfect measures of all the causal variables, this would work. But the data are never good enough. Repeated efforts to use multiple regression to achieve definitive answers to public policy questions have failed.
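The point about spurious correlation and imperfect controls can be illustrated with a small simulation. This is a minimal sketch on made-up data, not a reanalysis of any study discussed here: the variables `x`, `y`, `z` and `z_measured` are hypothetical, and `x` is constructed to have no effect at all on `y`. Under these assumptions the expected regression coefficient on `x` is about 0.5 with no control, about 0 when the true third variable is controlled, and about 0.33 when only an error-ridden measure of it is available.

```python
import random

random.seed(0)
N = 20_000


def mean(v):
    return sum(v) / len(v)


def slope(x, y):
    """Least-squares slope of y on x (intercept included)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, x):
    """What is left of y after a simple regression of y on x."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def slope_controlling_for(x, y, control):
    """Coefficient on x in a regression of y on x and control.

    Uses the Frisch-Waugh-Lovell result: remove the control from both
    x and y, then regress one set of residuals on the other.
    """
    return slope(residuals(x, control), residuals(y, control))


# Hypothetical data: z drives both x and y; x has NO effect on y.
z = [random.gauss(0, 1) for _ in range(N)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]
# An imperfect measure of z, standing in for real-world data quality.
z_measured = [zi + random.gauss(0, 1) for zi in z]

naive = slope(x, y)
controlled = slope_controlling_for(x, y, z)
poorly_controlled = slope_controlling_for(x, y, z_measured)

print(f"no control:               {naive:+.2f}")
print(f"perfect measure of z:     {controlled:+.2f}")
print(f"error-ridden measure of z: {poorly_controlled:+.2f}")
```

In this sketch the "control" removes the spurious effect only when the third variable is measured perfectly. With a noisy measure, a sizable false effect of `x` survives even though the regression formally includes the control.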

But many social scientists are reluctant to admit failure. They have devoted years to learning and teaching regression modeling, and they continue to use regression to make causal arguments that are not justified by their data. I call these arguments the myths of multiple regression, and I would like to use four studies of murder rates as examples.

Myth One: More Guns, Less Crime.

John Lott, an economist at Yale University, used an econometric model to argue that "allowing citizens to carry concealed weapons deters violent crimes, without increasing accidental deaths." Lott's analysis involved "shall issue" laws that require local authorities to issue a concealed weapons permit to any law-abiding citizen who applies for one. Lott estimated that each one percent increase in gun ownership in a population causes a 3.3% decrease in homicide rates. Lott and his co-author, David Mustard, posted the first version of their study on the Internet in 1997, and tens of thousands of people downloaded it. It was the subject of policy forums, newspaper columns, and often quite sophisticated debates on the World Wide Web. In a book with the catchy title More Guns, Less Crime, Lott taunted his critics, accusing them of putting ideology ahead of science.

Lott's work is an example of statistical one-upmanship. He has more data and a more complex analysis than anyone else studying the topic. He demands that anyone who wants to challenge his arguments become immersed in a very complex statistical debate, based on computations so difficult that they cannot be done with ordinary desktop computers. He challenges anyone who disagrees with him to download his data set and redo his calculations, but most social scientists do not think it worth their while to replicate studies using methods that have repeatedly failed. Most gun control researchers simply brushed off Lott and Mustard's claims and went on with their work. Two highly respected criminal justice researchers, Frank Zimring and Gordon Hawkins (1997), wrote an article explaining that:

just as Messrs. Lott and Mustard can, with one model of the determinants of homicide, produce statistical residuals suggesting that 'shall issue' laws reduce homicide, we expect that a determined econometrician can produce a treatment of the same historical periods with different models and opposite effects. Econometric modeling is a double-edged sword in its capacity to facilitate statistical findings to warm the hearts of true believers of any stripe.

John Lott, however, disputed their analysis and continued to promote his own. Lott had collected data for each of America's counties for each year from 1977 to 1992. The problem with this is that America's counties vary tremendously in size and social characteristics. A few large ones, containing major cities, account for a very large percentage of the murders in the United States. As it happens, none of these very large counties have "shall issue" gun control laws. This means that Lott’s massive data set was simply unsuitable for his task. He had no variation in his key causal variable – "shall issue" laws – in the places where most murders occurred.

He did not mention this limitation in his book or articles. When I discovered the lack of "shall issue" laws in the major cities in my own examination of his data, I asked him about it. He shrugged it off, saying that he had "controlled" for population size in his analysis. But introducing a statistical control in the mathematical analysis did not make up for the fact that he simply had no data for the major cities where the homicide problem was most acute.

It took me some time to find this problem in his data, since I was not familiar with the gun control issue. But Zimring and Hawkins zeroed in on it immediately because they knew that "shall issue" laws were instituted in states where the National Rifle Association was powerful, largely in the South, the West and in rural regions. These were states that already had few restrictions on guns. They observed that this legislative history frustrates "our capacity to compare trends in 'shall issue' states with trends in other states. Because the states that changed legislation are different in location and constitution from the states that did not, comparisons across legislative categories will always risk confusing demographic and regional influences with the behavioral impact of different legal regimes." Zimring and Hawkins further observed that:

Lott and Mustard are, of course, aware of this problem. Their solution, a standard econometric technique, is to build a statistical model that will control for all the differences between Idaho and New York City that influence homicide and crime rates, other than the "shall issue" laws. If one can "specify" the major influences on homicide, rape, burglary, and auto theft in our model, then we can eliminate the influence of these factors on the different trends. Lott and Mustard build models that estimate the effects of demographic data, economic data, and criminal punishment on various offenses. These models are the ultimate in statistical home cooking in that they are created for this data set by these authors and only tested on the data that will be used in the evaluation of the right-to-carry impacts.

Myth Two: Imprisoning More People Cuts Crime.

The Lott and Mustard case was exceptional only in the amount of public attention it received. It is quite common, even typical, for rival studies to be published using econometric methods to reach opposite conclusions about the same issue. Often there is nothing demonstrably wrong with either of the analyses. They simply use slightly different data sets or different techniques to achieve different results. It seems as if regression modelers can achieve any result they want without violating the rules of regression analysis in any way. In one exceptionally frank statement of frustration with this state of affairs, two highly respected criminologists, Thomas Marvell and Carlisle Moody (1997: 221), reported on the reception of a study they did of the effect of imprisonment on homicide rates. They reported that they:

widely circulated [their] findings, along with the data used, to colleagues who specialize in quantitative analysis. The most frequent response is that they refuse to believe the results no matter how good the statistical analysis. Behind that contention is the notion, often discussed informally but seldom published, that social scientists can obtain any result desired by manipulating the procedures used. In fact, the wide variety of estimates concerning the impact of prison populations is taken as good evidence of the malleability of research. The implication, even among many who regularly publish quantitative studies, is that no matter how thorough the analysis, results are not credible unless they conform with prior expectations. A research discipline cannot succeed in such a framework.

Myth Three: Executing People Cuts Crime.

In 1975 The American Economic Review published an article by a leading economist, Isaac Ehrlich of the University of Michigan, who estimated that each execution deterred eight homicides. Before Ehrlich, the best known specialist on the effectiveness of capital punishment was Thorsten Sellin, who had used a much simpler method of analysis. Sellin prepared graphs comparing trends in different states. He found little or no difference between states with or without the death penalty, so he concluded that the death penalty made no difference. Ehrlich, in an act of statistical one-upmanship, claimed that his analysis was more valid because it controlled for all the factors that influence homicide rates.

Even before it was published, Ehrlich's work was cited by the Solicitor General of the United States in an amicus curiae brief filed with the United States Supreme Court in defense of the death penalty. Fortunately, the Court decided not to rely upon Ehrlich's evidence because it had not been confirmed by other researchers. This was wise, because within a year or two other researchers published equally sophisticated econometric analyses showing that the death penalty had no deterrent effect.

The controversy over Ehrlich's work was so important that the National Research Council convened a blue-ribbon panel of experts to review it. After a very thorough review, the panel decided that the problem was not just with Ehrlich's model, but with the idea of using econometric methods to resolve controversies over criminal justice policies. They (Manski, 1978: 422) concluded that:

because the data likely to be available for such analysis have limitations and because criminal behavior can be so complex, the emergence of a definitive behavioral study laying to rest all controversy about the behavioral effects of deterrence policies should not be expected.

Myth Four: Legalized Abortion Caused the Crime Drop in the 1990s.

In 1999, John Donohue and Steven Levitt released a study with a novel explanation of the sharp decline in murder rates in the 1990s. They argued that the legalization of abortion by the U.S. Supreme Court in 1973 caused a decrease in the birth of unwanted children, a disproportionate number of whom would have grown up to be criminals. The problem with this argument is that the legalization of abortion was a one-time historical event, and one-time events do not provide enough data for a valid regression analysis. It is true that abortion was legalized earlier in some states than others, and Donohue and Levitt make use of this fact. But all these states were going through the same historical processes, and many other things were happening in the same historical period that affected murder rates. A valid regression analysis would have to capture all of these things, and test them under a wide range of variation. The existing data do not permit that, so the results of a regression analysis will vary depending on which data are selected for analysis.

In this case, Donohue and Levitt chose to focus on change over a twelve-year time span, ignoring fluctuations within those years. By doing this, as James Fox (2000: 303) pointed out, "they missed most of the shifts in crime during this period - the upward trend during the late 1980s crack era and the downward correction in the post-crack years. This is something like studying the effects of moon phases on ocean tides but only recording data for periods of low tide."

When I was writing this article, I included a sentence stating "soon another regression analyst will probably reanalyze the same data and reach different conclusions." A few days later, my wife handed me a newspaper story about just such a study. The author was none other than John Lott of Yale, together with John Whitley of the University of Adelaide. They crunched the same numbers and concluded that "legalizing abortion increased murder rates by around about 0.5 to 7 percent" (Lott and Whitley, 2001).

Why such markedly different results? Each set of authors simply selected a different way to model an inadequate body of data. Econometrics cannot make a valid general law out of the historical fact that abortion was legalized in the 1970s and crime went down in the 1990s. We would need at least a few dozen such historical experiences for a valid statistical test.

Conclusions.

The acid test in statistical modeling is prediction. Prediction does not have to be perfect. If a model can predict significantly better than random guessing, it is useful. For example, if a model could predict stock prices even slightly better than random guessing, it would make its owners very wealthy. So a great deal of effort has gone into testing and evaluating models of stock prices. Unfortunately, researchers who use econometric techniques to evaluate social policies very seldom subject their models to predictive tests. Their excuse is that it takes too long for the outcomes to be known. You don’t get new data on poverty, abortion or homicide every few minutes as you do with stock prices. But researchers can do predictive testing in other ways. They can develop a model using data from one jurisdiction or time period, then use it to predict data from other times or places. But most researchers simply do not do this, or if they do the models fail and the results are never published.
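A predictive test of this kind can be sketched in a few lines. The example below is an illustration on simulated data, not a test of any model discussed in this article. It assumes a single predictor, and it uses "always predict the average outcome from the fitting period" as a stand-in for uninformed guessing. The function names (`holdout_skill`, `simulate`) and the skill score are hypothetical choices made for the sketch. A score near 1 means the model predicts new data far better than the baseline; a score near 0 means it predicts no better.

```python
import random

random.seed(1)


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def mse(predicted, actual):
    """Mean squared prediction error."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def holdout_skill(x_fit, y_fit, x_new, y_new):
    """1 - MSE(model) / MSE(baseline), scored on data the model never saw.

    Baseline (an assumption of this sketch): always predict the mean
    outcome observed in the fitting period.
    """
    a, b = fit_line(x_fit, y_fit)
    baseline = sum(y_fit) / len(y_fit)
    model_err = mse([a + b * xi for xi in x_new], y_new)
    base_err = mse([baseline] * len(y_new), y_new)
    return 1 - model_err / base_err


def simulate(effect, n=2000):
    """Hypothetical data for one time period or jurisdiction."""
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [effect * xi + random.gauss(0, 1) for xi in x]
    return x, y


# Case 1: x really does drive y. Fit on one period, predict another.
x_a, y_a = simulate(effect=2.0)
x_b, y_b = simulate(effect=2.0)
real_skill = holdout_skill(x_a, y_a, x_b, y_b)

# Case 2: x is unrelated to y. The fitted model has nothing to offer.
x_c, y_c = simulate(effect=0.0)
x_d, y_d = simulate(effect=0.0)
null_skill = holdout_skill(x_c, y_c, x_d, y_d)

print(f"skill when the relationship is real: {real_skill:.2f}")
print(f"skill when there is no relationship: {null_skill:.2f}")
```

The design choice that matters is that the model is scored only on data that played no part in fitting it. Any comparison baseline could be substituted for the one assumed here.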

The journals that publish econometric studies of public policy issues often do not require predictive testing, which shows that the editors and reviewers have low expectations for their fields. So researchers take data for a fixed period of time and keep fine-tuning and adjusting their model until they can "explain" trends that have already happened. There are always a number of ways to do this, and with modern computers it is not terribly hard to keep trying until you find something that fits. At that point, the researcher stops, writes up the findings, and sends the paper off for publication. Later, another researcher may adjust the model to obtain a different result. This fills the pages of scholarly journals, and everybody pretends not to notice that little or no progress is being made. But we are no closer to having a valid econometric model of murder rates today than we were when Isaac Ehrlich published the first model in 1975.
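How easy it is to "keep trying until you find something that fits" can be shown with pure noise. The following is a deliberately simplified sketch: the search runs over single candidate predictors rather than full regression specifications, and every series (the outcome and all 200 candidates) is random, so no candidate has any real relationship to the outcome. The period lengths and the number of candidates are arbitrary assumptions.

```python
import random

random.seed(2)

N_FIT = 20          # length of the fixed period used to build the "model"
N_NEW = 2000        # fresh observations the model was never tuned on
N_CANDIDATES = 200  # number of alternative explanations tried


def corr(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def noise(n):
    return [random.gauss(0, 1) for _ in range(n)]


# Everything here is random: no candidate truly explains the outcome.
outcome = noise(N_FIT + N_NEW)
candidates = [noise(N_FIT + N_NEW) for _ in range(N_CANDIDATES)]


def fit_r(series):
    """Correlation with the outcome inside the fitting period."""
    return corr(series[:N_FIT], outcome[:N_FIT])


def new_r(series):
    """Correlation with the outcome on the fresh observations."""
    return corr(series[N_FIT:], outcome[N_FIT:])


# Keep trying until something fits: pick the best-looking candidate.
best = max(candidates, key=lambda s: abs(fit_r(s)))
best_fit_r = fit_r(best)
best_new_r = new_r(best)

print(f"best candidate, fitting period: r = {best_fit_r:+.2f}")
print(f"same candidate, fresh data:     r = {best_new_r:+.2f}")
```

With these settings the winning candidate shows a substantial correlation in the fitting period, yet on fresh data its correlation falls back toward zero. The search found a pattern in the noise, not a relationship.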

The scientific community does not have good procedures for acknowledging the failure of a widely used research method. Methods that are entrenched in graduate programs at leading universities and published in prestigious journals tend to be perpetuated. Many laymen assume that if a study has been published in a peer-reviewed journal, it is valid. The cases we have examined show that this assumption is not always justified. Peer review assures that established practices have been followed, but it is of little help when those practices themselves are faulty.

In 1991, David Freedman, a distinguished statistician at the University of California at Berkeley and the author of textbooks on quantitative research methods, shook the foundations of regression modeling when he frankly stated, "I do not think that regression can carry much of the burden in a causal argument. Nor do regression equations, by themselves, give much help in controlling for confounding variables" (Freedman, 1991: 292). Freedman's article provoked a number of strong reactions. Richard Berk (1991: 315) observed that Freedman's argument "will be very difficult for most quantitative sociologists to accept. It goes to the heart of their empirical enterprise and in so doing, puts entire professional careers in jeopardy."

Faced with critics who want some proof that they can predict trends, regression modelers often fall back on statistical one-upmanship. They make arguments so complex that only other highly trained regression analysts can understand, let alone refute, them. Often this technique works. Potential critics simply give up in frustration. The Philadelphia Inquirer's David Boldt (1999), after hearing John Lott speak on concealed weapons and homicide rates, and checking with other experts, lamented that "trying to sort out the academic arguments is almost a fool’s errand. You can drown in disputes over t-statistics, dummy variables and ‘Poisson’ vs. ‘least squares’ data analysis methods."

Boldt was correct to suspect that he was being lured into a fool’s mission. There are, in fact, no important findings in sociology or criminology that cannot be communicated to journalists and policy makers who lack graduate degrees in econometrics. It is time to admit that the emperor has no clothes. When presented with an econometric model, consumers should insist on evidence that it can predict trends in data other than the data used to create it. Models that fail this test are junk science, no matter how complex the analysis.

REFERENCES