Obituaries for the late Princeton economist Alan Krueger have mixed praise for his personal character with implicit (or explicit) endorsement of his research approach. Krueger, writes Noah Smith, “helped turn the economics profession into a more empirical, more scientific enterprise.” Krueger was “a beacon of cool-headed reason -- his overriding goal was to get the facts.” Indeed, Krueger was a key figure in the “credibility revolution,” which emphasizes natural experiments, randomized controlled trials, and similar empirical techniques to identify causal relationships between economic variables. Critics think the credibility revolution places too much emphasis on data over theory and, as a result, often gets the relationships wrong (the famous Card-Krueger minimum-wage study is an obvious example). Moreover, the emphasis on econometric identification at the expense of problem selection, framing, theoretical justification, and interpretation has led economists to focus on smaller and smaller problems, perhaps at the expense of the growth of knowledge.

Today I want to focus on a different problem with atheoretical empirical work: The data themselves may be unreliable. More precisely, the data never “speak for themselves.” Instead, empirical work depends on a host of beliefs, assumptions, and judgment calls about what data are acceptable, which relationships should be examined, how variables are defined, what statistical techniques are appropriate, and what the findings mean. In other words, while we often hear calls to make the social sciences more empirical and data driven (i.e., more “scientific”), these calls ignore the fact that empirical research relies on subjective human judgment about framing, data collection, analysis, and interpretation.

One example is the concept of “statistical significance.” Students are taught that statistical relationships between variables are “significant” (i.e., valid, meaningful, important) if the correlations are unlikely to be produced by pure chance. “Unlikely” means chance would have produced the observed effects only 1% of the time, or 5%, or 10%. So what threshold counts? Scientists have long known, but are only now beginning to acknowledge publicly, that the conventionally accepted thresholds are arbitrary. A recent editorial in Nature calls for abandoning the concept of statistical significance:

[W]e should never conclude there is “no difference” or “no association” just because a P value is larger than a threshold such as 0.05 or, equivalently, because a confidence interval includes zero. Neither should we conclude that two studies conflict because one had a statistically significant result and the other did not. These errors waste research efforts and misinform policy decisions.

The authors call for pre-registering studies – i.e., for scientific journals to accept or reject papers based on the proposed research design, before the results are reported – to avoid the bias that only papers with “significant” results are published. Some social science journals have gone further and banned the reporting of p-values and the use of significance language.
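To make the arbitrariness concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the two “studies,” their estimates and standard errors, and the function names `two_sided_p` and `verdicts` are invented for illustration, and the p-values rest on a simple normal approximation rather than on any particular paper's method.

```python
import math

def two_sided_p(estimate, std_error):
    """Two-sided p-value for the null of zero effect (normal approximation)."""
    z = abs(estimate / std_error)
    return math.erfc(z / math.sqrt(2))

def verdicts(p, thresholds=(0.01, 0.05, 0.10)):
    """Report whether p counts as 'significant' at each conventional threshold."""
    return {alpha: p < alpha for alpha in thresholds}

# Two hypothetical studies that report the SAME estimated effect (2.0)
# but differ slightly in precision. The numbers are made up.
study_a = two_sided_p(estimate=2.0, std_error=0.9)
study_b = two_sided_p(estimate=2.0, std_error=1.1)

for name, p in (("A", study_a), ("B", study_b)):
    print(f"Study {name}: p = {p:.3f}, significant? {verdicts(p)}")
```

Under these assumptions the p-values come out at about 0.03 and 0.07. Study A clears the 5% bar and Study B does not, even though both found the same effect; at the 10% threshold both are “significant,” and at 1% neither is. Whether the two studies “conflict” depends entirely on which cutoff the researcher happens to adopt.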

What about the data themselves? The recent ousting by the Southern Poverty Law Center of its founder and longtime director, Morris Dees, raises an important question about data reliability. Not only journalists, but also social scientists, have long relied on SPLC data to measure the presence and activity of “hate groups” in the US. However, as the SPLC’s critics have long maintained, the SPLC has a financial incentive to exaggerate the number and characteristics of such groups to increase donations. Academic researchers would not rely on data provided by Philip Morris on lung cancer or by ExxonMobil on the environmental effects of oil production. And yet, social scientists are remarkably sanguine about data provided by nonprofit groups and government agencies, even when those groups have obvious interests in how the research agenda is framed and what conclusions are acceptable (examples can be found in monetary and agricultural economics). Remember, datasets useful for research are not given by nature and do not appear by magic; they are constructed by human beings and are subject to bias, error, manipulation, and so on. (The Card-Krueger data on wages and employment are a particularly salient example.)

Problems with data are yet another reason why empirical science is never “settled,” but subject to continual criticism, interpretation, and assessment.