The blow-up over the Reinhart-Rogoff results reminds me of a point I’ve been meaning to make about our ability to use empirical methods to make progress in macroeconomics. This isn't about the computational mistakes that Reinhart and Rogoff made, though those are certainly important, especially in small samples. It's about the quantity and quality of the data we use to draw important conclusions in macroeconomics.

Everybody has been highly critical of theoretical macroeconomic models, DSGE models in particular, and for good reason. But the imaginative construction of theoretical models is not the biggest problem in macro – we can build reasonable models to explain just about anything. The biggest problem in macroeconomics is the inability of econometricians of all flavors (classical, Bayesian) to definitively choose one model over another, i.e. to sort between these imaginative constructions. We like to think of ourselves as scientists, but if data can’t settle our theoretical disputes – and it doesn’t appear that it can – then our claim for scientific validity has little or no merit.

There are many reasons for this. For example, the use of historical rather than “all else equal” laboratory/experimental data makes it difficult to figure out if a particular relationship we find in the data reveals an important truth rather than a chance run that mimics a causal relationship. If we could do repeated experiments or compare data across countries (or other jurisdictions) without worrying about the “all else equal” assumption, we could perhaps sort this out. Cross-country comparisons would then be like repeated experiments. But, unfortunately, there are too many institutional differences and common shocks across countries to reliably treat each country as an independent, all else equal experiment. Without repeated experiments – with just one set of historical data for the US to rely upon – it is extraordinarily difficult to tell the difference between a spurious correlation and a true, noteworthy relationship in the data.

Even so, if we had a very, very long time-series for a single country, and if certain regularity conditions persisted over time (e.g. no structural change), we might be able to answer important theoretical and policy questions (if the same policy is tried again and again over time within a country, we can sort out the random and the systematic effects). Unfortunately, the time period covered by a typical data set in macroeconomics is relatively short (so that very few useful policy experiments are contained in the available data, e.g. there are very few data points telling us how the economy reacts to fiscal policy in deep recessions).

There is another problem with using historical as opposed to experimental data: testing theoretical models against data the researcher knows about when the model is built. In this regard, when I was a new assistant professor Milton Friedman presented some work at a conference that impressed me quite a bit. He resurrected a theoretical paper he had written 25 years earlier (it was his plucking model of aggregate fluctuations), and tested it against the data that had accumulated in the time since he had published his work. It’s not really fair to test a theory against historical macroeconomic data. We all know what the data say, and it would be foolish to build a model that is inconsistent with the historical data it was built to explain – of course the model will fit the data, who would be impressed by that? But a test against data that the investigator could not have known about when the theory was formulated is a different story – those tests are meaningful (Friedman’s model passed the test using only the newer data).

As a young time-series econometrician struggling with data/degrees of freedom issues I found this encouraging. So what if in 1986 – when I finished graduate school – there were only 28 years of quarterly observations for macro variables (112 total observations; reliable data on money, which I almost always needed, doesn’t begin until 1959)? By, say, the end of 2012 there would be almost double that amount (216 versus 112!!!). Asymptotic (plim-type) results here we come! (Switching to monthly data doesn’t help much since it’s the span of the data – the distance between the beginning and the end of the sample – rather than the frequency at which the data are sampled that determines many of the “large-sample results.”)

By today, I thought, I would have almost double the data I had back then and that would improve the precision of tests quite a bit. I could also do what Friedman did, take really important older papers that give us results “everyone knows” and see if they hold up when tested against newer data.

It didn’t work out that way. There was a big change in the Fed’s operating procedure in the early 1980s, and because of this structural break, 1984 is today a common starting point for empirical investigations (start dates can be anywhere in the 1979-84 range, though later dates are more common). Data before this time period are discarded.

So, here we are 25 years or so later and macroeconomists don’t have any more data at our disposal than we did when I was in graduate school. And if the structure of the economy keeps changing – as it will – the same will probably be true 25 years from now. We will either have to model the structural change explicitly (which isn’t easy, and attempts to model structural breaks often induce as much uncertainty as clarity), or continually discard historical data as time goes on (maybe big data, digital technology, theoretical advances, etc. will help?).
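The sample-size arithmetic behind this is easy to check. The short Python sketch below is only an illustration: the helper `quarterly_obs` is a made-up name, and it assumes each sample runs from the first quarter of its start year through the last quarter of its end year, with 1984 taken as the post-break start date (one common choice among several).

```python
def quarterly_obs(start_year, end_year):
    """Count quarterly observations from Q1 of start_year through Q4 of end_year.

    Hypothetical helper for illustration; assumes full calendar years.
    """
    return 4 * (end_year - start_year + 1)


# Sample available in 1986, with reliable money data starting in 1959.
full_1986 = quarterly_obs(1959, 1986)

# Sample that would be available at the end of 2012 with no structural break.
full_2012 = quarterly_obs(1959, 2012)

# Sample actually used at the end of 2012 if data before 1984 are discarded.
post_break = quarterly_obs(1984, 2012)

print(full_1986, full_2012, post_break)
```

Under these assumptions the usable post-break sample is 116 quarters, barely more than the 112 available in 1986 and far short of the 216 a stable structure would have delivered.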

The point is that for a variety of reasons – the lack of experimental data, small data sets, and important structural change foremost among them – empirical macroeconomics is not able to definitively say which competing model of the economy best explains the data. There are some questions we’ve been able to address successfully with empirical methods, e.g., there has been a big change in views about the effectiveness of monetary policy over the last few decades driven by empirical work. But for the most part empirical macro has not been able to settle important policy questions. The debate over government spending multipliers is a good example. Theoretically the multiplier can take a range of values from small to large, and even though most theoretical models in use today say that the multiplier is large in deep recessions, ultimately this is an empirical issue. I think the preponderance of the empirical evidence shows that multipliers are, in fact, relatively large in deep recessions – but you can find whatever result you like and none of the results are sufficiently definitive to make this a fully settled issue.

I used to think that the accumulation of data along with ever improving empirical techniques would eventually allow us to answer important theoretical and policy questions. I haven’t completely lost faith, but it’s hard to be satisfied with our progress to date. It’s even more disappointing to see researchers overlooking these well-known, obvious problems – for example the lack of precision and the sensitivity to data errors that come with the reliance on just a few observations – to oversell their results.