To make progress in economics, it is essential that theoretical models be subjected to empirical tests that determine how well they can explain actual data. The tests that are used must be able to draw a sharp distinction between competing theoretical models, and one of the most important factors is the quality of the data used in the tests. Unfortunately, the quality of the data that economists employ is less than ideal, and this gets in the way of the ability of economists to improve the models they use. There are several reasons for the poor quality of economic data:

Non-Experimental Data: Economists do not have the ability to perform experiments, except in a very limited way. Instead, they must rely upon historical data. This makes tests of theoretical models much more difficult to conduct.

A chemist can, for example, go to the lab and perform experiments again and again, and this has several advantages. To see the advantages, suppose there are two chemicals that combine imperfectly, and the investigator would like to know the temperature that produces the most complete chemical reaction.

The first advantage is that in a laboratory, the air pressure, amount of oxygen in the air, the temperature, and so on can be controlled as the chemicals are combined.

When using historical, real-world data this is not possible. All of the factors will vary, and they cannot be held constant unless the researcher is lucky enough to encounter a “natural experiment” where “all else equal” holds, which is rare. The inability to hold “all else equal” confounds the tests. It is still possible to add controls that try to capture the other factors that might influence the outcome, but one can never be sure that this has been done sufficiently well to allow clean statistical tests.

The second advantage is that the experiment can be repeated many, many times so that any randomness in the outcome of individual experiments can be averaged out. In the experiment above, for example, the chemicals could be combined 1,000 times at each temperature, and then the outcomes averaged to smooth out the noise in individual experiments.
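The value of repetition can be illustrated with a small simulation. Everything in the sketch below is hypothetical: the response curve `true_yield`, its assumed peak at 60 degrees, the noise level, and the candidate temperatures are chosen only for illustration and do not describe any real reaction.

```python
import random
import statistics


def true_yield(temp_c):
    """Hypothetical reaction completeness; assumed to peak at 60 degrees."""
    return 0.90 - 0.0004 * (temp_c - 60.0) ** 2


def run_experiment(temp_c, rng, noise_sd=0.05):
    """One noisy lab run: the true yield plus random measurement noise."""
    return true_yield(temp_c) + rng.gauss(0.0, noise_sd)


def mean_yield(temp_c, n_runs, rng):
    """Average the outcome of n_runs repetitions at one temperature."""
    return statistics.fmean([run_experiment(temp_c, rng) for _ in range(n_runs)])


def best_temperature(temps, n_runs, rng):
    """Pick the temperature with the highest average measured yield."""
    averages = {t: mean_yield(t, n_runs, rng) for t in temps}
    return max(averages, key=averages.get)


if __name__ == "__main__":
    temps = [40, 50, 60, 70, 80]
    rng = random.Random(0)
    # With a single run per temperature, noise can easily reorder the results.
    print("1 run each:     ", best_temperature(temps, 1, rng))
    # With 1,000 runs per temperature, the noise in each average shrinks
    # by a factor of roughly sqrt(1000), so the true peak stands out.
    print("1,000 runs each:", best_temperature(temps, 1000, rng))
```

An economist studying one recession is in the position of the single-run case: there is one draw, and no way to average the noise away.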

In economics there is simply no way to, for example, run an experiment where the Great Recession occurs thousands of times and various policy interventions are implemented to see which types perform best. Economists are stuck with a single historical realization, and can never be sure of the extent to which the outcome is due to randomness or inadequate controls.

Surveys, Revisions, and Real-Time Data: Economic data are usually based upon surveys rather than a full tabulation of the variable of interest. Unemployment data, for example, are based upon a monthly sample of approximately 60,000 households. In some cases, as with GDP, the data arrive with a substantial time lag, leading to revisions as new data clarify the picture. For GDP, there is an advance estimate based upon data available one month after the end of a quarter, followed by second and third estimates released two and three months after the end of the quarter. There is also a first annual estimate released in the summer incorporating further new data, and there are subsequent annual and five-year revisions.

There are two important consequences of this. First, the data are often very noisy: month-to-month data on employment, for example, can vary substantially simply due to sampling error. Second, the data can change over time. In fact, it is possible that statistical tests that point in a particular direction will be reversed as the data are revised over time.
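A rough calculation gives a sense of the size of the sampling error. The sketch below applies the textbook standard error of a proportion to the unemployment example. It assumes simple random sampling, treats the 60,000 households as 60,000 independent respondents, and uses an illustrative unemployment rate of 5 percent. Actual household surveys use more complex designs with overlapping samples, so the numbers are only indicative.

```python
import math


def proportion_standard_error(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


def monthly_change_standard_error(p, n):
    """Standard error of the difference between two independent monthly estimates."""
    return math.sqrt(2.0) * proportion_standard_error(p, n)


if __name__ == "__main__":
    p = 0.05    # assumed true unemployment rate (illustrative)
    n = 60_000  # sample size from the text, treated as independent respondents
    se_level = proportion_standard_error(p, n)
    se_change = monthly_change_standard_error(p, n)
    print(f"std. error of the level:  {100 * se_level:.3f} percentage points")
    print(f"std. error of the change: {100 * se_change:.3f} percentage points")
```

Under these assumptions the standard error of the level is roughly 0.09 percentage points and that of the month-to-month change roughly 0.13, so a reported change of a tenth of a point is well within sampling noise.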

The fact that data are revised over time is particularly important when evaluating policy decisions. For example, early in the Great Recession it was “initially projected that the economy shrank at an annual rate of 3.8 percent in the last quarter of 2008. Months later, the bureau almost doubled that estimate, saying the number was 6.2 percent. Then it was revised to 6.3 percent. But it wasn’t until [2011] that the actual number was revealed: 8.9 percent. That makes it one of the worst quarters in American history.” This led policymakers to underestimate the severity of the recession, and to advocate policies that fell short of what was needed. Thus, investigators need to have access to the data that were available when policymakers made decisions, and for this reason a set of “real-time data” is archived at the Federal Reserve Bank of Philadelphia that gives the actual data available at any point in time.

Constructed Data: The data used in tests of theoretical models must match the theoretical variables, but raw data are often insufficient. For example, a particular theory may have “core inflation” or “trend inflation” as one of the variables, but what is the best way to match this theoretical variable? Should the investigator use inflation net of food and energy, trimmed mean estimates (i.e., dropping the goods and services with the most extreme price changes in each period), an estimated trend, or some other construct? It can matter which is used, and it is often unclear which method is best.
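To see how much the choice of construct can matter, consider a minimal sketch that computes three candidate measures from the same inputs. The component names and price changes in `PRICE_CHANGES` are hypothetical numbers invented for illustration, and the components are given equal weights. Published measures use expenditure weights and far more components.

```python
import statistics

# Hypothetical annual price changes (percent) by component; equal weights assumed.
PRICE_CHANGES = {
    "food": 4.0,
    "energy": 12.0,
    "shelter": 3.0,
    "apparel": -1.0,
    "medical care": 2.5,
    "transportation services": 3.5,
}


def headline(changes):
    """Average price change across all components."""
    return statistics.fmean(list(changes.values()))


def ex_food_energy(changes, excluded=("food", "energy")):
    """Average price change after dropping a fixed set of components."""
    kept = [v for name, v in changes.items() if name not in excluded]
    return statistics.fmean(kept)


def trimmed_mean(changes, trim_each_side=1):
    """Average after dropping the largest and smallest price changes."""
    ordered = sorted(changes.values())
    kept = ordered[trim_each_side:len(ordered) - trim_each_side]
    return statistics.fmean(kept)


if __name__ == "__main__":
    print("headline:          ", headline(PRICE_CHANGES))
    print("ex food and energy:", ex_food_energy(PRICE_CHANGES))
    print("trimmed mean:      ", trimmed_mean(PRICE_CHANGES))
```

With these made-up inputs the three measures come out at 4.0, 2.0, and 3.25 percent, so a test of a theory that includes “core inflation” could turn on which construct the investigator happens to pick.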

In addition, sometimes the data do not exist at all and must be constructed from various sources (as in the work of Piketty, and of Reinhart and Rogoff). This requires assumptions about how micro data should be combined, how to weight data, and so on, and those decisions and assumptions can be controversial. Moreover, as we’ve seen with Reinhart and Rogoff, errors are possible. For all of these reasons, the data can be imperfect, or wrong, and that can lead to misleading test outcomes.

Span versus Frequency: Suppose a researcher would like to know if the average temperature worldwide is increasing over time. To answer this question, the underlying trend change in the average temperature must be separated from the short-run variation. That requires a very long data set. With a data set thousands of years long, the underlying trend is evident, but data on temperature for the last ten or twenty years would not be very helpful.

It would be difficult, if not impossible, to separate short-run variation in the temperature from the longer-term changes. Notice too that it does not help very much to have higher frequency data, i.e., daily data instead of weekly or monthly data. It is the span of the data (the time period covered), rather than the frequency at which the temperature is sampled, that is important.

Economists face a very similar problem when attempting to identify cycles (short-run variation) and the trend changes (long-run variation) in variables like GDP. Our data sets are relatively short, and questions about long-run versus short-run changes are very difficult to answer.
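The point about span versus frequency can be made concrete with a stylized sketch. It assumes the series follows a random walk with drift, which is a modeling assumption made here for illustration and not something the data guarantee. The function names and parameter values are hypothetical. Under that assumption, the estimate of the long-run trend depends only on the first and last observations, so sampling more often within the same span adds nothing.

```python
import math
import random


def simulate_path(span_years, steps_per_year, drift, sigma, rng):
    """Random walk with drift, observed steps_per_year times per year."""
    dt = 1.0 / steps_per_year
    path = [0.0]
    for _ in range(span_years * steps_per_year):
        shock = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(path[-1] + drift * dt + shock)
    return path


def estimate_drift(path, span_years):
    """Average change per year; the sum of increments telescopes to last minus first."""
    return (path[-1] - path[0]) / span_years


def drift_standard_error(sigma, span_years):
    """Sampling error of the drift estimate: it depends on span, not frequency."""
    return sigma / math.sqrt(span_years)


if __name__ == "__main__":
    rng = random.Random(0)
    monthly = simulate_path(20, 12, drift=0.02, sigma=0.02, rng=rng)
    annual = monthly[::12]  # the same 20 years, observed once a year
    # Monthly and annual sampling give the identical trend estimate.
    print(estimate_drift(monthly, 20) == estimate_drift(annual, 20))
    # Only a longer span shrinks the error: quadrupling the span halves it.
    for span in (20, 80, 320):
        print(span, "years:", drift_standard_error(0.02, span))
```

If the short-run variation were instead independent noise around a fixed trend line, more frequent sampling would help somewhat. The stark result here comes from the assumed persistence of shocks, which is the case most relevant to variables like GDP.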

*****

Progress in economics is frustratingly slow, and a key reason for this is the lack of quality data; sometimes the data do not exist at all. Unfortunately, while the passage of time will increase the amount and span of economic data, the data will never be as good as they are for disciplines that have access to experimental data, and the ability to move economic theory forward will suffer as a result.
