Whither macroeconomics? The surprising success of naïve GDP forecasts

Jon Faust

Over the past ten days, the U.S. Federal Reserve has lowered its policy interest rate by 125 basis points based largely on its assessment of the need to battle strong recessionary forces. This comes after a December 11 meeting at which the Fed lowered rates a mere 25 basis points, still hoping to “foster maximum sustainable growth and provide some additional insurance against risks.”1 To some, this rapid change in sentiment might seem surprising. In my view, though, these events mainly serve to remind us of how extraordinarily challenging it is to forecast economic activity.

Economic forecasting challenges

I regularly hear the accusation that economic forecasting is no better than weather forecasting, but this does a disservice to weather forecasters. It is also an unfair comparison: weather forecasters have immense advantages over economic forecasters.

When making a forecast, weather forecasters have access to data on current and recent conditions. In contrast, when the Fed made the forecast for the January Federal Open Market Committee (FOMC) meeting, the latest available GDP data were for the third quarter of the previous year. On January 30, we will get an advance release of fourth quarter GDP for the U.S., but this initial estimate will be highly speculative. Historically, the root mean square revision of the annualised quarterly growth rate in the advance release is about 1.5 percentage points, easily enough to spell the difference between slow growth and deep recession.2
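As a rough illustration of how a root mean square revision is computed, the sketch below takes the advance estimates and the later, revised estimates of annualised growth for the same quarters and returns the square root of the average squared revision. The function name `rms_revision` and the numbers are made up for illustration; they are not actual BEA releases and do not reproduce the 1.5 percentage point figure.

```python
import math

def rms_revision(advance, later):
    """Root mean square of (later estimate - advance estimate), in percentage points."""
    if len(advance) != len(later):
        raise ValueError("series must have equal length")
    revisions = [l - a for a, l in zip(advance, later)]
    return math.sqrt(sum(r * r for r in revisions) / len(revisions))

# Hypothetical annualised quarterly growth rates (percent), illustration only
advance = [2.0, 3.5, 0.5, 4.0]
later = [3.0, 2.5, 2.5, 4.0]
print(rms_revision(advance, later))
```

Because the revisions are squared before averaging, a single large revision weighs more heavily than several small ones.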

Further, the GDP data will continue to be revised in important ways indefinitely. For example, the 1999 benchmark revision of GDP data raised measured average growth over 1997 and 1998 by more than one-half of a percentage point.3 A significant piece of the much-discussed productivity boom of the late 1990s was not in the GDP data until the 1999 benchmark. The weather equivalent would be forecasting temperature without knowing the current temperature, having only a fuzzy estimate of temperature in the recent past, and knowing that, years after the fact, a hot spell might be revised into the data.

Given that we cannot even measure GDP without considerable hindsight, we cannot expect forecasts of real economic activity to be very precise. We can and should, however, ask whether Federal Reserve forecasts are as accurate as possible.

How well does the Fed do?

Research over the years has generally supported the view that the Fed's Greenbook forecast, prepared for each FOMC meeting, is outstanding.4 Recently, the discovery of a new dataset has made a more stringent evaluation possible. This dataset has a snapshot of (part of) the Fed's dataset as it stood at the time of about 150 Greenbook forecasts since 1979.5 Using this dataset, Jonathan Wright of the Federal Reserve Board and I assessed how a wide range of models would have performed if they had been used to forecast based on the information actually available to the Fed when it made its forecasts.6

Our results again confirm the high quality of the Greenbook forecast, but are sobering in some respects. When we give ten alternative models only those data that were available to the Fed, the Fed's forecast generally outperforms the alternatives by a wide margin. Chris Sims and others had speculated that the Fed's good performance might be due to the immense resources the Fed pours into assessing GDP in the current period and recent past. Since many of the raw inputs used by the Bureau of Economic Analysis (BEA) to construct the GDP data are public, the Fed attempts to assess current GDP by replicating many aspects of the BEA's efforts. It is probably not surprising that conventional models have trouble competing regarding current and past conditions with a thorough attempt to mirror the data construction machinery of the BEA.

A surprisingly simple forecast

While the Fed has a clear edge in assessing current conditions, policy decisions can only affect the future. To assess how the Fed does in projecting where real activity will be in the future, Jonathan and I give the alternative models the data available to the Fed and the Fed's estimate of the current state of the economy. Then we compare how well the models forecast when they all know the Fed's assessment of the current and recent past values of variables.

We find the surprising result that no model clearly outperforms the univariate autoregressive model. This is one of the simplest possible models: in every period, it basically forecasts that GDP growth will return to its mean at the historical average rate. This may be sobering not only for the Fed but also for the macroeconomics profession as a whole: knowledge of interest rates, labour market conditions, capacity utilisation, inflation, or any of about 50 additional variables does not systematically improve our ability to foretell where real activity is headed.

There are many details and caveats to be considered in fully understanding the meaning of these results. We can, however, give a somewhat more precise statement. The univariate autoregressive model, which predicts GDP growth based on four lagged values of growth, has smaller prediction errors than Greenbook and essentially every other model at every forecast horizon between one and five quarters into the future.7 Our measure of the prediction error is an estimate, however, and one should consider whether the differences in estimated forecast precision are statistically significant. This raises complex statistical issues, but our basic conclusion is that no method significantly outperforms the univariate model.
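For readers who want to see what such a model looks like in practice, here is a minimal sketch of a four-lag univariate autoregression. It assumes estimation by ordinary least squares and multi-step forecasts obtained by iterating the fitted equation forward; the estimation details used in the research may differ. The names `fit_ar`, `forecast_ar` and `rmspe` are illustrative, and the growth series is simulated, not actual GDP data.

```python
import numpy as np

def fit_ar(growth, p=4):
    """OLS fit of growth_t = c + a_1*growth_{t-1} + ... + a_p*growth_{t-p}."""
    y = np.asarray(growth, dtype=float)
    n = len(y)
    # The row for time t holds [1, y_{t-1}, ..., y_{t-p}]
    lags = [y[p - k:n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [c, a_1, ..., a_p]

def forecast_ar(growth, coef, horizon=5):
    """Iterate the fitted equation forward, feeding forecasts back in as lags."""
    p = len(coef) - 1
    history = list(np.asarray(growth, dtype=float)[-p:])
    path = []
    for _ in range(horizon):
        recent = history[::-1][:p]  # most recent value first
        nxt = coef[0] + float(np.dot(coef[1:], recent))
        path.append(nxt)
        history.append(nxt)
    return np.array(path)

def rmspe(forecasts, outcomes):
    """Root mean square prediction error."""
    err = np.asarray(forecasts, dtype=float) - np.asarray(outcomes, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

# Simulated growth rates for illustration only (not actual GDP data)
rng = np.random.default_rng(0)
growth = [3.0]
for _ in range(400):
    growth.append(1.5 + 0.5 * growth[-1] + rng.normal(scale=2.0))

# Hold back the last five quarters, forecast them, and score the forecast
train, holdout = growth[:-5], growth[-5:]
coef = fit_ar(train, p=4)
path = forecast_ar(train, coef, horizon=5)
print("AR(4) forecast path:", np.round(path, 2))
print("RMSPE over five quarters:", rmspe(path, holdout))
```

The model sees nothing but past growth itself. When the fitted model is stationary, the iterated forecast path drifts from recent values back toward the series' long-run mean as the horizon lengthens.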

It is important to emphasise that the results for forecasting inflation are dramatically different from those for real activity. For inflation, Greenbook outperforms all other models, often by a wide margin.

Conclusions

Returning to the weather analogy, our results would translate something like this. We are forecasting temperature without knowing current temperature and having only a fuzzy estimate of temperature in the recent past. Further, we find that given a good estimate of recent temperatures, we do not know any systematic way to improve our temperature forecast using measures of other variables such as precipitation, barometric pressure, location of the jetstream, etc.

While necessary, forecasting real activity is a nasty endeavour. Our recent research confirms the basic conclusion of earlier work that the Fed's Greenbook forecast is excellent. No model we assess, however, including Greenbook, historically outperforms a naïve forecast based only on the best available estimate of recent GDP itself.

Note: This column draws conclusions from research that was started when I was an employee of the Federal Reserve Board and that is joint with Jonathan Wright of the Fed. The opinions stated here are my own and need not reflect Jonathan's opinion or those of anyone in the Federal Reserve System.

Footnotes

1 Minutes of the Federal Open Market Committee, Dec. 11, 2007, p.8,

2 For a summary of GDP revisions across the G-7, see Faust, et al., 'News and Noise in G-7 GDP Announcements,' Journal of Money, Credit, and Banking, v.37, n.3, June 2005, 403-417.

3 This is based on the author’s calculation using the data discussed below.

4 Two notable sources are Romer, C.D. and D.H. Romer (2000): 'Federal Reserve Information and the Behavior of Interest Rates', American Economic Review, 90, pp.429-457, and Sims, C.A. (2002): 'The Role of Models and Probabilities in the Monetary Policy Process', Brookings Papers on Economic Activity, 2, pp.1-40.

5 This new dataset was created and preserved over the years by Fed staffer Douglas Battenberg.

6 Jon Faust and Jonathan Wright, 'Comparing Greenbook and Reduced Form Forecasts using a Large Realtime Dataset', NBER Working Paper 13397, Sept. 2007.

7 See Table 5c, panel 2.

The size of the errors is measured by root mean square prediction error (RMSPE). The RMSPE for the autoregressive model is lower at every horizon than that of 9 of the other 10 models. One model beat the simple model at a few horizons using the RMSPE criterion, but the difference was only a few hundredths of a percentage point.