Tim Curtin’s paper in TSWJ isn’t the first time he’s misapplied the Durbin-Watson test to justify rejecting a regression of physical variables (he substituted a regression of the differenced values, after which he found the regression not statistically significant). He did the same in this precursor, in which he explicitly states (regarding his first regression)



… the Durbin-Watson statistic at 1.313, which is well below the benchmark 2.0 …



I don’t see any other interpretation than that he was using two (in fact, two point zero) as his critical test value, which we have already mentioned is completely wrong.
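The Durbin-Watson statistic itself is simple to compute; the point is that the proper test compares it to tabulated lower and upper critical bounds (d_L and d_U), which depend on the sample size and the number of regressors, not to the fixed value 2.0. Here is a minimal sketch (the helper name and the toy residual series are mine, purely for illustration):

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals, divided by their sum of squares.  Values near 2 suggest no
    lag-1 autocorrelation; well below 2 suggests positive autocorrelation;
    well above 2 suggests negative autocorrelation."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Two illustrative residual series:
alternating = np.tile([1.0, -1.0], 50)   # flips sign every step: negative autocorrelation
persistent = np.repeat([1.0, -1.0], 50)  # two long runs: positive autocorrelation

dw_alt = durbin_watson(alternating)   # well above 2
dw_per = durbin_watson(persistent)    # well below 2
```

Whether either value is *significant* is then judged against d_L and d_U for the given sample size and model, not against 2.0 itself.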

Curtin claimed that the absence of autocorrelation is required for valid regression, which is also wrong. Nonetheless he uses that claim to justify requiring regression be performed on differenced variables. Curtin isn’t the first (and won’t be the last) to claim that regression of climate variables like global temperature should be done using differenced variables. For example, a recent commenter on RealClimate by the handle “t.marvell” did the same, justifying it by insisting that global temperature was not a stationary time series. Of course it’s not stationary — it shows a trend!

Neither of those individuals seems to understand the impact that first-differencing has on regression analysis, especially when the causal relationship we’re interested in has to do with the trends which are present. Let’s give that some consideration.



First let’s consider the simple case in which some variable of interest, say y (it might be global temperature), is related to some other variable of interest, say x (which might be climate forcing due to CO2). Suppose there’s a strict linear relationship between them, but the variable y also includes random noise

$y_t = \alpha + \beta x_t + \varepsilon_t$.

The coefficients $\alpha$ and $\beta$ are the intercept and the slope of the line relating y to x. The quantities $\varepsilon_t$ are random noise, which may or may not be white noise, i.e., it may or may not show autocorrelation. If we regress y on x, we get estimates of the coefficients $\alpha$ and $\beta$. We can also compute the residuals, which are estimates of the noise values $\varepsilon_t$, with which we can test whether or not they show significant autocorrelation. We can further test whether the regression itself is statistically significant, but if the noise shows autocorrelation, that has to be taken into account when testing for such significance.

Let’s create some artificial data for an example. Let the x variable follow a straight-line trend plus just a little bit of noise:

We’ll define the variable y as a linear function of x plus white noise (with no autocorrelation)

$y_t = \alpha + \beta x_t + \varepsilon_t$.

which looks like this:

These two variables are strongly correlated, as is clear if we “normalize” them and plot them on the same graph:

Take note that the principal source of correlation is the fact that they both show a strong upward trend.
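We can make that point concrete by building artificial data of the same kind and comparing the raw correlation with the correlation after the trend is removed from each series. The parameters below (trend, noise levels, random seed) are made up for illustration; they are not the values behind the post’s figures:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
t = np.arange(n, dtype=float)

# Hypothetical stand-ins for the artificial series: a trending x with a
# little noise, and y a linear function of x plus white noise.
x = t + rng.normal(0.0, 0.5, n)
y = 10.0 + 0.01 * x + rng.normal(0.0, 0.1, n)

corr_raw = np.corrcoef(x, y)[0, 1]   # strong, driven by the shared trend

# Remove the linear trend from each series and correlate what's left.
x_detr = x - np.polyval(np.polyfit(t, x, 1), t)
y_detr = y - np.polyval(np.polyfit(t, y, 1), t)
corr_detr = np.corrcoef(x_detr, y_detr)[0, 1]   # much weaker
```

The raw correlation is strong, while the detrended correlation is weak: the shared trend really is the principal source of the correlation.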

Of course, correlation isn’t causation. With this artificial data, we know that x causes y because the data were designed that way. With real climate data, we know that climate forcing causes temperature change because of the laws of physics.

If we do a linear regression of y on x we get this:

The slope of the best-fit line is 0.0103 +/- 0.0008 and the fit is certainly statistically significant. We can test for autocorrelation in the residuals, either by computing the sample autocorrelation function (which indicates no statistically significant autocorrelation, as we already knew), or by performing the Durbin-Watson test. The DW statistic is 1.606, which for this sample size likewise gives no statistically significant evidence of autocorrelation. Bottom line: the fit is significant and the slope is correct within its error limits; in other words, we got the right answer.
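The exercise is easy to reproduce in outline, though with made-up parameters (the post’s actual numbers, like 0.0103 and 1.606, come from its own data, which aren’t given here): simulate the trending x and y, fit by ordinary least squares, and compute the slope’s standard error and the DW statistic by hand:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
t = np.arange(n, dtype=float)
x = t + rng.normal(0.0, 0.5, n)                       # trend plus a little noise (illustrative)
true_slope = 0.01                                     # my choice, not the post's value
y = 10.0 + true_slope * x + rng.normal(0.0, 0.1, n)   # white-noise errors

# Ordinary least squares of y on x, by hand.
xbar, ybar = x.mean(), y.mean()
sxx = np.sum((x - xbar) ** 2)
slope = np.sum((x - xbar) * (y - ybar)) / sxx
intercept = ybar - slope * xbar
resid = y - (intercept + slope * x)
se_slope = np.sqrt(np.sum(resid ** 2) / (n - 2) / sxx)

# Durbin-Watson statistic of the residuals: near 2 for white noise.
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
```

The estimated slope lands close to the true value with a small standard error, and DW sits near 2, just as in the post’s fit.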

Now suppose we applied Tim Curtin’s methodology. Since the DW statistic is less than 2, the regression has to be rejected in favor of regressing first-differenced variables. Therefore we define

$X_t = x_t - x_{t-1}$,

and

$Y_t = y_t - y_{t-1}$.

Now let’s regress Y on X. That gives this:

This fit certainly doesn’t look very good. We find that it’s not statistically significant. The slope of the line is 0.03 +/- 0.09, which is actually bigger than the real slope! But the probable error is so large the result is meaningless. In other words, we got the wrong answer.

The sample autocorrelation function now indicates that there is autocorrelation in the residuals (at least at lag 1), but it’s negative. The Durbin-Watson statistic is 2.886, which is statistically significant, but of negative lag-1 autocorrelation. Yet by Tim Curtin’s criterion (comparing to the benchmark 2.0) we would conclude that there’s no autocorrelation, and by the logic he used in his paper we would conclude that there’s no relationship between x and y, because there’s no statistically significant relationship between X and Y. Again, that’s the wrong answer.
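The same simulated data can be pushed through this procedure: first-difference both series, then regress Y on X with an intercept. The parameters are illustrative, not the post’s; the point is the badly inflated standard error and the DW statistic near 3 (negative lag-1 autocorrelation):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
t = np.arange(n, dtype=float)
x = t + rng.normal(0.0, 0.5, n)                 # illustrative parameters, not the post's
y = 10.0 + 0.01 * x + rng.normal(0.0, 0.1, n)

X = np.diff(x)   # first differences
Y = np.diff(y)

# OLS of Y on X, with an intercept (as the differencing procedure implies).
m = len(X)
Xbar, Ybar = X.mean(), Y.mean()
sxx = np.sum((X - Xbar) ** 2)
slope_d = np.sum((X - Xbar) * (Y - Ybar)) / sxx
resid_d = Y - (Ybar + slope_d * (X - Xbar))
se_d = np.sqrt(np.sum(resid_d ** 2) / (m - 2) / sxx)

# DW on the differenced-fit residuals lands well above 2:
# differencing white noise manufactures negative lag-1 autocorrelation.
dw_d = np.sum(np.diff(resid_d) ** 2) / np.sum(resid_d ** 2)
```

The standard error on the differenced slope is orders of magnitude larger than in the undifferenced fit, and DW jumps above 2, mirroring the post’s 2.886.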

The essential problem is that when we first-differenced the variables we removed the trend from each. But there’s another problem too. Suppose we model Y as a function of X using a straight line:

$Y_t = A + B X_t$.

If we “integrate” this model (i.e., reverse the first-differencing step), we get

$y_t = C + A\,t + B\,x_t$,

where C is some constant. In other words, by including an intercept ($A$) in the straight-line fit of the first-differenced variables, we automatically include a time trend in the undifferenced variables. But since the variable x very nearly follows a straight line, we now have the problem of collinearity: two of our regressors are very nearly equivalent. That can wreak havoc with regression.
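That havoc can be made concrete: regressing y on both t and x, which is what the integrated differenced model amounts to, blows up the standard error on the x coefficient compared to the simple fit of y on x alone. A sketch, again with illustrative parameters of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
t = np.arange(n, dtype=float)
x = t + rng.normal(0.0, 0.5, n)                 # x is very nearly a straight line in t
y = 10.0 + 0.01 * x + rng.normal(0.0, 0.1, n)

def ols_se(design, y):
    """OLS coefficients and their standard errors for a design matrix."""
    beta, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = np.sum(resid ** 2) / (len(y) - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta, np.sqrt(np.diag(cov))

# Simple model: y on x alone.
A1 = np.column_stack([np.ones(n), x])
_, se1 = ols_se(A1, y)

# Collinear model: y on both t and x (the "integrated" differenced model).
A2 = np.column_stack([np.ones(n), t, x])
_, se2 = ols_se(A2, y)

# se2[2] (the x coefficient's standard error) dwarfs se1[1].
```

With t and x nearly identical, the design matrix is close to singular and the x coefficient becomes almost impossible to pin down.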

If we start with our original model (which is the correct one because that’s how we designed the data), and apply first-differencing to it, we get this:

$Y_t = \beta X_t + \varepsilon_t - \varepsilon_{t-1}$.

Notice: there’s no intercept term in this model. Notice also that the noise is no longer white noise, it’s first-differenced white noise (a.k.a. MA(1) noise), which is why there’s (negative) autocorrelation in the residuals of the fit of Y on X. By the way, we can fit this model (with no intercept term) to our first-differenced data. Then we get a slope estimate of 0.013, which is much closer to the right answer, but the 2-sigma uncertainty is +/- 0.015 and the fit is not statistically significant. That’s because when you first-difference the variables, but don’t confound things by inserting a spurious intercept, the uncertainty in your result is increased dramatically.
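Fitting the correctly specified no-intercept model to differenced data is easy to sketch as well (illustrative parameters once more): the slope estimate comes out roughly right, but its uncertainty is far larger than in the undifferenced fit, and the residuals show the expected negative lag-1 autocorrelation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
t = np.arange(n, dtype=float)
x = t + rng.normal(0.0, 0.5, n)                 # illustrative parameters, not the post's
eps = rng.normal(0.0, 0.1, n)                   # white noise
y = 10.0 + 0.01 * x + eps

X = np.diff(x)
Y = np.diff(y)

# No-intercept OLS: slope = sum(X*Y) / sum(X^2).  The model implied by
# differencing the original relation is Y_t = beta*X_t + (eps_t - eps_{t-1}),
# with no intercept and MA(1) noise.
slope_nc = np.sum(X * Y) / np.sum(X ** 2)
resid_nc = Y - slope_nc * X
se_nc = np.sqrt(np.sum(resid_nc ** 2) / (len(X) - 1) / np.sum(X ** 2))

# Lag-1 autocorrelation of the residuals: negative, near -1/2 for
# first-differenced white noise.
r1 = np.corrcoef(resid_nc[:-1], resid_nc[1:])[0, 1]
```

Even with the spurious intercept removed, the price of differencing is a dramatically wider error bar on the slope.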

There are situations in which analyzing first-differenced variables is extremely useful, sometimes even necessary. If, for instance, the noise had a “unit root” then we might want to difference the variables. Or, we might be primarily interested in the effect of the short-term fluctuations of x on those of y, in which case we could first-difference for the specific purpose of removing the trend from each. But to know the overall impact of our artificial x on our artificial y we want to avoid differencing — because we know how the data were constructed. To know the overall climate impact of greenhouse-gas forcing on global temperature, likewise we want to avoid differencing — because of the laws of physics.

As I said, lots of folks want to first-difference climate variables, but there’s no justification for doing so. More to the point, I doubt that many of them (or perhaps even any of them) really understand the impact of what they’re doing. But they sure seem to like the answers it gives ’em.