Jeff Condon seems unhappy with me. Enough to blog about it. And Anthony Watts cross-posts. But what I’d really like to discuss is yet another way in which Condon fools himself about trends in sea ice.



He thinks he shows that the trend in global sea ice (both area and extent) is just barely statistically significant, so “just barely” that if you leave out the first few months of data, significant it no longer is. Here, for instance, is his result for global sea ice area:

Note the numbers for the slope and its uncertainty near the top of the graph.

I’ve done a similar analysis, using global sea ice area data from Cryosphere Today (University of Illinois at Urbana-Champaign, UIUC). Even restricting to data no later than December 2009 (when Condon made his post), I get a result which is undeniably significant, not “just barely” so:

The rates we computed are nearly the same, and since we seem to be using different data sets the small difference is no surprise. But the uncertainty levels are dramatically different. The trend I get is “fer shure.” Wherefore art thou “just barely”?

Condon’s problem seems to be that he has overestimated the lag-1 autocorrelation. Note that on his graph he indicates “lag-1 value 0.998” for the AR(1) model he’s using to correct for autocorrelation. But that value is too high. When I compute the lag-1 autocorrelation of the global sea ice area anomaly from UIUC data I get 0.9913 (data through December 2009). Numerically that’s not much different from 0.998, but in terms of its impact on the uncertainty level of a trend analysis, it’s way different.
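
For the curious, here’s a minimal sketch of that statistic in Python (my choice of language here, not necessarily what either of us actually used), where `anomaly` stands for a 1-D array of daily global sea ice area anomalies:

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.sum(x[:-1] * x[1:]) / np.sum(x * x)

# rho = lag1_autocorr(anomaly)  # ~0.9913 for UIUC anomalies through Dec 2009
```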

I’m not sure, but I think the UIUC data represent a multi-day average of ice conditions. If so, that averaging would inflate the autocorrelation even further, so it can’t explain my value being so much lower than 0.998.

If we use an AR(1) model with lag-1 autocorrelation 0.9913, then the “number of data points per effective degree of freedom” is about 229. This increases the uncertainty in an estimated trend slope by the square root of that factor, making it about 15 times larger. Clearly that’s a very strong effect. But if we use a lag-1 autocorrelation of 0.998, the “number of data points per effective degree of freedom” is a whopping 999, increasing the uncertainty in a slope estimate by a factor of almost 32. That’s more than twice as large a correction factor as actually applies.
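
For those who want to check the arithmetic, here’s the standard AR(1) formula, (1 + ρ)/(1 − ρ), for the number of data points per effective degree of freedom, evaluated at both values:

```python
# Points per effective degree of freedom for AR(1) noise with lag-1
# autocorrelation rho; the slope uncertainty inflates by its square root.
def ar1_dof_factor(rho):
    return (1 + rho) / (1 - rho)

for rho in (0.9913, 0.998):
    f = ar1_dof_factor(rho)
    print(f"rho = {rho}: {f:.0f} points per dof, uncertainty x {f ** 0.5:.1f}")
# rho = 0.9913: 229 points per dof, uncertainty x 15.1
# rho = 0.998: 999 points per dof, uncertainty x 31.6
```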

Where did Condon get his 0.998? I can’t be sure, but I have a theory: I think he computed the lag-1 autocorrelation not of the anomaly values, but of the raw data values. For the UIUC data, that gives a lag-1 autocorrelation of 0.9986, which is even higher than Condon’s 0.998, possibly because the UIUC data are a multi-day average.
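
Here’s a made-up illustration (synthetic numbers, not the actual UIUC data) of why leaving the annual cycle in matters so much: a pure seasonal cycle plus white noise has a lag-1 autocorrelation near 1 before the cycle is removed, and near 0 after.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(11_000)                          # roughly 30 years of daily data
cycle = 5.0 * np.sin(2 * np.pi * t / 365.25)   # annual cycle
raw = cycle + rng.normal(scale=0.5, size=t.size)
anom = raw - cycle                             # anomalies: cycle removed

print(np.corrcoef(raw[:-1], raw[1:])[0, 1])    # ~0.98, dominated by the cycle
print(np.corrcoef(anom[:-1], anom[1:])[0, 1])  # ~0.0, just white noise
```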

If it’s true that Condon estimated the lag-1 autocorrelation from the raw data rather than the anomalies, then the only proper characterization is — rookie mistake. In any case, the value 0.998 is too high, as is Condon’s estimate of the uncertainty of the slope.

By the way, it can also sometimes be a mistake to use an AR(1) model for the errors. In fact it’s a good idea to look at the autocorrelation function (ACF) in order to determine whether or not such a model is plausible. For the UIUC data, we can compare the ACF estimate from the data (in black) to that which would follow from an AR(1) process with autoregressive parameter given by the lag-1 autocorrelation (in red):
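
Here’s one way to make that comparison (a sketch in Python, with `anomaly` as in the earlier snippet; the tools are my assumption):

```python
import numpy as np
import matplotlib.pyplot as plt

def sample_acf(x, max_lag):
    """Sample autocorrelation function at lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([1.0 if k == 0 else np.sum(x[:-k] * x[k:]) / denom
                     for k in range(max_lag + 1)])

def compare_acf_to_ar1(anomaly, max_lag=200):
    """Overlay the sample ACF with the ACF implied by an AR(1) model."""
    lags = np.arange(max_lag + 1)
    acf = sample_acf(anomaly, max_lag)
    rho = acf[1]                               # lag-1 autocorrelation
    plt.plot(lags, acf, "k", label="estimated from data")
    plt.plot(lags, rho ** lags, "r", label="AR(1): rho ** lag")
    plt.xlabel("lag")
    plt.ylabel("autocorrelation")
    plt.legend()
    plt.show()
```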

Note that the AR(1) model values are consistently higher than the values estimated from the data. That shows that the AR(1) correction will overcompensate for autocorrelation, giving uncertainty levels which are too high. A better (but still imperfect) model is ARMA(1,1). Either way, the trend is “fer shure.”
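
If you want to try the ARMA(1,1) alternative yourself, one convenient route (an assumption on my part, not necessarily the software used here) is statsmodels:

```python
from statsmodels.tsa.arima.model import ARIMA

def fit_arma11(series):
    """Fit ARMA(1,1), i.e. ARIMA with order (p, d, q) = (1, 0, 1)."""
    return ARIMA(series, order=(1, 0, 1)).fit()

# result = fit_arma11(anomaly)
# print(result.summary())
```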

Having established that the trend is real, we should estimate the autocorrelation from the residuals of the linear fit rather than from the anomalies themselves, since the trend itself contributes to the apparent autocorrelation. This lowers the lag-1 autocorrelation from 0.9913 to 0.9888.
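
A sketch of that step, with `t` the time variable and `anomaly` as before:

```python
import numpy as np

def residual_lag1(t, anomaly):
    """Lag-1 autocorrelation of residuals from a least-squares linear fit."""
    slope, intercept = np.polyfit(t, anomaly, 1)
    r = anomaly - (slope * t + intercept)
    r = r - r.mean()
    return np.sum(r[:-1] * r[1:]) / np.sum(r * r)

# residual_lag1(t, anomaly)  # ~0.9888, vs 0.9913 from the anomalies directly
```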

Bottom line: the trend is real, with or without the first few months of data. By no means is it “just barely.” Condon’s estimate of the trend uncertainty is way too high. He has fooled himself (and others too).

He closes by taking the anomaly values and adding to them the overall average value, in order to produce this graph:

That certainly accomplishes his purpose.