In the previous post I discussed the refutation of LeMouel et al. (and a companion paper) by Legras et al. Now it seems that LeMouel et al. have responded to their critics (the response is part of the discussion, not a published peer-reviewed reply).



I won’t discuss the entire rebuttal except to say that it’s embarrassingly bad. They actually have the audacity to suggest that this:

is not clear evidence of an inhomogeneity (one which has a profound impact on their analysis since it coincides with one of the “high” solar cycles). Truly embarrassing.

What I will consider is their claim that they don’t actually need to account for autocorrelation in estimating the variance of their 21-day moving averages. They say this:



In addition, as can be seen in their supplementary material, when trying to account for dependencies in a 21-day interval (which we select), LMBY use 90- and 150-day intervals that naturally are affected by the seasonal variability of temperatures (plot and output on page 21, SM to LMBY). Figure 1 actually shows that autocorrelations of the daily temperatures in 21-day intervals fall below 0.2 in less than 3 days, while autocorrelation for the daily range of temperatures dT (which LMBY fail to consider) falls below 0.2 on the second day… The sentence “the number of effective degree of freedom is about 9 times smaller than estimated by LKMC and consequently the estimated variance of the ensemble average is about three times larger” is therefore false.



They support their claim with this graph for Praha (and a couple of others for Bologna and Uccle):

First things first: even if the autocorrelation estimates of LeMouel et al. were correct, they would still need to be taken into account when estimating the variance of the 21-day moving averages, and they’re big enough to have a significant effect. But I strongly suspect that the estimates of LeMouel et al. are not correct.

At first I was puzzled how LeMouel et al. managed to get autocorrelation estimates so much lower than I got. This is especially puzzling because I estimated the autocorrelation of anomalies so as to remove the seasonal cycle — their “affected by the seasonal variability of temperatures” claim doesn’t apply. Then I noticed something interesting. The estimates from LeMouel et al. drop below zero rapidly — after day 5 for temperature and after day 3 for dT.

Does it seem implausible, on purely physical grounds, that there would be negative autocorrelation between temperature (or dT) and its value just a few days later? It does.

Then how did it happen? Here’s my theory: LeMouel et al. estimated the autocorrelation function (ACF) from actual 21-day intervals. They probably did so for many 21-day intervals and averaged the results, but each estimate is still based on only 21 data points, and averaging many such estimates reduces the noise without removing the bias. The usual estimate of the autocorrelation function (the “Yule-Walker” estimate) is biased low, and for short spans of data the bias can be profound.

In fact the bias of the Yule-Walker estimate (and of others like the least-squares estimate) leads to exactly the characteristic pattern observed in the graphs from LeMouel et al., that the estimated ACF rapidly drops below zero even when the true ACF remains positive (as it does, e.g., for an AR(1) process).

We’ve seen this problem before. When Schwartz estimated climate sensitivity using a simple 1-box energy balance model, he estimated autocorrelations using time series of only 125 data points. One of the points made in response was that such a small data set carries a large bias in the estimated autocorrelation, especially when the autocorrelation is sizeable. It was a big problem with data sets of only 125 data points, and it’s a much bigger problem with only 21.

Allow me to illustrate. I generated a 21-day time series of random noise from an AR(1) process, with (true) lag-1 autocorrelation 0.8 (which is big!), and estimated the autocorrelation. In fact I did so 1000 times and averaged the results. Here it is:

Note that the true ACF remains positive at all lags (as is always the case for an AR(1) process), but the estimate drops to zero by lag 4 — even though the true ACF at lag 4 is still sizeable (= 0.4096).
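For anyone who wants to try this at home, here is a minimal sketch of that kind of experiment in Python with NumPy. It is not the code I actually ran: the function names, the burn-in length, the random seed, and the choice of Gaussian noise are all assumptions made for illustration. It generates many 21-point AR(1) series with lag-1 autocorrelation 0.8, applies the usual mean-subtracted ACF estimate to each, and averages the estimates.

```python
import numpy as np

def simulate_ar1(n, phi, rng, burn_in=200):
    """Generate n points of an AR(1) series x[t] = phi * x[t-1] + noise[t].

    The burn-in (an assumed value) lets the series forget its starting point.
    """
    noise = rng.standard_normal(n + burn_in)
    x = np.empty(n + burn_in)
    x[0] = noise[0]
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + noise[t]
    return x[burn_in:]

def sample_acf(x, max_lag):
    """Usual ACF estimate: subtract the sample mean, divide by the lag-0 sum."""
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[:len(d) - k], d[k:]) / denom
                     for k in range(max_lag + 1)])

def mean_estimated_acf(n=21, phi=0.8, n_trials=1000, max_lag=10, seed=0):
    """Average the estimated ACF over many independent short series."""
    rng = np.random.default_rng(seed)
    acfs = [sample_acf(simulate_ar1(n, phi, rng), max_lag)
            for _ in range(n_trials)]
    return np.mean(acfs, axis=0)

if __name__ == "__main__":
    phi = 0.8
    est = mean_estimated_acf(phi=phi)
    for k, r in enumerate(est):
        print(f"lag {k:2d}: mean estimate {r:+.3f}   true {phi ** k:.3f}")
```

Comparing the two columns lag by lag shows how far the averaged estimate falls below the true ACF; increasing `n` in `mean_estimated_acf` shows the gap shrinking as the series gets longer.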

The actual impact of autocorrelation on the variance of a 21-day average is given by

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\left[1 + 2\sum_{j=1}^{n-1}\left(1 - \frac{j}{n}\right)\rho_j\right],$$

where $\sigma_{\bar{x}}^2$ is the variance of the average, $\sigma^2$ is the variance of the data, $\rho_j$ is the (true) autocorrelation at lag $j$, and $n = 21$ is the number of days in the average. The term in brackets is the inflation factor for the variance. I estimated the autocorrelation of the daily high temperature data from Praha, and used this formula to compute the inflation factor for variance and standard deviation (which is the square root of the factor for variance). Result? The standard error in the 21-day averages is inflated by a factor of 2.44. This isn’t as big as the factor of 3 estimated by Legras et al., but it’s a lot bigger than no inflation at all.
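The calculation itself is only a few lines. Below is a minimal sketch, assuming the standard result that autocorrelation inflates the variance of an n-point average by the factor 1 + 2 Σ (1 − j/n) ρ_j, with the sum running over lags j = 1 to n − 1. The function name is my own invention for illustration, and the example uses a hypothetical AR(1) autocorrelation function with lag-1 value 0.8 rather than the Praha estimates, which are not reproduced here.

```python
import numpy as np

def variance_inflation(rho, n):
    """Factor by which autocorrelation inflates the variance of an n-point mean.

    rho[j - 1] must hold the autocorrelation at lag j, for j = 1 .. n - 1.
    """
    rho = np.asarray(rho, dtype=float)
    if rho.shape != (n - 1,):
        raise ValueError("need autocorrelations for lags 1 .. n-1")
    lags = np.arange(1, n)
    return 1.0 + 2.0 * np.sum((1.0 - lags / n) * rho)

if __name__ == "__main__":
    n = 21
    phi = 0.8  # hypothetical AR(1) value for illustration, NOT the Praha estimate
    rho = phi ** np.arange(1, n)  # AR(1) autocorrelation at lags 1 .. n-1
    factor = variance_inflation(rho, n)
    print("variance inflation factor:      ", factor)
    print("standard-error inflation factor:", np.sqrt(factor))
```

Two sanity checks are worth keeping in mind: with no autocorrelation at all the factor is exactly 1, and with perfect autocorrelation at every lag it equals n, since averaging n copies of the same number does nothing to reduce the variance.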

On another subject altogether: My wife reminds me that I shouldn’t be an ingrate. So I’d like to express my sincere gratitude to those who’ve made a donation to this blog using the “donate” button at the top right. It really has been a big help. Even the small donations make a real difference, and the not-so-small ones (you know who you are!) have been immensely helpful. Thanks.