Almost two years ago, I wrote a post about running correlations and their problems. It is still a well-read post. I wish it were better read.

It is not that running correlation cannot be a useful tool. If a good correlation has been found between two variables, it can be useful to test how consistent that correlation is over time. But if the correlation between two variables is weak and non-significant, then running correlations risk being a data-dredging technique.

Case in point: Jiang et al (2015), who reconstruct Holocene sea-surface temperatures (SST) just north of Iceland using diatoms and relate the variability in SST to cosmogenic isotopes (an indicator of solar variability) using running correlations.

The abstract starts:

"Mounting evidence from proxy records suggests that variations in solar activity have played a significant role in triggering past climate changes."

As readers of my critical evaluations of papers reporting solar-palaeoecology links will know, much of this "mounting evidence" is weak. How robust is Jiang et al?

Let's start with a detour into the transfer function that Jiang et al use to reconstruct SST from their fossil diatom assemblages. For reasons I don't understand, they cite me (Telford and Birks 2009) when reporting that they test six transfer function methods. We didn't, and testing so many methods risks a model selection bias (Telford et al 2004). In a previous paper, Jiang et al (2005) cite Juggins and ter Braak (1992) for an identical phrase about six methods.

Jiang et al (2015) settle on a four-component weighted averaging-partial least squares (WAPLS) model. They claim this is a parsimonious choice, having used a five-component model in Jiang et al (2005). I suspect that needing so many components (it is rare to need more than two) indicates a spatial autocorrelation problem, although the modern analogue technique (which normally does well in such cases) performs surprisingly badly relative to the other methods. It would have been good if they had tested whether there was a spatial autocorrelation problem – code is available.

The choice of a four-component WAPLS model won't bias the results, but it might make the model performance appear better than it really is and make the reconstruction more variable. As it is, almost all the high-frequency variability in the SST reconstruction is less than ±1°C, about the same as the root mean square error of prediction, so potentially much of this variability is just noise.

The chronology is based on tephra layers, circumventing any problems with a variable radiocarbon reservoir effect, and the sedimentation rate is fairly linear over most of the Holocene. The chronology is as good as chronologies get for marine cores, but still the chronological uncertainty on the pre-settlement tephras is about 100 years, enough to matter for a high-resolution correlation.

What about the relationship between the reconstruction and solar activity? Jiang et al start by showing that the long-term trends match the orbitally driven decline in summer insolation, as do many proxy records of summer temperature in the North Atlantic region. Next they compare the reconstruction with cosmogenic isotopes, detrending both records with a sixth-order polynomial and then using a 50-year lowpass filter to remove high-frequency variability. (There must also be an undocumented interpolation step to even temporal spacing.)
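For concreteness, the interpolate-detrend-filter preprocessing can be sketched as follows. This is a minimal sketch only: the grid spacing and the Butterworth filter design are my assumptions, since these details are not documented in the paper.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import butter, filtfilt

def preprocess(age, value, step=10, poly_order=6, cutoff=50):
    """Interpolate to an even time grid, remove a polynomial trend,
    and lowpass filter to suppress high-frequency variability.

    The 10-year grid and the 4th-order Butterworth design are
    assumptions; Jiang et al do not document these choices."""
    grid = np.arange(age.min(), age.max(), step)
    y = interp1d(age, value)(grid)
    # detrend with a polynomial; rescale ages for numerical stability
    z = (grid - grid.mean()) / grid.std()
    resid = y - np.polyval(np.polyfit(z, y, poly_order), z)
    # 50-year lowpass; cutoff frequency normalised by Nyquist
    b, a = butter(4, (1.0 / cutoff) / (0.5 / step), btype="low")
    return grid, filtfilt(b, a, resid)
```

The filtered, detrended series is what enters the correlation analysis; with a different grid spacing or filter design, the high-frequency content, and so the correlations, would differ somewhat.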

For the period 9500-4500 BP, there is no obvious correlation between SST and the solar proxy. For the period 4500-0 BP at least some of the wiggles align, as would be expected for smoothed data. Jiang et al don't report or test the overall correlation between the solar activity proxy and the SST reconstruction; instead they proceed directly to a running correlation. They base the significance level of the running correlation on a Monte Carlo test using surrogate time series with the same temporal autocorrelation as the SST reconstruction (they use Ebisuzaki's (1997) phase randomisation method). This is good: often correlations are either not quantified or autocorrelation is ignored (eg Jiang et al 2005). However, Jiang et al (2015) do not take account of the multiple testing inherent in a running correlation.
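Ebisuzaki's (1997) method builds surrogate series that share the Fourier amplitudes, and therefore the power spectrum and autocorrelation, of the original series, while randomising the phases. A minimal Python sketch:

```python
import numpy as np

def phase_randomise(x, rng):
    """Surrogate with the same power spectrum (and hence the same
    autocorrelation) as x, after Ebisuzaki (1997)."""
    n = len(x)
    f = np.fft.rfft(x)
    phase = rng.uniform(0.0, 2.0 * np.pi, len(f))
    phase[0] = 0.0            # keep the mean unchanged
    if n % 2 == 0:
        phase[-1] = 0.0       # Nyquist component must stay real
    return np.fft.irfft(np.abs(f) * np.exp(1j * phase), n)
```

Because the surrogates preserve the autocorrelation of the reconstruction, correlations between surrogates and the solar proxy give an honest null distribution for a single correlation; autocorrelated series correlate spuriously far more often than white noise does.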

How serious a problem is multiple testing for Jiang et al? I've repeated their analysis as well as I can (they leave several details undocumented: how was SST interpolated, given that the sampling resolution varies from 2 to >50 years, and what filter did they use for the lowpass?). I find the absolute maximum correlation in a running correlation with a window width of 2000 years and a step size of 100 years, for each of 1000 phase-randomised detrended-SST surrogates. The 95th percentile of this null distribution is 0.44, almost exactly the same as the absolute maximum correlation of Jiang et al's running correlation. Rather than suggesting a strong link between solar activity and SST over the last 4000 years, Jiang et al's result is on the cusp of statistical significance at the p = 0.05 level. Not the worst result possible, but it makes their story less persuasive. My choice of methodological details may have affected the significance threshold somewhat.
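The multiple-testing correction works by taking the maximum statistic over all windows. A minimal sketch, with the window expressed in samples and a step of one sample (a simplification of the 2000-year window and 100-year step used above):

```python
import numpy as np

def running_corr(x, y, window):
    """Pearson correlation of x and y in a sliding window (in samples)."""
    return np.array([np.corrcoef(x[i:i + window], y[i:i + window])[0, 1]
                     for i in range(len(x) - window + 1)])

def max_corr_threshold(x, y, window, n_surr=1000, seed=0):
    """95th percentile of the absolute maximum running correlation
    between phase-randomised surrogates of x and the fixed series y.
    The observed maximum running correlation should be judged against
    this, not against the threshold for a single correlation."""
    rng = np.random.default_rng(seed)
    f = np.fft.rfft(x)
    n = len(x)
    maxima = np.empty(n_surr)
    for k in range(n_surr):
        # phase-randomised surrogate (Ebisuzaki 1997)
        phase = rng.uniform(0.0, 2.0 * np.pi, len(f))
        phase[0] = 0.0
        if n % 2 == 0:
            phase[-1] = 0.0
        surr = np.fft.irfft(np.abs(f) * np.exp(1j * phase), n)
        maxima[k] = np.abs(running_corr(surr, y, window)).max()
    return np.percentile(maxima, 95)
```

Because the maximum is taken over many windows, this threshold is necessarily higher than the per-window threshold; ignoring the distinction is exactly the multiple-testing problem.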

Jiang et al also run a spectral analysis that finds several peaks that are close to some of the solar cycle frequencies, but not others.
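Such peaks can be checked with an ordinary periodogram of the evenly spaced record (a sketch only; I do not know which spectral method Jiang et al used):

```python
import numpy as np
from scipy.signal import periodogram

def spectral_periods(x, dt):
    """Periodogram of an evenly spaced series with sample spacing dt
    (years); returns periods in years and the spectral power."""
    freq, power = periodogram(x, fs=1.0 / dt)
    keep = freq > 0
    return 1.0 / freq[keep], power[keep]
```

Spectral peaks from a record this short also need their own significance testing against an autocorrelated null, for much the same reason as the running correlations do.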

Of course Jiang et al have an explanation for why their reconstruction is only sensitive to solar variability some of the time (more sensitive in cool climates). However plausible this explanation is, without supporting evidence we have to ask whether a more parsimonious explanation is that the on-off correlation between solar activity and the SST reconstruction is due to chance.

(hat tip to Kaustubh Thirumalai @holy_kau)