Appendix: An interpolation-free algorithm for estimating Haar fluctuations

Paleotemperatures are typically nonuniformly sampled in time. Sometimes (as in the case of the Epica series used in Fig. 1b) the problem is due to the compression of the ice with depth and can be somewhat alleviated by sampling the deeper reaches of the core at higher rates (e.g. the high resolution section of the GRIP core shown in Fig. 2b). However, the usual remedy is to interpolate the series and then resample it at a uniform temporal interval/resolution. While for many purposes this may be adequate, for either spectral or fluctuation analyses it may lead to biases and spurious results. The reason is that interpolation assumes not only that the series T(t) is continuous between points, but also that it is differentiable (in the common case of cubic spline interpolation, up to third order!). However, in the small-scale limit of a scaling regime, the mean derivatives of order >H diverge. Since we have found empirically that all the relevant atmospheric regimes have H < 1, even linear interpolation may give spurious results. Indeed, any linearly interpolated part of the T(t) series will, at least locally, have H = 1, since over such segments ΔT(Δt) ∝ Δt. Therefore, if these regions are too numerous, including the fluctuations estimated over linear segments in the statistics will introduce biases.
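This spurious H = 1 behaviour is easy to verify numerically. The following short Python/NumPy sketch (our own illustration, not part of the published analysis) shows that on a purely linear segment the mean absolute difference fluctuation is proportional to the lag:

```python
import numpy as np

# A linearly interpolated segment: T is exactly linear in t.
t = np.arange(0.0, 64.0)          # uniform time axis (arbitrary units)
T = 0.5 * t + 3.0                 # the kind of segment linear interpolation produces

def mean_abs_diff(T, lag):
    """Mean absolute difference fluctuation at a given integer lag."""
    return np.mean(np.abs(T[lag:] - T[:-lag]))

# Doubling the lag doubles the fluctuation, i.e. local exponent H = 1.
r = mean_abs_diff(T, 16) / mean_abs_diff(T, 8)
print(r)   # 2.0 for a purely linear segment
```

If too many fluctuations are drawn from such segments, the estimated exponent is biased towards H = 1, as argued above.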

One of the many advantages of Haar fluctuations is that they are quite easy to estimate without any interpolation while accurately taking into account the resolution of the data. We now describe the simple algorithm used in Figs. 4b–e and 5 (note that several of these series were already analysed, but using interpolation). Assume that there are N measurements of temperature \(T(t_i)\) at times \(t_i\), where i is an index running from 1 through N. Define the running sum \(S_i\):

$$S_{i} = \sum\limits_{j \le i} {T(t_{j} )}$$ (3)
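In code, the running sum of Eq. (3) is simply a cumulative sum. A minimal Python/NumPy sketch with illustrative values (note that NumPy arrays are 0-based, whereas the text indexes 1 through N):

```python
import numpy as np

T = np.array([1.0, -2.0, 3.0, 0.5])   # illustrative measurements T(t_i)
S = np.cumsum(T)                       # Eq. (3): S_i = sum over j <= i of T(t_j)
# S holds the values 1.0, -1.0, 2.0, 2.5
```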

Consider an index j and an even number k. The (j, k) fluctuation \(\varDelta T_{j,k}\) over the interval \([t_j, t_{j+k}]\) can be estimated as follows. First, determine the sums of the \(T(t_i)\) over the first and second halves of the interval:

$$\varDelta S^{(1)} = S_{j + k/2} - S_{j} ;\quad \varDelta S^{(2)} = S_{j + k} - S_{j + k/2}$$ (4)

In the case of regular sampling, the ratio:

$$\varepsilon = \frac{{t_{j + k/2} - t_{j} }}{{t_{j + k} - t_{j} }}$$ (5)

has the value ε = 1/2.
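For a single (j, k) pair, Eqs. (4) and (5) amount to a few lines of code. A hypothetical Python sketch (the function name is ours, not from the original software):

```python
import numpy as np

def half_sums_and_ratio(t, S, j, k):
    """Eqs. (4)-(5): half-interval sums and midpoint ratio, for even k."""
    m = j + k // 2
    dS1 = S[m] - S[j]                          # sum of T over the first half
    dS2 = S[j + k] - S[m]                      # sum of T over the second half
    eps = (t[m] - t[j]) / (t[j + k] - t[j])    # Eq. (5)
    return dS1, dS2, eps

# For uniform sampling, eps comes out exactly 1/2:
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
S = np.cumsum([1.0, 2.0, 1.0, 3.0, 2.0])
dS1, dS2, eps = half_sums_and_ratio(t, S, 0, 4)
```

For nonuniform sampling, eps measures how well the midpoint \(t_{j+k/2}\) is centred in the interval, which is exactly what the rejection criterion below uses.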

The Haar fluctuation is simply the average of the first half minus the average of the second half of the interval and can thus be estimated as:

$$\varDelta T_{j,k} = \frac{2}{{t_{j + k} - t_{j} }}(\varDelta S^{(1)} - \varDelta S^{(2)} )$$ (6)

However, if ε is too far from 1/2, this estimate may be poor. Therefore, in the calculation of the statistical moments we should only keep the corresponding fluctuations on condition that \(\varepsilon_{\min } < \varepsilon < 1 - \varepsilon_{\min }\), where \(0 < \varepsilon_{\min } < 1/2\) is a parameter that can be adjusted so as to make the condition as restrictive as we like: exactly uniform sampling corresponds to the limit \(\varepsilon_{\min } \to 1/2\). Decreasing \(\varepsilon_{\min }\) has the effect of losing precision in the scale Δt and hence smooths the \(S_q(\Delta t)\) curve. However, taking \(\varepsilon_{\min }\) too close to 1/2 will result in the rejection of too many fluctuations, with the consequence that the statistics will be poor. In the present case, it was found that \(\varepsilon_{\min } = 1/4\) was generally a reasonable compromise (see Fig. 6). One can check the accuracy by seeing how much the statistics change when \(\varepsilon_{\min }\) is varied: if they do not vary much, then the choice of \(\varepsilon_{\min }\) is acceptable. Note also that, as usual, the fluctuations are multiplied by an extra "calibration" constant (taken = 2 throughout this paper). This ensures that they are quite close to differences in regions where H > 0 and close to tendencies (averages with the means removed) in regions where H < 0. Once the fluctuations are estimated, \(S_q(\Delta t)\) can be estimated by binning the fluctuations into bins with Δt regularly spaced logarithmically. For each bin, the various powers of ΔT are averaged; in our implementation of the algorithm we used 20 bins per order of magnitude in Δt (the software is available from http://www.physics.mcgill.ca/~gang/software/index.html).
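Putting Eqs. (3)–(6), the \(\varepsilon_{\min }\) rejection rule and the logarithmic binning together, the whole procedure can be sketched in a few dozen lines. The following Python/NumPy implementation is our own minimal sketch, not the McGill software: the function name, 0-based indexing and the simple O(N²) double loop are ours, while the calibration constant 2, the default \(\varepsilon_{\min } = 1/4\) and the 20 bins per order of magnitude follow the text.

```python
import numpy as np

def haar_structure_function(t, T, q=2.0, eps_min=0.25,
                            bins_per_decade=20, calibration=2.0):
    """Estimate S_q(dt) = <|dT|^q> from nonuniformly sampled T(t)."""
    t = np.asarray(t, dtype=float)
    T = np.asarray(T, dtype=float)
    N = len(T)
    S = np.cumsum(T)                       # Eq. (3): running sum (0-based)
    dts, flucts = [], []
    for k in range(2, N, 2):               # even interval lengths k
        for j in range(N - k):             # so that j + k stays in range
            m = j + k // 2
            dt = t[j + k] - t[j]
            eps = (t[m] - t[j]) / dt       # Eq. (5)
            if not (eps_min < eps < 1.0 - eps_min):
                continue                   # reject poorly centred midpoints
            dS1 = S[m] - S[j]              # Eq. (4): first-half sum
            dS2 = S[j + k] - S[m]          # Eq. (4): second-half sum
            dts.append(dt)
            flucts.append(calibration * (2.0 / dt) * (dS1 - dS2))  # Eq. (6)
    dts = np.array(dts)
    absfl = np.abs(np.array(flucts))
    # Bin the fluctuations with dt regularly spaced in log10.
    logdt = np.log10(dts)
    lo = np.floor(logdt.min() * bins_per_decade)
    hi = np.ceil(logdt.max() * bins_per_decade) + 1
    edges = np.arange(lo, hi + 1) / bins_per_decade
    idx = np.digitize(logdt, edges)
    out_dt, out_Sq = [], []
    for b in np.unique(idx):               # average each power within a bin
        sel = idx == b
        out_dt.append(dts[sel].mean())
        out_Sq.append(np.mean(absfl[sel] ** q))
    return np.array(out_dt), np.array(out_Sq)
```

As a sanity check: for a pure linear trend sampled uniformly at unit spacing, this sketch returns exactly \(S_1(\Delta t) = \Delta t\) (H = 1), consistent with the calibration constant making Haar fluctuations close to differences in regions where H > 0.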

Fig. 6 A comparison of the Epica analysis using uniform sampling on the linearly interpolated data with the same number of data points as the original series (5,788 points, interpolated resolution 138 years; magenta), and the result of the interpolation-free algorithm described here using \(\varepsilon_{\min } = 0.25\) (blue). The main differences are at the small and large Δt's. The magenta interpolated curve is reproduced from Lovejoy (2013)

While the above procedure essentially solves the problem of “holes” in the series, it does not remove possible biases that arise from systematic sampling nonuniformities such as those arising from cores with high temporal sampling rates near the surface and systematically lower rates at depth. When applied to such series, the small Δt part of the S(Δt) function will be sampled from the top part of the core where all the high resolution data lie. Therefore the high frequencies will be biased towards the near surface statistics. However, if the statistics are fairly homogeneous in time—as they typically are (see Fig. 5)—then this is unimportant (see however Lovejoy and Schertzer 2013 for evidence of exceptional Holocene statistics in Greenland).