Appendices

A The step model

The step model (Good et al. 2011, 2013; Hansen et al. 2011; Gregory et al. 2016) is based on the assumption that the climate responses \(X_{i}(t)\) in the quantities of interest (T and N) to separate forcings \(F_{i}(t)\) combine linearly to give \(X(t)=\sum _{i}\,X_{i}(t)\) in response to the forcings applied together as \(F(t)=\sum _{i}\,F_{i}(t)\). By assuming further that the response to any step-change in forcing depends only on the size of the step and not on the nature of the forcing agent, we can estimate the response to historical time-dependent net forcing F(t) due to all agents by treating it as the sum of a set of discrete steps in forcing, such that \(F(t)=\sum _{j=1}^{t}\,[F(j)-F(j-1)]\), where j are successive instants of time (we use a timestep of one year) and \(F(0)=0\). The response of an AOGCM at time t to the forcing increment which occurred at time \(j<t\) is estimated as \(X_{4\times }(t-j+1)[F(j)-F(j-1)]/F_{4\times }\), where \(X_{4\times }\) is the AOGCM’s time-dependent response to the step-change forcing \(F_{4\times }\) in the abrupt4xCO2 experiment, because at time t the forcing increment \([F(j)-F(j-1)]\) has been acting for \(t-j+1\) timesteps. Hence, adding up the response to all previous increments,

$$\begin{aligned} X(t) = \sum _{j=1}^{t}\,X_{4\times }(t-j+1)\frac{F(j)-F(j-1)}{F_{4\times }}. \end{aligned}$$
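In code, this sum is a discrete convolution of the annual forcing increments with the scaled step response. The following minimal sketch (Python/NumPy; the function name and interface are illustrative, not from any published code) assumes annual timeseries with \(F(0)=0\) and an \(X_{4\times }\) record at least as long as F:

```python
import numpy as np

def step_model(F, X4x, F4x):
    """Estimate the response X(t) to time-dependent forcing F(t) by
    superposing scaled copies of the abrupt4xCO2 step response X4x.

    F   : annual net forcing, F[0] being F(1), with F(0) = 0 assumed
    X4x : annual response to the step forcing F4x, len(X4x) >= len(F)
    F4x : effective forcing of the abrupt4xCO2 experiment (scalar)
    """
    n = len(F)
    dF = np.diff(np.concatenate(([0.0], F)))   # increments F(j) - F(j-1)
    X = np.zeros(n)
    for t in range(1, n + 1):                  # response at year t ...
        for j in range(1, t + 1):              # ... to the increment at year j,
            # which has been acting for t - j + 1 timesteps
            X[t - 1] += X4x[t - j] * dF[j - 1] / F4x
    return X
```

The double loop is written out for clarity; it is equivalent to the one-line convolution `np.convolve(dF, X4x / F4x)[:n]`.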

Note that the step-model makes no assumption about the value or time-variation of \(\alpha\), except that it is the same for all magnitudes and kinds of forcing.

Fig. 10 Relationships in CMIP5 AOGCM historical experiments between \(\alpha\) evaluated from the ensemble-mean R(t) and T(t), and the ensemble-mean of \(\alpha\) evaluated from R(t) and T(t) in individual integrations, a, b between \(\overline{\alpha }_{i}\) and \(\overline{\alpha }_{e}\), c between time-mean \(\widetilde{\alpha }_{i}\) and time-mean \(\widetilde{\alpha }_{e}\) (see Table 1 for notation). Only those AOGCMs which have more than one ensemble member are included (see Table 2). We use our \(\hbox {AR5}'\) estimate for historical F(t) for all AOGCMs except HadGEM2-ES and MPI-ESM1.1 (models J and P), for which we use F(t) diagnosed in these models individually (compared in Fig. 1). The dotted line in b is zero on the vertical axis; all models lie very near or above this line, indicating that \(\overline{\alpha }_{e}-\overline{\alpha }_{i}\ge 0\). The dotted line in a, c is 1:1; all models lie very near or to the right of this line in a, indicating that \(\overline{\alpha }_{e}\ge \overline{\alpha }_{i}\) (consistent with b), and in c, indicating that \(\hbox {time-mean}\ \widetilde{\alpha }_{e} \ge \hbox {time-mean}\ \widetilde{\alpha }_{i}\)

B Choice of independent variable for regression

Ordinary least-squares (OLS) linear regression assumes that all variations in the independent variable x cause proportionate variations in the dependent variable y. If there is “noise” in y, meaning fluctuations that are linearly uncorrelated with the “signal”, which is a function of x, the OLS estimate of the slope \(\mathrm {d}y/\mathrm {d}x\) is imprecise, with a standard error that increases with the amplitude of the noise (Appendix D.2), but it is unbiased, meaning that the expectation value of the estimate equals the true value. On the other hand, if our data for x contain some noise which does not cause variations in y, i.e. the “true” independent variable x on which y depends is not precisely known (possible sources of such noise are considered in Sect. 4), the OLS estimate of the slope is biased. It is expected to be smaller than the true value, and the bias grows with the amplitude of the noise (Appendix D.3).

Therefore if one of the variables contains noise which is not correlated with the other variable, the former should be chosen as dependent and the latter as independent, in order to obtain an unbiased estimate of the slope. This is the natural choice for a situation where the independent variable is chosen precisely by the experimenter, and the dependent variable is measured with some uncertainty. In our application, N and T are physically both dependent on the prescribed F, so it is not obvious which of \(R=F-N\) or T we should select as the independent variable.

Because random error is small in the MPI-ESM1.1 historical ensemble mean, we expect the bias in the estimated slope to be small, regardless of whether T or R is chosen as the independent variable. The correlation between T and R is less than unity, so the slopes for the two choices are not quite equal (Appendix D.4), but they are close, namely \(1.36\,{\pm }\,0.04\) \(\hbox {W}\,\hbox {m}^{-2}\,\hbox {K}^{-1}\) for regression of ensemble-mean R against ensemble-mean T, denoted by \(\overline{\alpha }_{e}\) (Table 1, solid line in Fig. 4), and \(1.54\,{\pm }\,0.05\) \(\hbox {W}\,\hbox {m}^{-2}\,\hbox {K}^{-1}\) for T against R (dashed line), where the standard error is inferred from the residuals of the fit. Therefore the historical slope for the ensemble mean is \(\overline{\alpha }_{e}=\) 1.4–1.5 \(\hbox {W}\,\hbox {m}^{-2}\,\hbox {K}^{-1}\), assuming the underlying physical relationship is truly linear.

The mean of the ensemble of slopes obtained by regression of R against T in the individual integrations is \(\overline{\alpha }_{i}=1.38\,{\pm }\,0.01\) \(\hbox {W}\,\hbox {m}^{-2}\,\hbox {K}^{-1}\) (mean and standard error), not shown in Fig. 4 because it is statistically indistinguishable from \(\overline{\alpha }_{e}\). However, the mean of the slopes from individual members when we regress T against R is quite different (dotted line in Fig. 4, slope of \(2.08\,{\pm }\,0.01\) \(\hbox {W}\,\hbox {m}^{-2}\,\hbox {K}^{-1}\)), and looks like a poor fit to the ensemble-mean data. This bias is the expected outcome of OLS regression of y against x when x contains noise which is uncorrelated with y (Appendix D.3). If there is uncorrelated noise in R, linear regression of T against R gives an estimate of \(\mathrm {d}T/\mathrm {d}R\) which is biased low, and hence its reciprocal \(\overline{\alpha }=\mathrm {d}R/\mathrm {d}T\) is biased high.

To minimise the bias, we prefer to choose T as the independent variable for OLS regression (Appendix D.4), assuming that the noise in R is not correlated with T. Certainly, there appears to be more noise in R than in T (Fig. 3), consistent with physical understanding that T is related to the time-integral of N (although a similar bias in the slope could be caused by correlated noise in T and R, Appendix D.6). The results from the MPI-ESM1.1 are consistent with assuming that T contains no noise, but this may not hold for other AOGCMs.

C Error in estimating climate feedback from a single ensemble member

Using the HadGEM2 historical F (Sect. 3.1), we carry out the calculations of Appendix B for the HadGEM2-ES historical ensemble, which comprises only five members, a typical size for CMIP5 submissions. We obtain \(\overline{\alpha }_{i}=0.94\,{\pm }\,0.10\) \(\hbox {W}\,\hbox {m}^{-2}\,\hbox {K}^{-1}\) and \(\overline{\alpha }_{e}=1.22\,{\pm }\,0.14\) \(\hbox {W}\,\hbox {m}^{-2}\,\hbox {K}^{-1}\), thus \(\overline{\alpha }_{e}>\overline{\alpha }_{i}\), unlike MPI-ESM1.1, for which we found above that \(\overline{\alpha }_{e}\simeq \overline{\alpha }_{i}\). The correlation coefficient between ensemble-mean R and T is 0.59, weaker than for MPI-ESM1.1 due to the smaller ensemble size and consequently greater noise in the ensemble mean.

For the same calculations with the historical experiments of other CMIP5 AOGCMs we use our \(\hbox {AR5}'\) estimate for F(t) (Sect. 3.3), because F has not been diagnosed in these models. Since F is model-dependent, it may differ from the \(\hbox {AR5}'\) estimate, so \(\overline{\alpha }\) from the regression could be inaccurate; that would be a systematic error that affects all the ensemble members of each model equally, rather than a statistical uncertainty affecting them randomly. Within each model ensemble, noise produces a spread of \(\overline{\alpha }\). The geometrical multimodel mean of the ensemble standard deviation of \(\overline{\alpha }\) is 0.11 \(\hbox {W}\,\hbox {m}^{-2}\,\hbox {K}^{-1}\), \(\sim\) 10% of the multimodel-mean \(\overline{\alpha }_{e}\).

Across AOGCMs, the correlation coefficient of \(\overline{\alpha }_{i}\) and \(\overline{\alpha }_{e}\) is very high (0.96, Fig. 10a) but \(\overline{\alpha }_{e}>\overline{\alpha }_{i}\) (Fig. 10b), as for HadGEM2-ES, except in the MPI and CanESM2 AOGCMs, in which \(\overline{\alpha }_{e}\simeq \overline{\alpha }_{i}\). This is consistent with the bias of OLS regression whereby the slope is underestimated when there is noise in T that is not correlated with R (Appendix B); because the noise is larger in individual integrations than in the ensemble mean, \(\overline{\alpha }_{i}\) is underestimated more severely than \(\overline{\alpha }_{e}\). Furthermore, the bias tends to be greater for larger \(\overline{\alpha }_{e}\) (Fig. 10b, correlation 0.61), consistent with the same explanation (Appendix D.3). The multimodel-mean underestimate of \(\overline{\alpha }_{i}\) with respect to \(\overline{\alpha }_{e}\) is 10%.

As mentioned in Sect. 1, estimates of \(\alpha\) using observed N can be made only from the more recent \(\sim\)30 years, since interannual variation of N is not well enough known at earlier times. To evaluate the effect of the OLS bias on \(\alpha\) estimated from a 30-year period, denoted by \(\widetilde{\alpha }\) (Table 1), with each AOGCM we regress R against T for 30-year periods starting in every year (i.e. they overlap) in every integration, obtaining a timeseries \(\widetilde{\alpha }(t)\) for each integration (following Gregory and Andrews 2016). From these we calculate the ensemble-mean timeseries, denoted by \(\widetilde{\alpha }_{i}(t)\), and its historical time-mean. The time-mean is the expectation value of \(\widetilde{\alpha }\) for a randomly chosen 30-year period of a single integration. The geometrical multimodel mean of the ensemble standard deviation of \(\widetilde{\alpha }\), pooled over years in each model, is 0.42 \(\hbox {W}\,\hbox {m}^{-2}\,\hbox {K}^{-1}\), 30% of the multimodel-mean time-mean \(\widetilde{\alpha }_{e}\). Similarly, from the ensemble-mean R and T of each model we compute the \(\widetilde{\alpha }_{e}(t)\) for 30-year periods and its historical time-mean.
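As an illustration, the overlapping 30-year regressions can be computed as follows (a minimal Python sketch; the function name is ours, and R and T are assumed to be annual-mean timeseries from one integration):

```python
import numpy as np

def rolling_alpha(R, T, window=30):
    """OLS slope of R against T in every overlapping `window`-year
    period, one value per start year (cf. Gregory and Andrews 2016)."""
    slopes = []
    for start in range(len(T) - window + 1):
        Tw = T[start:start + window]
        Rw = R[start:start + window]
        Tc = Tw - Tw.mean()
        slopes.append(Tc @ (Rw - Rw.mean()) / (Tc @ Tc))
    return np.array(slopes)
```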

Across models, the correlation coefficient of the time-means of \(\widetilde{\alpha }_{i}\) and \(\widetilde{\alpha }_{e}\) is high (0.88), but time-mean \(\widetilde{\alpha }_{e}\) is greater in all cases (Fig. 10c), consistent with a greater bias of OLS regression for a randomly chosen 30-year period of a single integration than of the ensemble mean, just as for \(\overline{\alpha }_{i}\) and \(\overline{\alpha }_{e}\), but the effect is more pronounced because the noise is more important for a shorter period. The multimodel-mean underestimate of \(\widetilde{\alpha }_{i}\) with respect to \(\widetilde{\alpha }_{e}\) is 20%. Since the CMIP5 ensembles are fairly small, it is likely that \(\widetilde{\alpha }_{e}\) is also biased, and the underestimate of the true value therefore greater.

D Statistical issues in regression

Fig. 11 Illustration of the effect of random noise on ordinary least squares regression. We take the x(t) shown in black in a, with a slope of unity so that \(y=x\), generate many sets of \({\hat{x}}_{i}(t)\) and \({\hat{y}}_{i}(t)\) by adding noise either to y or x, and calculate the distribution of estimated slopes. a Red shows an example with noise in y of standard deviation 0.075 and its regression line, grey envelope is the 5–95% range of regression lines; b distribution of correlation coefficients between \({\hat{x}}_{i}(t)\) and \({\hat{y}}_{i}(t)\) with noise in either x or y; c, d distribution of slopes of regression lines when there is noise in y or x respectively; b–d each show results for noise with three different standard deviations, as indicated by the key in d

In this appendix, we consider various statistical issues related to the estimation of \(\alpha\) as the slope of the regression of R against T. These issues apply more generally than to those specific variables. The general problem is to estimate the slope m in the linear relationship \(y(t)=mx(t)\), where x and y are timeseries of length n with values at times \(t=\tau _{1}, \tau _{2}, \ldots , \tau _{n}\), given the data \({\hat{x}}_{i}\) and \({\hat{y}}_{i}\), which may differ from x and y because of random noise. (To simplify the formulae we have chosen the origin so that the means of x and y are zero.) In the model world, we may have an ensemble of integrations \(i=1,\ldots ,N\), with the same x and y in all but different noise in each. For ensemble member i, we obtain an estimate \({\hat{m}}_{i}=\mathrm {cov}({\hat{x}}_{i},{\hat{y}}_{i})/\mathrm {var}({\hat{x}}_{i})\) of \(m=\mathrm {d}y/\mathrm {d}x\) by ordinary least-squares (OLS) linear regression of \({\hat{y}}_{i}(t)\) against \({\hat{x}}_{i}(t)\). The OLS estimate minimises the root-mean-square (RMS) of the residuals of the \({\hat{y}}_{i}(t)\) from the fitted line in the y-direction. By doing so it maximises the likelihood that the residuals are consistent with independent identically distributed random noise \(\epsilon _{i}(t)\) in y.

D.1 The difference method is a special case of regression

In the special case of \(n=2\), whatever the noise may be, a straight line can be drawn exactly through the two points \({\hat{x}}=x_{0}\,{\pm }\,\frac{1}{2}{{\Delta }}x\) and \({\hat{y}}=y_{0}\,{\pm }\,\frac{1}{2}{{\Delta }}y\), leaving zero residual. Denoting a mean by \(\mathrm {M}(\cdot )\), we obtain \(\mathrm {M}({\hat{x}})=x_{0}\), \(\mathrm {M}({\hat{y}})=y_{0}\), \({\mathrm{var}}({\hat{x}})={\mathrm{M}}({\hat{x}}^{2})-({\mathrm{M}}({\hat{x}}))^{2}=(\frac{1}{2}{{\Delta }}x)^{2}\) and \({\mathrm{cov}}({\hat{x}},{\hat{y}})= {\mathrm{M}}({\hat{x}}{\hat{y}}) -{\mathrm{M}}({\hat{x}}){\mathrm{M}}({\hat{y}}) =\frac{1}{4}{{\Delta }}x\,{{\Delta }}y\). Hence for this case the OLS formula gives \({\hat{m}}=\mathrm {cov}({\hat{x}},{\hat{y}})/\mathrm {var}({\hat{x}})={{\Delta }}y/{{\Delta }}x\), the slope of the line passing through the points. Therefore \({\hat{m}}\) estimated as the slope between the two endpoints is a special case of OLS, using a minimal amount of data, and the results derived in this appendix, that \({\hat{m}}\) is uncertain and may be biased on account of noise in x and y, apply to the difference method (Eq. 2) just as they do to regression (Eq. 3).
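A two-point numerical check of this equivalence, with illustrative values \(x_{0}=2\), \({{\Delta }}x=2\), \(y_{0}=2\), \({{\Delta }}y=3\):

```python
import numpy as np

x = np.array([1.0, 3.0])                        # x0 -/+ dx/2
y = np.array([0.5, 3.5])                        # y0 -/+ dy/2
ols = np.cov(x, y, bias=True)[0, 1] / x.var()   # OLS cov/var formula
print(ols, (y[1] - y[0]) / (x[1] - x[0]))       # both give 1.5
```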

D.2 No bias in \({\hat{m}}\) due to uncorrelated noise in y

The rationale for the use of OLS is that the independent variable \({\hat{x}}_{i}\) is perfectly known but the dependent variable \({\hat{y}}_{i}\) is noisy,

$$\begin{aligned} {\hat{x}}_{i}(t)=x(t) \qquad {\hat{y}}_{i}(t)=y(t)+\epsilon _{i}(t)=mx(t)+\epsilon _{i}(t). \end{aligned}$$ (4)

With these assumptions, \(\mathrm {var}({\hat{x}})=\mathrm {var}(x)\), and

$$\begin{aligned} \mathrm {cov}({\hat{x}},{\hat{y}}) &= \mathrm {cov}(x,mx+\epsilon ) \\ &= \mathrm {M}(x(mx+\epsilon )) - \mathrm {M}(x)\,\mathrm {M}(mx+\epsilon ) \\ &= m\,\mathrm {var}(x) + \mathrm {M}(x\epsilon ) \end{aligned}$$

since \(\mathrm {M}(x)=0\). Therefore the OLS slope

$$\begin{aligned} {\hat{m}} = \frac{\mathrm {cov}({\hat{x}},{\hat{y}})}{\mathrm {var}({\hat{x}})} = m + \frac{\mathrm {M}(x\epsilon )}{\mathrm {var}(x)} \end{aligned}$$

is an imprecise estimate of m. However, the expectation value \(E({\hat{m}})=m\), because \(E(\mathrm {M}(x\epsilon ))=0\) if there is no correlation between x and \(\epsilon\); we call the noise “uncorrelated” to indicate that it is not correlated with x or y. Thus, the OLS estimate of the slope is not biased by the presence of uncorrelated noise in y.

To illustrate this, we choose a set of \(n=10\) random numbers x(t) in the interval 0–1, and take \(m=1 \Rightarrow y=x\) (x, y and \(y=x\) are shown in black in Fig. 11a). We generate \(N=10^{5}\) instances of \({\hat{y}}_{i}(t)\) from y(t) by adding independent normally distributed \(\epsilon _{i}(t)\) with standard deviation of 0.075. The correlation coefficients of x with \({\hat{y}}_{i}\) have a positively skewed distribution (red in Fig. 11b). We regress each \({\hat{y}}_{i}(t)\) against x(t) to obtain \({\hat{m}}_{i}\) (an example \({\hat{y}}_{i}\) and its regression line are shown in red in Fig. 11a). The distribution of \({\hat{m}}\) is normal, its mean is \(m=1\) and its standard deviation 0.079 (red in Fig. 11c). If we increase the amplitude of noise to 0.100 and 0.125, \({\hat{m}}\) remains unbiased but becomes less precise (standard deviation of 0.105 for green and 0.131 for blue in Fig. 11c), and the correlation is degraded gradually (Fig. 11b).
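This Monte Carlo experiment is easily reproduced; a minimal sketch follows (Python/NumPy; since x(t) is drawn at random, the printed spreads will only approximate the values quoted above):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 10, 100_000
x = rng.random(n)                    # n = 10 random abscissae in [0, 1]
y = x                                # true slope m = 1

x_dev = x - x.mean()
for sd in (0.075, 0.100, 0.125):
    # uncorrelated noise in y only; x is perfectly known
    y_hat = y + sd * rng.standard_normal((N, n))
    y_c = y_hat - y_hat.mean(axis=1, keepdims=True)
    m_hat = y_c @ x_dev / (x_dev @ x_dev)        # OLS slope, one per member
    print(f"noise sd {sd}: mean slope {m_hat.mean():.3f} "
          f"(unbiased), spread {m_hat.std():.3f}")
```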

Although x was chosen randomly, there is no uncorrelated noise in x in this example, because \({\hat{x}}_{i}=x\) for every member. Noise in x does not necessarily bias the estimate either, provided it is perfectly correlated with the noise in y. For example, we might have

$$\begin{aligned} {\hat{x}}_{i}(t)&=x_{i}(t)=x(t)+\xi _{i}(t) \\ {\hat{y}}_{i}(t)&=y_{i}(t)+\epsilon _{i}(t)=mx_{i}(t)+\epsilon _{i}(t)=mx(t)+m\xi _{i}(t)+\epsilon _{i}(t), \end{aligned}$$ (5)

where x(t) is the response to external forcing and the same in all ensemble members, while \(\xi _{i}(t)\) is unforced variability that is different in each member. Although \(\xi\) might be called “noise in x”, it is perfectly correlated with noise \(m\xi\) in y. If all variations \(x'\) in \({\hat{x}}\), however they are caused, produce corresponding variations \(mx'\) in \({\hat{y}}\), \({\hat{m}}\) will be an unbiased estimate of m. If x and y are T and R, this is the case which Proistosescu et al. (2018) call “ocean-forced”.

D.3 Bias in \({\hat{m}}\) due to uncorrelated noise in x

If y is not noisy but x contains uncorrelated noise \(\delta _{i}(t)\) in ensemble member i, we have

$$\begin{aligned} {\hat{x}}_{i}(t)=x(t)+\delta _{i}(t) \qquad {\hat{y}}_{i}(t)=y(t)=mx(t), \end{aligned}$$ (6)

which differs from Eq. (5) because the variations \(\delta\) in \({\hat{x}}\) do not produce proportionate variations \(m\delta\) in \({\hat{y}}\). In this situation

$$\begin{aligned} \mathrm {cov}({\hat{x}},{\hat{y}}) &= \mathrm {cov}(x+\delta ,mx) \\ &= \mathrm {M}((x+\delta )mx) - \mathrm {M}(x+\delta )\,\mathrm {M}(mx) \\ &= m\,\mathrm {var}(x)+m\,\mathrm {M}(x\delta ), \end{aligned}$$

and

$$\begin{aligned} \mathrm {var}({\hat{x}}) = \mathrm {M}((x+\delta )^2)-(\mathrm {M}(x+\delta ))^{2} = \mathrm {var}(x)+\mathrm {var}(\delta )+2\,\mathrm {M}(x\delta ). \end{aligned}$$ (7)

As in Sect. D.2, \(E(\mathrm {M}(x\delta ))=0\) for uncorrelated noise, giving

$$\begin{aligned} {\hat{m}} = \frac{\mathrm {cov}({\hat{x}},{\hat{y}})}{\mathrm {var}({\hat{x}})} \simeq \frac{m}{1+\mathrm {var}(\delta )/\mathrm {var}(x)} < m \end{aligned}$$

i.e. the estimate of the slope is not only imprecise, but also biased low if there is uncorrelated noise in x. (We have written this as an approximation because the expectation value of a ratio does not exactly equal the ratio of expectation values.) The slope is underestimated, through the appearance of \(\mathrm {var}(\delta )\) in the denominator, because OLS assumes that all variations in \({\hat{x}}\) cause variations in \({\hat{y}}\). The larger the ratio of noise to signal \(\mathrm {var}(\delta )/\mathrm {var}(x)\), the greater the bias. This bias has been called “regression dilution” (Frost and Thompson 2000).

We illustrate this case with the same x(t) and y(t) as in the previous case, but this time we take \({\hat{y}}(t)=y(t)\) and generate N instances of \({\hat{x}}_{i}(t)\) from x(t) by adding independent normally distributed \(\delta _{i}(t)\). The distribution of \({\hat{m}}_{i}\) from regressing y(t) against \({\hat{x}}_{i}(t)\) is negatively skewed and biased low (median 0.95, 5–95% range 0.85–1.09, red in Fig. 11d). For larger noise, the spread and the bias both increase (median 0.92 for green and 0.88 for blue in Fig. 11d). The distribution of correlation coefficients is the same in the three cases as for noise in y, because the correlation formula is symmetrical in x and y.
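The same machinery demonstrates the dilution; a sketch (Python, illustrative) which also compares the simulated slopes with the predicted factor \(1/(1+\mathrm {var}(\delta )/\mathrm {var}(x))\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 10, 100_000
x = rng.random(n)
y = x                                              # true slope m = 1

for sd in (0.075, 0.100, 0.125):
    x_hat = x + sd * rng.standard_normal((N, n))   # noise in x, none in y
    x_c = x_hat - x_hat.mean(axis=1, keepdims=True)
    m_hat = x_c @ (y - y.mean()) / (x_c * x_c).sum(axis=1)
    predicted = 1.0 / (1.0 + sd**2 / x.var())      # regression-dilution factor
    print(f"noise sd {sd}: median slope {np.median(m_hat):.3f}, "
          f"predicted ~{predicted:.3f}")
```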

In our application we are estimating \(m=\alpha\) from \(R=y\) and \(T=x\). The expected magnitude of the bias in \({\hat{\alpha }}\) is therefore

$$\begin{aligned} E({\hat{\alpha }})-\alpha = \frac{-\mathrm {var}(\delta )}{\mathrm {var}(T)+\mathrm {var}(\delta )}\,\alpha . \end{aligned}$$

If \(\mathrm {var}(T)\) and \(\mathrm {var}(\delta )\) are independent of \(\alpha\), this formula predicts that the expected bias in \({\hat{\alpha }}\) will increase in proportion to \(\alpha\). In our set of model simulations of the past, \(\mathrm {var}(T)\) is not independent of \(\alpha\), because we expect that a model with a larger \(\alpha\) (smaller EffCS) will produce a smaller historical T increase. This makes \(\mathrm {var}(T)\) smaller, \(1/(\mathrm {var}(T)+\mathrm {var}(\delta ))\) larger, and strengthens the dependence of the expected negative bias \(E({\hat{\alpha }})-\alpha\) upon \(\alpha\).

D.4 Correct choice of independent variable

If y is independent and perfectly known while x is dependent and noisy, we should instead minimise the RMS deviations of the x from the fitted line in the x-direction, obtaining from ensemble member i an estimate \({\hat{m}}_{i}^{\dag }=\mathrm {cov}({\hat{x}}_{i},{\hat{y}}_{i})/\mathrm {var}({\hat{y}}_{i})\) of the slope \(\mathrm {d}x/\mathrm {d}y\). The product \({\hat{m}}_{i}^{\dag }{\hat{m}}_{i}=(\mathrm {cov}({\hat{x}}_{i},{\hat{y}}_{i}))^{2}/ (\mathrm {var}({\hat{x}}_{i})\mathrm {var}({\hat{y}}_{i}))=r_{i}^{2}\), where \(r_{i}\) is the (product-moment) correlation coefficient between \({\hat{x}}_{i}\) and \({\hat{y}}_{i}\). Thus the lines fitted in the two ways have equal slopes \({\hat{m}}_{i}=1/{\hat{m}}_{i}^{\dag }\) if and only if \({\hat{x}}_{i}\) and \({\hat{y}}_{i}\) are perfectly correlated or anticorrelated (\(r_{i}=\,{\pm }\,1\)).

In the usual situation of imperfect correlation, the choice of independent variable therefore makes a difference to the OLS estimate of the slope. This is because of the bias caused by noise in the independent variable (Sect. D.3). If one of the variables is noisy and the other is not, we must treat the noisy variable as the dependent one to get an unbiased estimate of the slope.

D.5 Uncorrelated noise in both x and y

If there is independent noise in both x and y, we cannot obtain an unbiased estimate of m using OLS. This case can be treated with “orthogonal” or “total least-squares” regression, in which the RMS deviation of the points from the line is minimised in a direction orthogonal to the line, but that requires a prior estimate of the relative size of \(\delta\) and \(\epsilon\), which we do not have. Other methods, known as “errors-in-variables” methods, have been developed for this case (e.g. Cahill et al. 2015).

D.6 Correlated noise in x and y

Another situation to consider is that of correlated noise in x and y. Suppose that

$$\begin{aligned} {\hat{x}}_{i}(t)=x(t)+\xi _{i}(t) \qquad {\hat{y}}_{i}(t)=mx(t)+\mu \xi _{i}(t)+\epsilon _{i}(t), \end{aligned}$$ (8)

where \(\mu\) is a constant and \(\xi _{i}\) is noise that is different in each ensemble member. Because \(\xi _{i}\) affects both \({\hat{x}}_{i}\) and \({\hat{y}}_{i}\), the noise \({\hat{x}}_{i}(t)-x(t)=\xi _{i}(t)\) in x and the noise \({\hat{y}}_{i}(t)-mx(t)=\mu \xi _{i}(t) + \epsilon _{i}(t)\) in y have a non-zero correlation coefficient \(\mu \sqrt{\mathrm {var}(\xi )}\big /\sqrt{\mu ^{2}\,\mathrm {var}(\xi )+\mathrm {var}(\epsilon )}\). Now by following the method of Appendix D.3 we obtain

$$\begin{aligned} E({\hat{m}}) &= E\left( \frac{\mathrm {cov}({\hat{x}},{\hat{y}})}{\mathrm {var}({\hat{x}})}\right) \simeq \frac{m\,\mathrm {var}(x) + \mu \,\mathrm {var}(\xi )}{\mathrm {var}(x)+\mathrm {var}(\xi )} \\ &= m\,\frac{1+(\mu /m)(\mathrm {var}(\xi )/\mathrm {var}(x))}{1+\mathrm {var}(\xi )/\mathrm {var}(x)}, \end{aligned}$$

assuming x and \(\xi\) are uncorrelated.
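A simulation confirms this expectation; in the sketch below (Python) the values of m, \(\mu\) and the noise variances are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 10, 100_000
x = rng.random(n); x -= x.mean()        # zero-mean signal, as assumed above
m, mu = 1.0, 0.3                        # slopes of signal and of shared noise
sd_xi, sd_eps = 0.2, 0.05

xi = sd_xi * rng.standard_normal((N, n))
eps = sd_eps * rng.standard_normal((N, n))
x_hat = x + xi                          # Eq. (8): xi appears in x ...
y_hat = m * x + mu * xi + eps           # ... and in y with coefficient mu

x_c = x_hat - x_hat.mean(axis=1, keepdims=True)
y_c = y_hat - y_hat.mean(axis=1, keepdims=True)
m_hat = (x_c * y_c).sum(axis=1) / (x_c * x_c).sum(axis=1)

r = sd_xi**2 / x.var()                  # var(xi) / var(x)
print(f"mean slope {m_hat.mean():.3f}; "
      f"predicted (m + mu*r)/(1 + r) = {(m + mu * r) / (1 + r):.3f}")
```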

This case is more general than, and encompasses, all of those previously considered. If \(\mathrm {var}(\xi )\ll \mathrm {var}(x)\), the noise in x is negligible, and we recover \(E({\hat{m}})=m\). If \(\mu =m\), \(y_{i}(t)=m(x_{i}(t)+\xi _{i}(t))\), as in Eq. (5), in which case we have shown that \(E({\hat{m}})=m\) still (Appendix D.2). If \(\mu =0\), the noise in x and y is uncorrelated, and \(E({\hat{m}})=m/(1+\mathrm {var}(\xi )/\mathrm {var}(x))<m\) (as in Appendix D.3). The general formula with \(\mu \ne 0\) applies to two relevant physical situations in which T is x, R is y and m is the climate feedback parameter for forced climate change on multidecadal timescales.

Firstly, suppose there is unforced variability that arises spontaneously in N and causes correlated variability \(T'\) in T. This is the case which Proistosescu et al. (2018) call “radiatively forced”, and we describe it qualitatively in Sect. 4. We can illustrate the effect with a simple model. Suppose that the spontaneous random variability \({\Phi }(t)\) in N(t) has a stepwise behaviour, such that \({\Phi }(t)={\Phi }_{j}\) for \(\tau _{j} \le t < \tau _{j+1}\), with a step-change in N of \({\Phi }_{j}-{\Phi }_{j-1}\) at \(t=\tau _{j}\). According to the step model (Appendix A), the response of \(T'\) to \({\Phi }\) is

$$\begin{aligned} T'(t) &= \sum _{k=-\infty }^{j}\, \Theta (t-\tau _{k})({\Phi }_{k}-{\Phi }_{k-1}) \\ &= \Theta (t-\tau _{j}){\Phi }_{j} + \sum _{k=-\infty }^{j}\, {\Phi }_{k-1}(\Theta (t-\tau _{k-1})-\Theta (t-\tau _{k})) \end{aligned}$$

for \(\tau _{j} \le t < \tau _{j+1}\), where \(\Theta (t)\) is the response of T per unit step-change in forcing at \(t=0\). This \(T'\) response will add a further perturbation \(\alpha T'\) to N, assuming the same climate feedback parameter \(\alpha\) applies to both forced and unforced variations. If \(T_{F}(t)\) is the response of T to external forcing F(t), we have \(T=T_{F}+T'\), \(N=F-\alpha T_{F}+{\Phi }_{j}-\alpha T'\) and \(R=F-N=\alpha T_{F}-{\Phi }_{j}+\alpha T'\). We can rewrite this as

$$\begin{aligned} T(t) &= T_{F}(t)+H(t)+\Theta (t-\tau _{j}){\Phi }_{j} \\ R(t) &= \alpha (T_{F}(t)+H(t))+{\Phi }_{j}(\alpha \Theta (t-\tau _{j})-1) \end{aligned}$$

with

$$\begin{aligned} H(t)\equiv \sum _{k=-\infty }^{j}\, {\Phi }_{k-1}(\Theta (t-\tau _{k-1})-\Theta (t-\tau _{k})). \end{aligned}$$

This has the form of Eq. (8) for correlated noise, with \(x=T_{F}+H\), \(\xi =\Theta (t-\tau _{j}){\Phi }_{j}\), \(y=R\), \(\mu =(\alpha \Theta (t-\tau _{j})-1)/\Theta (t-\tau _{j})= \alpha -1/\Theta (t-\tau _{j})\) and \(m=\alpha\), where H is the response of T to \({\Phi }\) earlier than \(\tau _{j}\).

Physically, the correlation arises because the noise in T is the response to \({\Phi }_{j}\), while the noise in R is the sum of \({\Phi }_{j}\) itself and the response in N to \({\Phi }_{j}\). Since the responses to \({\Phi }_{j}\) in both N and T are proportional to \({\Phi }_{j}\), the noise in R and T is correlated. From \(\mu =m-1/\Theta (t-\tau _{j})\) we obtain \(\mu -m=-1/\Theta (t-\tau _{j})<0\) because for climate stability we must have \(\Theta (t)>0\). Hence \(\mu<m\Rightarrow E({\hat{m}})<m\). The climate feedback parameter will inevitably be underestimated if the correlation is due to spontaneous fluctuations in N. The effect is therefore similar to regression dilution (Appendix D.3) but it is not formally the same.

The correlation is present because both \({\Phi }\) and T have non-zero timescales of change. A zero timescale of response in T means it changes instantly when the energy balance is perturbed, keeping the system always in equilibrium with \(\alpha T=F+{\Phi }\). This requires \(\Theta (t)=1/\alpha\) for all \(t>0\), and hence \(\mu =0\), so the correlation vanishes. With stepwise variation, \({\Phi }\) has persistence with a non-zero timescale. This can be removed by replacing its step-changes at times \(\tau _{j}\) with \(\delta\)-function spikes. In that case \({\Phi }=0\) between these times, and \({\Phi }_{j}\) does not appear in \(R=\alpha (T_{F}+T')\). This is the situation of perfectly correlated noise described by Eq. (5), with \(\xi =T'\), effectively the same as no noise, because signal and noise cannot be distinguished.
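The underestimate can be demonstrated with a simulation in which \(\Theta\) is the exponential step response of a one-box energy balance model, \(C\,\mathrm {d}T/\mathrm {d}t={\Phi }-\alpha T\) with no external forcing; the parameter values below are purely illustrative, not taken from any AOGCM:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, C = 1.3, 8.0      # feedback (W m-2 K-1), heat capacity (W yr m-2 K-1)
n_years, N = 150, 2000

slopes = []
for _ in range(N):
    Phi = 0.5 * rng.standard_normal(n_years)   # stepwise noise in N, one value per year
    T = np.zeros(n_years)
    for t in range(1, n_years):
        # unforced one-box energy balance: C dT/dt = Phi - alpha T
        T[t] = T[t - 1] + (Phi[t - 1] - alpha * T[t - 1]) / C
    R = alpha * T - Phi                        # R = F - N with F = 0
    T_c, R_c = T - T.mean(), R - R.mean()
    slopes.append(T_c @ R_c / (T_c @ T_c))

print(f"true alpha = {alpha}; mean regression slope = {np.mean(slopes):.2f}")
```

The mean regression slope falls below the prescribed \(\alpha\), as the analysis above predicts.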

Secondly, \(\xi\) could represent unforced variability that arises spontaneously in T on interannual timescales, causing an immediate radiative response in R that may have a climate feedback parameter \(\mu \ne m\). The estimate of m obtained by regression of R against T will be biased in the direction of \(\mu\) by unforced variability. The larger \(\mathrm {var}(\xi )/\mathrm {var}(x)\), the greater the bias. The ratio will be large if unforced variability is large, or if the record is short and hence shows little forced change. Unlike the previous cases, the bias in \({\hat{m}}\) could be in either direction; when \(\mu \lessgtr m\), \(E({\hat{m}})\lessgtr m\).