Recent decades have seen cooling over the eastern tropical Pacific and Southern Oceans while temperatures rise globally. Climate models indicate that these regional features, and others, are not expected to continue into the future under sustained forcing from atmospheric carbon dioxide increases. This matters because climate sensitivity depends on the pattern of warming, so if the past has warmed differently from what we expect in the future, then climate sensitivity estimated from the historical record may not apply to the future. We investigate this with a suite of climate models and show that climate sensitivity simulated for observed historical climate change is smaller than for long‐term carbon dioxide increases. The results imply that historical energy budget changes only weakly constrain climate sensitivity.

Eight atmospheric general circulation models (AGCMs) are forced with observed historical (1871–2010) monthly sea surface temperature and sea ice variations using the Atmospheric Model Intercomparison Project II data set. The AGCMs therefore have a similar temperature pattern and trend to that of observed historical climate change. The AGCMs simulate a spread in climate feedback similar to that seen in coupled simulations of the response to CO 2 quadrupling. However, the feedbacks are robustly more stabilizing and the effective climate sensitivity (EffCS) smaller. This is due to a pattern effect, whereby the pattern of observed historical sea surface temperature change gives rise to more negative cloud and longwave clear‐sky feedbacks. Assuming the patterns of long‐term temperature change simulated by models, and the radiative response to them, are credible, this implies that existing constraints on EffCS from historical energy budget variations give values that are too low and overly constrained, particularly at the upper end. For example, the pattern effect increases the long‐term Otto et al. (2013, https://doi.org/10.1038/ngeo1836) EffCS median and 5–95% confidence interval from 1.9 K (0.9–5.0 K) to 3.2 K (1.5–8.1 K).

1 Introduction

The relationship between global surface temperature change and the Earth's radiative response (a measure of the radiative feedbacks in the system and a key determinant of the Earth's climate sensitivity) can vary on timescales of decades to millennia. Thus, feedbacks governing warming over the observed historical record may be different from those acting on the Earth's long‐term climate sensitivity to rising greenhouse gas concentrations (e.g., Armour, 2017; Gregory & Andrews, 2016; Marvel et al., 2018; Proistosescu & Huybers, 2017; Silvers et al., 2018; Zhou et al., 2016). This is in contrast to decades of studies that explicitly or implicitly assume that the relationship between historical temperature change and energy budget variations provides a direct constraint on long‐term climate sensitivity (e.g., Gregory et al., 2002; Otto et al., 2013). The primary reason why radiative feedback and sensitivity are not constant is that climate feedback depends on the spatial structure of surface temperature change (Andrews et al., 2015; Andrews & Webb, 2018; Armour et al., 2013; Ceppi & Gregory, 2017; Haugstad et al., 2017; Rose et al., 2014; Silvers et al., 2018; Zhou et al., 2016, 2017). This spatial structure evolves on annual to decadal timescales with modes of unforced coupled atmosphere‐ocean variability (e.g., Xie et al., 2016) and spatiotemporal variations in anthropogenic or natural forcings (e.g., Smith et al., 2016; Takahashi & Watanabe, 2016). It also evolves on decadal to centennial timescales in response to sustained anthropogenic forcing due to the intrinsic timescales of the climate response (such as delayed warming in the eastern tropical Pacific and Southern Oceans; e.g., Andrews et al., 2015; Armour et al., 2016; Senior & Mitchell, 2000). Thus, the pattern of historical temperature change, and hence radiative feedback, is expected to be different from that in response to long‐term CO 2 increases (see section 5).
We refer to the dependency of radiative feedbacks on the evolving pattern of surface temperature change as a pattern effect (Stevens et al., 2016). Most previous estimates of climate sensitivity based upon historical observations of Earth's energy budget have not allowed for a pattern effect between historical climate change and the long‐term response to CO 2 (e.g., Otto et al., 2013). Armour (2017) found that the equilibrium climate sensitivity (ECS; the equilibrium near surface air temperature change in response to a CO 2 doubling) of atmosphere‐ocean general circulation models (AOGCMs; estimated from simulations of abrupt CO 2 quadrupling [abrupt‐4xCO 2 ]) was about 26% larger than climate sensitivity inferred from transient warming (1% CO 2 simulations, taken to be an analogue for historical climate change) due to pattern effects. Armour (2017) therefore concluded that energy budget estimates of Earth's ECS from the historical record should be increased by this amount. Lewis and Curry (2018) argue for a smaller pattern effect, highlighting ambiguities in the methodology when using idealized CO 2 experiments as an analogue for historical climate change. However, as noted in Armour (2017), the use of 1% CO 2 simulations as an analogue for historical climate change has important limitations in that it neglects the impact from non‐CO 2 forcings and unforced climate variability that could have had a significant impact on the pattern of historical temperature change. In particular, under 1% CO 2 , AOGCMs do not show cooling of the tropical eastern Pacific Ocean and Southern Ocean—features that have been observed over recent decades but are not expected in the long‐term response to increased CO 2 (Zhou et al., 2016). These are regions where atmospheric feedbacks (in particular clouds) are sensitive to the patterns of surface temperature change due to their impact on local and remote atmospheric stability (e.g., Andrews & Webb, 2018; Zhou et al., 2017). 
This suggests that the magnitude of the pattern effect reported in Armour (2017) may be too low relative to historical climate change. This is an outstanding issue that we aim to address and quantify here.

We will show that a suite of atmospheric general circulation models (AGCMs) forced with historical (post 1870) sea surface temperatures (SSTs) and sea ice changes are ideal simulations for quantifying the relationship between historical climate sensitivity and idealized long‐term model‐derived ECS. They allow us, for the first time, to quantify the pattern effect associated with observed temperature patterns and so provide improved updates to estimates of climate sensitivity derived from historical energy budget constraints. The work builds upon individual studies (Andrews, 2014; Gregory & Andrews, 2016; Silvers et al., 2018; Zhou et al., 2016). Our aim is to (i) bring together these individual model results for an intercomparison of AGCMs forced with historical SST and sea ice variations, (ii) explore the dependence of the experimental design on the underlying SST and sea ice data set, (iii) explore how historical feedbacks in the AGCMs relate to feedbacks diagnosed from their parent AOGCM forced by abrupt‐4xCO 2 , (iv) quantify the pattern effect causing the difference between climate sensitivity under historical climate change and long‐term CO 2 changes, and (v) use this pattern effect to update observed energy budget constraints on Earth's climate sensitivity.

2 Simulations, Models, and Data

Eight AGCMs (Table 1) are forced with monthly time‐varying observationally derived fields of SST and sea ice from 1871 to 2010 using the Atmospheric Model Intercomparison Project (AMIP) II boundary condition data set (Gates et al., 1999; Hurrell et al., 2008; Taylor et al., 2000). All simulations have natural and anthropogenic forcings (e.g., greenhouse gases, aerosols, solar radiation, etc.) held constant at assumed preindustrial conditions (except CAM4, which used assumed constant present‐day conditions; we assume the level of background forcing has no impact on the diagnosed feedback of the model). With constant forcings the variation in radiative fluxes comes about solely from the changing SST and sea ice boundary conditions, allowing radiative feedbacks to be accurately diagnosed directly from top‐of‐atmosphere (TOA) radiation fields (e.g., Haugstad et al., 2017). For details of individual simulations see Gregory and Andrews (2016) for HadGEM2 and HadAM3; Silvers et al. (2018) for GFDL‐AM2.1, GFDL‐AM3, and GFDL‐AM4.0; and Zhou et al. (2016) for CAM4 and CAM5.3. We additionally include simulations from ECHAM6.3, which is closely related to the atmospheric component of the MPI‐ESM 1.2 model to be used in CMIP6. This experiment, referred to here as amip‐piForcing (Gregory & Andrews, 2016), is included in the Cloud Feedback Model Intercomparison Project contribution to CMIP6 (Webb et al., 2017). The sensitivity of the results to the AMIP II boundary condition data set is explored with analogous experiments using the HadISST2.1 SST and sea ice data set (Titchner & Rayner, 2014; supporting information).

Table 1. Feedback Parameters in amip‐piForcing (λ amip ) and abrupt‐4xCO 2 (λ 4xCO2 ) Atmospheric General Circulation Model and Atmosphere‐Ocean General Circulation Model Experiments

| Model | λ amip (W·m−2·K−1) | λ 4xCO2 (W·m−2·K−1) | S = λ 4xCO2 /λ amip | Δλ = λ 4xCO2 − λ amip (W·m−2·K−1) | EffCS amip (K) | EffCS 4xCO2 (K) |
|---|---|---|---|---|---|---|
| CAM4 | −2.27 | −1.23 | 0.54 | 1.04 | 1.57 | 2.90 |
| CAM5.3 | −1.71 | n/a | n/a | n/a | n/a | n/a |
| ECHAM6.3 | −1.90 | −1.36 | 0.72 | 0.54 | 2.17 | 3.01 |
| GFDL‐AM2.1 | −1.67 | −1.38 | 0.83 | 0.29 | 2.01 | 2.43 |
| GFDL‐AM3 | −1.40 | −0.75 | 0.53 | 0.65 | 2.13 | 3.99 |
| GFDL‐AM4.0 | −1.91 | n/a | n/a | n/a | n/a | n/a |
| HadAM3 | −1.65 | −1.04 | 0.63 | 0.61 | 2.14 | 3.38 |
| HadGEM2 | −1.37 | −0.64 | 0.47 | 0.73 | 2.14 | 4.58 |
| Mean (1.645*σ) | −1.74 (0.48) | −1.07 (0.52) | 0.62 (0.22) | 0.64 (0.40) | 2.03 (0.38) | 3.38 (1.29) |

All simulations ran for 140 years from January 1871 through to December 2010, except for GFDL‐AM2.1 and GFDL‐AM3, which finished in December 2004. All data are global annual mean, and anomalies are presented relative to an 1871–1900 baseline. CAM4 and CAM5.3 results are single realizations; HadGEM2 and HadAM3 simulations are ensembles of four realizations each; ECHAM6.3, GFDL‐AM2.1, and GFDL‐AM4.0 have five realizations each; while GFDL‐AM3 has six realizations. The HadGEM2 results are not identical to those presented in Gregory and Andrews (2016) because it has been discovered that land cover change was included in their HadGEM2 simulations. We have confirmed that the updated simulations used here, which have constant land cover, do not affect the main conclusions of Gregory and Andrews (2016). In fact the multidecadal variability in feedback in HadGEM2 is now found to be more consistent with their HadAM3 results (section 3).

For comparison to long‐term climate sensitivity and feedback parameters we make use of an abrupt‐4xCO 2 simulation of each AGCM's parent AOGCM. For CAM4, GFDL‐AM2.1, GFDL‐AM3, and HadGEM2 we use the CCSM4, GFDL‐ESM2M, GFDL‐CM3, and HadGEM2‐ES CMIP5 abrupt‐4xCO 2 simulations, respectively (Taylor et al., 2012).
Feedbacks and associated effective climate sensitivity (EffCS; the equilibrium near surface air temperature change in response to a CO 2 doubling assuming constant feedback strength) are derived from the regression of global annual mean change in radiative flux dN against surface air temperature change dT for the 150 years of the simulation, according to EffCS = −F 2x /λ, where F 2x , the forcing from a doubling of CO 2 , is equal to the dN axis intercept divided by 2 (to convert 4xCO 2 to 2xCO 2 ) and λ, the feedback parameter, is equal to the slope of the regression line (Andrews et al., 2012). We have similar simulations for ECHAM6.3 and HadAM3 using the MPI‐ESM 1.1 and HadCM3 models, respectively, though these are not in the CMIP5 archive. The HadCM3 simulation is only 100 years long but is a mean of seven realizations. CAM5.3 and GFDL‐AM4.0 do not yet have equivalent coupled 4xCO 2 simulations. We choose to use EffCS rather than the true ECS since few AOGCMs are run to equilibrium, and thus, the true ECS is not generally known. Paynter et al. (2018) showed that the actual ECS from multimillennial GFDL‐ESM2M and GFDL‐CM3 simulations was nearly 1 K higher than the EffCS we use here from abrupt‐4xCO 2 . Hence, the values we report for EffCS might be viewed as a lower bound on ECS if other models behave in a similar way.
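The regression used to obtain λ, F 2x , and EffCS from an abrupt‐4xCO 2 run can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the function name `gregory_regression`, the assumed "true" feedback and forcing, the 30‐year adjustment timescale, and the noise level are all hypothetical choices made only so that the example runs.

```python
import numpy as np


def gregory_regression(dT, dN):
    """Ordinary least squares fit of dN = F_4x + lam * dT.

    Returns (lam, f2x, effcs), where lam is the regression slope,
    f2x is half the dN-axis intercept (4xCO2 converted to 2xCO2),
    and effcs = -f2x / lam.
    """
    dT = np.asarray(dT, dtype=float)
    dN = np.asarray(dN, dtype=float)
    lam, intercept = np.polyfit(dT, dN, 1)  # slope first, then intercept
    f2x = intercept / 2.0
    effcs = -f2x / lam
    return lam, f2x, effcs


# Synthetic 150-year "abrupt-4xCO2" series (assumed values, not model output).
rng = np.random.default_rng(0)
years = np.arange(150)
true_lam, true_f4x = -1.0, 7.0            # W m-2 K-1 and W m-2 (hypothetical)
dT = (true_f4x / -true_lam) * (1.0 - np.exp(-years / 30.0))
dN = true_f4x + true_lam * dT + rng.normal(0.0, 0.2, size=years.size)

lam, f2x, effcs = gregory_regression(dT, dN)
print(f"lambda = {lam:.2f} W m-2 K-1, F_2x = {f2x:.2f} W m-2, EffCS = {effcs:.2f} K")
```

With real model output, `dT` and `dN` would be the global annual mean anomalies of surface air temperature and net TOA flux. The same slope calculation, applied to the amip‐piForcing series, gives λ amip .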

3 Radiative Feedbacks and Sensitivities

Figure 1a shows the global annual mean near surface air temperature change (dT) of the eight individual AGCM amip‐piForcing simulations in comparison to HadCRUT4 (Morice et al., 2012). As expected the models capture the observed variability and trends in dT well (the correlation coefficient, r, between observed and simulated dT is >0.95 for every model). However, the AGCMs omit the small part of the recent warming trend over land that arises as a direct adjustment to changes in CO 2 and other forcing agents (dT in HadCRUT4 averaged over 2000–2010 is 0.79 K, whereas it ranges from 0.66 to 0.76 K in the AGCMs; see also, Andrews, 2014; Gregory & Andrews, 2016). Figure 1b shows the net top‐of‐atmosphere radiative flux change, dN. It is generally negative because as dT increases positively the planet loses heat to space. This relationship is shown in Figure 1c for the multimodel ensemble mean. The slope of the regression line (ordinary least squares, over the annual mean 1871–2010 time series data) measures the feedback parameter λ amip (in W·m−2·K−1), where subscript amip is used to indicate that the feedback parameter was derived from the amip‐piForcing experiment. Individual model results are given in Table 1.

Figure 1. (a) Comparison of historical near‐surface air temperature change (dT) simulated by the atmospheric general circulation models in amip‐piForcing (individual black lines) against observed (HadCRUT4) variations (red). (b) Time series of the change in net top‐of‐atmosphere radiative flux (dN) in the individual atmospheric general circulation model experiments. (c–f) The relationship and correlation coefficient (r) between the multimodel ensemble mean (c) dN; (d) longwave clear‐sky radiative flux change, dLWcs; (e) shortwave clear‐sky radiative flux change, dSWcs; and (f) cloud radiative effect change, dCRE, against dT.
All points are global annual means covering the historical period (1871–2010), and fluxes are positive downward. Changes are relative to an 1871–1900 baseline.

The equivalent feedback parameters derived from six available parent AOGCM abrupt‐4xCO 2 simulations (λ 4xCO2 ) are compared to λ amip in Figure 2 and Table 1. We find that λ amip is more negative than λ 4xCO2 in all models. In other words, AGCMs forced with historical SST and sea ice changes robustly simulate more stabilizing feedbacks (lower EffCS) than their parent AOGCM forced by long‐term CO 2 changes. On average, the difference in λ between amip‐piForcing and abrupt‐4xCO 2 is Δλ = λ 4xCO2 − λ amip = 0.64 W·m−2·K−1, ranging from 0.29 to 1.04 W·m−2·K−1 across the AGCMs (Table 1).

Figure 2. Relationship between the feedback parameter evaluated by regression of dN against dT over the historical period (1871–2010) in amip‐piForcing (λ amip ) and 150 years of abrupt‐4xCO 2 (λ 4xCO2 ) for (a) NET radiative feedback, (b) clear‐sky component, (c) CRE component, (d) LW and SW clear‐sky components, and (e) LW and SW CRE components. (f) Time series of λ amip for individual AGCMs evaluated by linear regression of dN against dT in a sliding 30‐year window in the amip‐piForcing experiments; the year represents the center of the window. Colored circles in (f) with horizontal lines show the feedback parameter values from abrupt‐4xCO 2 . LW = longwave; SW = shortwave.

The source of Δλ is shown in Figure 2. The clear‐sky feedback (Figures 1d and 1e) is slightly (but robustly) more negative in amip‐piForcing compared to abrupt‐4xCO 2 (Figure 2b) due to differences in longwave (LW) clear‐sky feedback processes that are partly offset by shortwave (SW) clear‐sky feedback differences (Figure 2d). This difference in clear‐sky feedback between amip‐piForcing and abrupt‐4xCO 2 explains the relatively small change in net sensitivity between these experiments for the GFDL‐AM2.1 model.
For the other models, differences in cloud feedback (measured by changes in cloud radiative effect, CRE) (Figure 1f) are a larger source of the reduced sensitivity in amip‐piForcing (Figure 2c). This mostly comes from SW cloud feedback processes, with historical LW cloud feedback processes generally being representative of that seen in abrupt‐4xCO 2 (Figure 2e). These findings are consistent with process‐orientated studies that suggest lapse‐rate (which affect LW clear sky) and low‐cloud (which affect SW, NET, and CRE) feedbacks vary the most with SST patterns, especially in the Pacific (see below and Andrews et al., 2015; Andrews & Webb, 2018; Ceppi & Gregory, 2017; Rose et al., 2014; Silvers et al., 2018; Zhou et al., 2016, 2017).

In amip‐piForcing the model mean EffCS amip = −F 2x /λ amip is ~2 K, ranging from 1.6 to 2.2 K across the AGCMs (Table 1). The narrowness of this EffCS amip range does not arise due to reduced uncertainty in λ amip relative to λ 4xCO2 . On the contrary, the spread (measured by 1.645*σ) in λ amip is almost the same size as the spread in λ 4xCO2 (Table 1). The spread in EffCS amip is narrower primarily because λ amip is on average more negative than λ 4xCO2 . Since EffCS depends on the reciprocal of λ, the same spread in λ, shifted to more negative numbers, will give rise to a narrower spread in EffCS (e.g., Roe, 2009). A similar spread in λ amip and λ 4xCO2 suggests that different patterns of SST change across AOGCMs do not contribute significantly to the spread in atmospheric feedbacks in abrupt‐4xCO 2 experiments (see also Andrews & Webb, 2018; Ringer et al., 2014), which must therefore come about due to differences in atmospheric physics and parameterizations. EffCS 4xCO2 (of the parent AOGCM) is in all cases larger than EffCS amip , ranging from 2.4 to 4.6 K (Table 1). In the multimodel mean, EffCS 4xCO2 is ~67% larger than that implied from EffCS amip .
This model mean historical pattern effect is substantially larger than the 26% found by Armour (2017), supporting the hypothesis that the pattern effect is larger in the historical record than simulated in transient 1% CO 2 AOGCM simulations because the latter miss key features of the observed warming pattern. This result is even more striking given that Armour (2017) used an EffCS definition from abrupt‐4xCO 2 that gives larger values than ours (they used years 21–150 of abrupt‐4xCO 2 , whereas we use years 1–150).

It is also useful to study shorter time periods to help inform our understanding of the relationship between shorter‐term variations in temperature and radiative fluxes, as have been used by many studies to estimate EffCS particularly since the satellite era (e.g., Forster, 2017). Figure 2f shows the feedback parameter for 30‐year moving windows over the historical period in the AGCM simulations (calculated as per Gregory & Andrews, 2016), in comparison to λ 4xCO2 (horizontal lines). There is substantial multidecadal variability in the feedback parameter that is common to all models, with a peak in feedback parameter (higher EffCS) around the 1940s and a minimum (lower EffCS) in the most recent decades (post ~1980). Generally, λ amip is more negative than λ 4xCO2 . There are only a few instances where λ amip is similar to λ 4xCO2 , for example, ~1940 for HadGEM2 and GFDL‐AM2.1, but no instances where λ amip is substantially less negative than λ 4xCO2 . The difference is greatest in the most recent decades, suggesting that energy budget constraints on ECS based on recent decades of satellite data will be most strongly biased low. This is consistent with process understanding of the pattern effect, since recent decades have shown substantial cooling in the eastern Pacific and Southern Oceans while warming in the west Pacific warm pool (e.g., Zhou et al., 2016).
The cooling in the descent region of the tropical Pacific will favor increased cloudiness (a negative feedback), while warming in the west Pacific ascent region efficiently warms free tropospheric air (increasing the negative lapse‐rate feedback widely across the tropics and midlatitudes) as well as further increasing the lower tropospheric stability and cloudiness in the marine low‐cloud descent regions (Andrews & Webb, 2018; Ceppi & Gregory, 2017; Zhou et al., 2016). Most of the multidecadal variation in feedback strength comes from changes in the strength of cloud feedback (the correlation between the NET and CRE feedback time series, calculated in a similar way, is >0.94 in each AGCM), while the clear‐sky feedbacks show less variation (not shown). This, as well as atmospheric variability, helps explain why cloud feedback is not as linearly correlated to dT variations over the full historical period compared to clear‐sky feedbacks (r = 0.48 for CRE compared to 0.99 and 0.93 for the clear‐sky fluxes, Figures 1d–1f).
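The sliding 30‐year window calculation behind Figure 2f can be sketched as below. This is an illustrative reconstruction under stated assumptions rather than the procedure of Gregory and Andrews (2016): the function name `sliding_feedback` is hypothetical, the window is labeled by the mean of its years, and the input series are synthetic and noise‐free with an assumed constant feedback of −1.7 W·m−2·K−1, so every window should return that value.

```python
import numpy as np


def sliding_feedback(years, dT, dN, window=30):
    """Regress dN on dT in each consecutive `window`-year slice.

    Returns (centers, lams): the mean year of each window and the
    corresponding regression slope (feedback parameter).
    """
    years = np.asarray(years)
    dT = np.asarray(dT, dtype=float)
    dN = np.asarray(dN, dtype=float)
    n = years.size
    if n < window:
        raise ValueError("time series is shorter than the window")
    centers, lams = [], []
    for start in range(n - window + 1):
        sl = slice(start, start + window)
        slope, _ = np.polyfit(dT[sl], dN[sl], 1)
        centers.append(years[sl].mean())
        lams.append(slope)
    return np.array(centers), np.array(lams)


# Synthetic historical-length series (assumed shapes, not model output).
years = np.arange(1871, 2011)                       # 140 annual values
t = years - years[0]
dT = 0.005 * t + 0.1 * np.sin(2.0 * np.pi * t / 60.0)
dN = -1.7 * dT                                      # constant feedback, no noise

centers, lams = sliding_feedback(years, dT, dN)
print(centers[0], centers[-1], lams.min(), lams.max())
```

Applied to a real amip‐piForcing series, variations in `lams` from window to window would correspond to the multidecadal variability in feedback discussed above.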

4 Constraints on Observed Estimates of Climate Sensitivity

The pattern effect causing the difference between simulated EffCS under historical climate change and long‐term CO 2 increase implies that historical energy budget constraints on EffCS do not directly apply to long‐term ECS. To account for this, we use the difference in λ between amip‐piForcing and abrupt‐4xCO 2 as a measure of the pattern effect to update historical energy budget estimates of λ and EffCS. This is in contrast to Armour (2017) who had to use 1% CO 2 simulations as a surrogate for historical climate change. Here we are quantifying the pattern effect associated with patterns of temperature change that actually occurred in the real world, relative to those simulated by AOGCMs to long‐term CO 2 increases. The pattern effect therefore assumes that long‐term warming patterns in AOGCMs not yet seen in the historical record, and the radiative response to them, are credible (see section 5).

To illustrate the impact of the pattern effect we use the Otto et al. (2013) historical energy budget constraints as our starting point, though other data sets exist (see Forster, 2017) and clearly the EffCS estimates presented below will depend on this. First, we reproduce the historical EffCS estimates reported in Otto et al. (2013) using their best estimate and 5–95% confidence intervals for the historical (denoted by subscript hist) change in temperature (dT hist = 0.48 ± 0.2 K), heat uptake (dN hist = 0.35 ± 0.13 W/m2) and radiative forcing (dF hist = 1.21 ± 0.52 W/m2) for the 40‐year period 1970–2009 relative to preindustrial (which they define as 1860–1879; their Table S1, row 5). To be consistent with Otto et al. (2013) we also use their forcing and its uncertainty for a doubling of CO 2 (F 2x = 3.44 (±10%) W/m2).
We randomly sample (with replacement) 10 million times from the Gaussian distributions of dT hist, dN hist, dF hist, and F 2x to calculate λ hist = (dN hist − dF hist)/dT hist and EffCS hist = −F 2x /λ hist. We assume the uncertainty in F 2x and the greenhouse gas component of dF hist are correlated as in Otto et al. (2013). The resulting EffCS values are binned into intervals of 0.02 K and normalized to produce a probability density function (PDF), excluding values less than 0 K and greater than 20 K. The resulting PDF and percentiles (Figure 3, black lines) recover the Otto et al. (2013) EffCS hist median (1.9 K) and 5–95% confidence interval (0.9–5.0 K) to within 0.1 K.

Figure 3. Comparison of the effective climate sensitivity probability distribution function from a historical energy budget constraint (Otto et al., 2013), before (black) and after (colors) accounting for the pattern effect between historical climate change and abrupt‐4xCO 2 . Red accounts for the pattern effect by scaling the historical feedback parameter λ hist by the ratio (S = λ 4xCO2 /λ amip ) of the feedbacks found in the amip‐piForcing and abrupt‐4xCO 2 simulations. Blue accounts for the pattern effect by adding the difference in feedbacks (Δλ = λ 4xCO2 − λ amip ) to λ hist (see section 4 and Table 1). Box plots show the 5–95% confidence interval (end bars), the 17–83% confidence interval (box ends), and the median (line in box).

Following Armour (2017), we update the Otto et al. (2013) EffCS estimate for the pattern effect between historical climate change and abrupt‐4xCO 2 using two methods.
We first scale the historical feedback parameter λ hist by the ratio of the feedbacks found in the amip‐piForcing and abrupt‐4xCO 2 simulations, so λ = λ hist *S where S = λ 4xCO2 /λ amip (Table 1). EffCS is then given by EffCS = −F 2x /λ = −F 2x /(λ hist *S) (equivalent to Equation 4 in Armour, 2017). Alternatively, we update λ hist by the difference in feedbacks, according to λ = λ hist + Δλ, where Δλ = λ 4xCO2 − λ amip . EffCS is then given by EffCS = −F 2x /λ = −F 2x /(λ hist + Δλ) (equivalent to Equation 5 in Armour, 2017). We then calculate the EffCS PDF as above by randomly sampling from the F 2x and λ hist distributions, along with S and Δλ chosen randomly with equal likelihood from the individual model results (Table 1). Note that using the difference (Δλ) approach increases the likelihood of returning very large (or even negative) EffCS values, since λ = λ hist + Δλ can result in λ values close to 0 or even with a changed sign when sampling λ hist values that are small. Hence, the results of this method are potentially sensitive to the assumption of excluding negative EffCS values or those greater than 20 K.

We compare the PDF of EffCS hist (which is an approximation of Otto et al., 2013) against its updated versions that account for the pattern effect in Figure 3. The Otto et al. (2013) median and 5–95% confidence interval increases from 1.9 K (0.9–5.0 K) to 3.2 K (1.5–8.1 K) using the ratio (S) approach (Figure 3, red lines) or 2.7 K (1.1–10.2 K) if we use the difference (Δλ) approach (Figure 3, blue lines). Alternatively, if we take the Otto et al. (2013) data relating to their most recent decade (2000–2009; their Table S1 row 4) then the Otto et al. (2013) estimate and 5–95% confidence interval increases from 2.0 K (1.2–3.9 K) to 3.3 K (1.8–6.8 K) using the ratio approach or 3.0 K (1.5–9.7 K) using the difference approach.
Thus, either way and for different time periods, the pattern effect from amip‐piForcing to abrupt‐4xCO 2 results in a substantial median ECS increase, while the lowest values of ECS become less likely, and higher ECS values become much harder to rule out. Another way of estimating the pattern effect is by comparing feedbacks in AOGCM historical simulations to abrupt‐4xCO 2 (e.g., Marvel et al., 2018; Paynter & Frölicher, 2015). However, we believe amip‐piForcing is superior, because (i) the diagnosed pattern effect in an AOGCM historical simulation will depend on its ability to correctly simulate the patterns of historical climate change, including the magnitude and timing of unforced variability, which they are not expected to simulate correctly (e.g., Mauritsen, 2016; Zhou et al., 2016) and (ii) determining feedbacks in AOGCM historical simulations requires knowledge of the time‐varying effective radiative forcing of the model, something which is not routinely diagnosed and is difficult to assume because of model diversity in forcing, particularly from aerosols (Forster, 2017). The amip‐piForcing approach alleviates both of the above issues. Note that for simplicity in the above calculations we have assumed that λ amip (calculated via linear regression over the amip‐piForcing simulations, section 3) is appropriate to the time periods and methodology of Otto et al. (2013; who use finite differences, rather than linear regression, between decades to calculate changes). To check this we recompute λ amip and the corresponding S and Δλ values using the same method and time periods as Otto et al., that is, λ amip = dN/dT, where dN and dT are averaged over the relevant decades (though for 2000–2009 we use the 1995–2004 decade, since the GFDL runs finished in 2004). We cannot use an identical baseline as Otto et al. (2013) since our simulations begin in 1871 and their baseline begins in 1860. 
Regardless, for 1970–2009 or 2000–2009, the resulting updated EffCS PDF has a median and 5–95% confidence interval to within ±15% of the regression methods used above. Hence, in practice our conclusions are not sensitive to this assumption.
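The sampling procedure of this section can be sketched as follows. This is a simplified illustration and should not be expected to reproduce the published percentiles exactly: it treats all four inputs as independent Gaussians (the correlation between F 2x and the greenhouse gas part of dF hist is ignored), it interprets each quoted ± range as a 5–95% half‐width, it uses far fewer samples than 10 million, and it summarizes with percentiles rather than a binned PDF. The function names and the sample size are hypothetical; the input values and the S and Δλ lists are taken from the text and Table 1.

```python
import numpy as np

Z90 = 1.645  # a 5-95% half-width equals 1.645 standard deviations

# Per-model values from Table 1 (six models with both experiments).
S_MODELS = np.array([0.54, 0.72, 0.83, 0.53, 0.63, 0.47])
DLAM_MODELS = np.array([1.04, 0.54, 0.29, 0.65, 0.61, 0.73])  # W m-2 K-1


def sample_effcs(n=200_000, method=None, seed=0):
    """Monte Carlo EffCS samples; method is None, 'ratio', or 'difference'."""
    rng = np.random.default_rng(seed)
    dT = rng.normal(0.48, 0.20 / Z90, n)     # K
    dN = rng.normal(0.35, 0.13 / Z90, n)     # W m-2
    dF = rng.normal(1.21, 0.52 / Z90, n)     # W m-2
    f2x = rng.normal(3.44, 0.344 / Z90, n)   # W m-2, +/-10% (independent here)
    lam = (dN - dF) / dT                     # lambda_hist
    if method == "ratio":
        lam = lam * rng.choice(S_MODELS, n)       # lambda = lambda_hist * S
    elif method == "difference":
        lam = lam + rng.choice(DLAM_MODELS, n)    # lambda = lambda_hist + dlam
    elif method is not None:
        raise ValueError("method must be None, 'ratio', or 'difference'")
    effcs = -f2x / lam
    return effcs[(effcs > 0.0) & (effcs < 20.0)]  # exclude <0 K and >20 K


def summarize(effcs):
    """Return the (5th, 50th, 95th) percentiles."""
    return tuple(np.percentile(effcs, [5, 50, 95]))


for name in (None, "ratio", "difference"):
    p5, p50, p95 = summarize(sample_effcs(method=name))
    print(f"{str(name):>10}: median {p50:.1f} K, 5-95% {p5:.1f} to {p95:.1f} K")
```

Changing the 20 K cutoff in `sample_effcs` is a quick way to see how sensitive the difference approach is to the exclusion of very large or negative EffCS values.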

5 Summary and Discussion

An intercomparison of AGCMs forced with historical (post 1870) SSTs and sea ice from the AMIP II boundary condition data set reveals some common results: (i) When AGCMs are forced with historical SST and sea ice changes, the models agree on an EffCS of ~2 K, in line with best estimates from historical energy budget variations (e.g., Otto et al., 2013), but lower than that from abrupt‐4xCO 2 (~2.4–4.6 K for the corresponding set of models). The lower historical EffCS relative to abrupt‐4xCO 2 is predominantly because LW clear‐sky and cloud radiative feedbacks are less positive in response to historical SST and sea ice variations than in long‐term climate sensitivity simulations. This is an example of what is called a pattern effect (Stevens et al., 2016). (ii) The models agree that the most recent decades (e.g., 1980–2010) generally give rise to the most negative feedbacks (lowest EffCS). Hence, the pattern effect will be largest for estimates of feedbacks and EffCS based on the satellite era. This is a period when the eastern tropical Pacific and Southern Oceans, regions important for the pattern effect, have been cooling but are not expected to continue to do so in the long‐term response to increased CO 2 (e.g., Zhou et al., 2016).

The pattern effect causing the difference between EffCS under historical climate change and long‐term CO 2 changes implies that current constraints on climate sensitivity that do not consider this give values that are too low and are overly constrained, particularly at the upper bound. We present an approach to adjust historical energy budget‐derived EffCS estimates for the pattern effect. For example, the historical (1860–1879 to 1970–2009) observational EffCS estimate (median) and 5–95% confidence interval of Otto et al. (2013) increases from 1.9 K (0.9–5.0 K) to 3.2 K (1.5–8.1 K) using an approach that scales the historical feedback parameter by the ratio of the feedbacks found in amip‐piForcing and abrupt‐4xCO 2 .
Thus, the pattern effect increases historical EffCS median values, reduces the likelihood of the lowest EffCS values and makes higher values significantly harder to rule out. Determining whether values toward the extremes of these bounds are plausible would require further understanding of the pattern effect or assessing and combining other lines of evidence, such as from process understanding (see Stevens et al., 2016). This is important because a higher EffCS increases the risk of state‐dependent feedbacks and large warmings (Bloch‐Johnson et al., 2015). The pattern effect between historical climate change and long‐term CO 2 increase assumes that key aspects of long‐term warming patterns simulated by AOGCMs not yet seen in the observational record, such as substantial warming of the Southern Ocean and eastern tropical Pacific, and the radiative response to them, are credible. Such patterns are consistent with paleo records (e.g., Fedorov et al., 2015; Masson‐Delmotte et al., 2013) and basic physical understanding of the behavior and timescale of oceanic upwelling (e.g., Armour et al., 2016; Clement et al., 1996; Held et al., 2010), though they are difficult to observationally constrain (Mauritsen, 2016). To argue for a negligible pattern effect (e.g., Lewis & Curry, 2018) would require that atmospheric feedbacks are insensitive to patterns of temperature change or that the pattern of observed historical temperature change represents the equilibrated pattern response to increased CO 2 . This is at odds with basic physical understanding and bodies of work on the role for unforced variability, transient effects, and non‐CO 2 forcings such as aerosols on the pattern of historical climate change (e.g., Armour et al., 2016; Held et al., 2010; Jones et al., 2013; Takahashi & Watanabe, 2016; Xie et al., 2016). 
Further progress in constraining the pattern effect and EffCS will come from improved understanding of the causes and processes of surface temperature change patterns in observations and AOGCM projections, as well as the radiative response to them.

Acknowledgments

Global annual time series data of temperature and radiative flux change in the amip‐piForcing simulations, as well as the abrupt‐4xCO 2 simulations not in the CMIP5 archive, are provided in the supporting information. We thank Michael Winton, Tom Knutson, Mark Ringer, Gareth Jones, and two anonymous reviewers for constructive comments. T. A., J. M. G., and M. J. W. were supported by the Met Office Hadley Centre Climate Programme funded by BEIS and Defra. P. M. F. was supported by grant NE/N006038/1. K. C. M. was supported by NSF award AGS‐1752796.
