[1] We utilize energy budget diagnostics from the Coupled Model Intercomparison Project phase 5 (CMIP5) to evaluate the models' climate forcing since preindustrial times, employing an established regression technique. The climate forcing evaluated this way, termed the adjusted forcing (AF), includes a rapid adjustment term associated with cloud changes and other tropospheric and land‐surface changes. We estimate a 2010 total anthropogenic and natural AF from CMIP5 models of 1.9 ± 0.9 W m−2 (5–95% range). The projected AFs of the Representative Concentration Pathway simulations are lower than their expected radiative forcings (RFs) in 2095 but agree well with efficacy‐weighted forcings from integrated assessment models. The smaller AF, compared to RF, is likely due to cloud adjustment. Multimodel time series of temperature change and AF from 1850 to 2100 have large intermodel spreads throughout the period. The intermodel spread of temperature change is principally driven by forcing differences in the present day and climate feedback differences in 2095, although forcing differences are still important for model spread at 2095. We find no significant relationship between the equilibrium climate sensitivity (ECS) of a model and its 2003 AF, in contrast to that found in older models where higher ECS models generally had less forcing. Given the large present‐day model spread, there is no indication of any tendency by modelling groups to adjust their aerosol forcing in order to produce observed trends. Instead, some CMIP5 models have a relatively large positive forcing and overestimate the observed temperature change.

1 Introduction [2] Radiative forcings (RFs) are used extensively to quantify the drivers of climate change. Forcings can prove very useful in understanding differences between model responses to alternative forcing agents [Shine and Forster, 1999; Hansen et al., 2005]. Offline comparisons between the radiative transfer codes used in atmosphere–ocean general circulation models (AOGCMs) with more accurate line‐by‐line codes have identified potentially important sources of error (> 20%) in how AOGCM radiative transfer codes compute RF [Collins et al., 2006; Forster et al., 2011] so it is important to test the veracity of their forcing estimates when running in coupled mode. However, this calculation of RF is difficult in practice and within climate models adjusted forcings (AFs) are more readily calculated from standard diagnostics using either fixed sea‐surface temperature (SST) [Hansen et al., 2005] or linear regression techniques [Gregory et al., 2004]. [3] AFs are similar to RFs but additionally include rapid adjustments to the land‐surface and troposphere that typically occur within a few days of applying a forcing and are largely due to cloud changes in the troposphere [Andrews and Forster, 2008; Dong et al., 2009; Andrews et al., 2012a]. Importantly these rapid adjustments depend on the magnitude and nature of the forcing agent rather than on global‐mean temperature change [Gregory and Webb, 2008; Andrews et al., 2010], and it has been argued [Rotstayn and Penner, 2001; Gregory and Forster, 2008; Lohmann et al., 2010; Bala et al., 2009] that they are more appropriately regarded as forcings rather than feedbacks. 
[4] Forster and Taylor [2006], hereinafter FT06, developed a methodology to diagnose globally averaged AF in Coupled Model Intercomparison Project phase 3 (CMIP3) models, and we use the same approach here within Coupled Model Intercomparison Project phase 5 (CMIP5) models, taking advantage of their improved diagnostics and additional integrations to improve the methodology. We use these CMIP5 diagnostics to determine globally averaged AF components and energy budget changes since 1850 and use these to investigate how gross characteristics of the models evolve, concentrating on the factors influencing the spread of simulated time series for global average surface temperature and AF.

2 Methodology [5] The FT06 method makes use of a global linearized energy budget approach where the top of atmosphere (TOA) change in energy imbalance (N) is split between a climate forcing component (F) and a component associated with climate feedbacks that is proportional to globally averaged surface temperature change (ΔT), such that:
$$N = F - \alpha \Delta T \tag{1}$$
where α is the climate feedback parameter in units of W m−2 K−1. To remove the effects of any preindustrial energy imbalance, N and ΔT are quantified as the difference from a preindustrial control simulation. CMIP5 models provide a long preindustrial control simulation from which the historical simulations branch. AOGCMs require a long spin up period for the ocean, and their preindustrial control simulations are not necessarily in equilibrium. Further, even if the surface climate is near a steady state, the TOA net radiation anomaly may still be nonzero as deep‐ocean temperatures continue to evolve.
The preindustrial climates of the CMIP5 models analyzed were much closer to equilibrium and had less drift than the CMIP3 models. Nevertheless, some energy imbalance remained (Figure 1). In most models, this imbalance was due to problems with closure of their energy budgets rather than a discernible drift. To address this, the individual flux terms and temperatures used in equation (1) were generated by subtracting any imbalance and its drift, taken from the equivalent segment of each model's own preindustrial control simulation. This drift was calculated as a linear trend over the control segment and removed from the N and ΔT time series of the forced scenarios.
Figure 1. Preindustrial TOA energy imbalance (W m−2) for the CMIP5 models. These were averaged over the entire preindustrial control period. Note additional models are included, compared to the main analysis (compare Table 1).
Table 1.
CMIP5 Models Employed in This Paper and Their Feedback Components Computed
Columns: adjusted forcing for 2×CO2 (W m−2); climate sensitivities ECS and TCR (K); transient feedbacks ρ and κ (W m−2 K−1)a; feedbacks (α) (W m−2 K−1) for LW clear sky, SW clear sky, cloud (CRE derived), and net.
Model            2×CO2  ECS    TCR    ρ      κ      LW     SW     Cloud  Net
ACCESS1‐0         2.98   3.83   2.00   1.49   0.71   1.63  −0.77  −0.08   0.78
bcc‐csm1‐1        3.23   2.82   1.70   1.90   0.76   1.91  −0.83   0.07   1.14
bcc‐csm1‐1‐m      3.55   2.87   2.10   1.69   0.45   1.98  −0.68  −0.06   1.24
CanESM2           3.84   3.69   2.40   1.60   0.56   1.88  −0.71  −0.13   1.04
CCSM4             3.57   2.89   1.80   1.98   0.75   1.95  −0.87   0.16   1.23
CNRM‐CM5          3.72   3.25   2.10   1.77   0.63   1.73  −0.78   0.20   1.14
CSIRO‐Mk3‐6‐0     2.59   4.08   1.80   1.44   0.81   1.70  −0.84  −0.23   0.63
FGOALS‐s2         3.85   4.17   2.40   1.60   0.68   1.46  −1.02   0.48   0.92
GFDL‐CM3          2.99   3.97   2.00   1.50   0.75   1.94  −0.70  −0.48   0.75
GFDL‐ESM2G        3.09   2.39   1.10   2.81   1.52   1.65  −0.61   0.26   1.29
GFDL‐ESM2M        3.36   2.44   1.30   2.58   1.20   1.63  −0.58   0.33   1.38
GISS‐E2‐H         3.81   2.31   1.70   2.24   0.59   1.67  −0.49   0.47   1.65
GISS‐E2‐R         3.78   2.11   1.50   2.52   0.73   1.66  −0.36   0.48   1.79
HadGEM2‐ES        2.93   4.59   2.50   1.17   0.53   1.66  −0.65  −0.37   0.64
inmcm4            2.98   2.08   1.30   2.29   0.86   1.98  −0.67   0.12   1.43
IPSL‐CM5A‐LR      3.10   4.13   2.00   1.55   0.80   1.99  −0.53  −0.70   0.75
IPSL‐CM5B‐LR      2.66   2.61   1.50   1.77   0.75   1.88  −0.59  −0.28   1.02
MIROC5            4.13   2.72   1.50   2.75   1.23   1.85  −0.84   0.51   1.52
MIROC‐ESM         4.26   4.67   2.20   1.93   1.02   1.93  −0.83  −0.19   0.91
MPI‐ESM‐LR        4.09   3.63   2.00   2.05   0.92   1.79  −0.71   0.04   1.13
MPI‐ESM‐P         4.31   3.45   2.00   2.16   0.91   1.80  −0.65   0.10   1.25
MRI‐CGCM3         3.25   2.60   1.60   2.03   0.78   1.99  −0.83   0.09   1.25
NorESM1‐M         3.11   2.80   1.40   2.22   1.11   1.86  −0.86   0.11   1.11
Multimodel mean   3.44   3.22   1.82   1.96   0.83   1.81  −0.71   0.04   1.13
90% uncertainty   0.84   1.32   0.63   0.73   0.41   0.25   0.24   0.53   0.51
[6] As in FT06, we use a two‐step process to derive time series for F. Step 1 uses CO2‐only climate simulations to diagnose α terms using linear regression. As in Andrews et al.
[2012b], this analysis uses the CMIP5 abrupt 4xCO2 simulations and regresses N against ΔT to diagnose the 4xCO2 AF as the intercept term and α as the negative of the slope of the regression line. Component α terms are presented in Table 1. Then, assuming α is both independent of forcing agent and time invariant, Step 2 substitutes these α terms into equation (1), rearranged as F = N + αΔT, and uses N and ΔT diagnostics from the various forced scenarios to compute each model's AF time series. The AF calculation is performed for the three historical scenarios from the late 19th century to 2005 (Historical: all natural and anthropogenic forcings; HistoricalGHG: long‐lived greenhouse gas changes only; and HistoricalNat: natural solar and volcanic forcings only), and the four Representative Concentration Pathways (RCPs) of future anthropogenic changes in atmospheric composition (RCP2.6, RCP4.5, RCP6.0, and RCP8.5). These RCPs are named after the 2100 RF they aim to generate relative to 1750 [Meinshausen et al., 2011]. RCP2.6 should have a peak RF of 3 W m−2 declining to 2.6 W m−2 by 2100. RCP4.5 and RCP6.0 should have RFs close to 4.5 W m−2 and 6.0 W m−2, respectively, on stabilization of greenhouse gas concentrations after 2100. RCP8.5 should lead to an RF close to 8.5 W m−2 by 2100. However, Meinshausen et al. [2011] found that integrated assessment models generated smaller RFs in 2100, namely 2.5, 4.1, 5.3, and 8.2 W m−2 for RCP2.6, RCP4.5, RCP6.0, and RCP8.5, respectively. [7] The original FT06 analysis differed from the analysis here (hereinafter referred to as FT06‐updated) in its approach to step 1. In the original FT06 method, each modeling group's estimate of their model's 2xCO2 RF, along with N and ΔT values from 1% per year CO2 increase runs, was used to determine α.
The RF was taken as the stratospherically adjusted Intergovernmental Panel on Climate Change (IPCC) forcing definition [Ramaswamy et al., 2001], whereas the forcing methodology in Step 2 has a component of rapid adjustment, as the N time series used to diagnose F was measured as monthly TOA fluxes in a scenario integration that would be continually adjusting to the underlying forcing. Therefore, steps 1 and 2 in the original method used inconsistent forcing definitions. By contrast, in FT06‐updated, step 1 diagnoses both AF and α as the intercept and slope of the regression line, respectively, and therefore uses AF consistently in steps 1 and 2. [8] To elucidate the role of historical forcings other than greenhouse gases, the HistoricalNat and HistoricalGHG scenarios were subtracted from the full historical simulation. Assuming linearity, the resulting residual Historical‐nonGHG scenario was taken to represent the combined effects of aerosol as well as any land‐use and ozone changes. Previous assessments have suggested that forcings from ozone and land‐use could more or less cancel each other in the global mean so that this residual would be dominated by aerosol effects [Forster et al., 2007; Skeie et al., 2011]. For example, Forster et al. [2007] estimated global‐mean RFs in 2005 of: +0.3 W m−2 from ozone changes; −0.2 W m−2 from land‐use albedo changes; and −0.5 W m−2 and −0.7 W m−2 for aerosol direct and indirect effects, respectively. [9] Not all models had the complete set of energy budget variables needed for the sensitivity and forcing analysis. The models in Table 1 were those with the necessary data, as of November 2012. All available ensemble members were used in the analysis and averaged over.
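The two‐step procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the analysis: the function names `diagnose_alpha` and `diagnose_forcing` are hypothetical, the input series are synthetic rather than CMIP5 output, and the anomalies are assumed to be global means that have already had control‐run drift removed.

```python
import numpy as np

def diagnose_alpha(n_4x, dt_4x):
    """Step 1: regress N against dT from an abrupt 4xCO2 run.

    With N = F - alpha * dT, the intercept is the 4xCO2 AF and the
    slope is -alpha. Returns (f_4x, alpha).
    """
    slope, intercept = np.polyfit(dt_4x, n_4x, 1)
    return intercept, -slope

def diagnose_forcing(n, dt, alpha):
    """Step 2: rearrange equation (1) to F = N + alpha * dT."""
    return np.asarray(n, dtype=float) + alpha * np.asarray(dt, dtype=float)

# Synthetic abrupt-4xCO2 run (illustrative numbers, not model output).
rng = np.random.default_rng(0)
true_f4x, true_alpha = 6.9, 1.1           # W m-2 and W m-2 K-1 (assumed)
dt_4x = np.linspace(1.0, 5.0, 150)        # K
n_4x = true_f4x - true_alpha * dt_4x + rng.normal(0.0, 0.1, dt_4x.size)
f4x, alpha = diagnose_alpha(n_4x, dt_4x)

# Synthetic transient run: a forcing ramp and the warming it accompanies.
f_true = np.linspace(0.0, 1.7, 50)        # W m-2
dt_hist = np.linspace(0.0, 1.0, 50)       # K
n_hist = f_true - true_alpha * dt_hist    # TOA imbalance from equation (1)
f_hist = diagnose_forcing(n_hist, dt_hist, alpha)
```

Because Step 2 reuses the α from Step 1, any error in the regression slope propagates into the diagnosed forcing in proportion to ΔT, which is why the assumption of a time‐invariant, agent‐independent α matters.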

3 AFs [10] Figure 2 shows the time evolution of globally averaged surface temperature and calculated AF, relative to the preindustrial climate, for historical and future scenarios. The variation of AF across models and scenarios is shown in Figure 3. Figure 4 breaks down the components of AF in the models for year 2003 (2001–2005 average) and year 2095 (2091–2099 average).
Figure 2. (Top) The globally averaged surface temperature change since preindustrial times and (bottom) computed net AF. Thin lines are individual model results averaged over their available ensemble members, and thick lines represent the multimodel mean. The Historical‐nonGHG scenario is computed as a residual and approximates the role of aerosols (see section 2).
Figure 3. (a) Time series of AF from the different historical scenarios. (b) Time series of AF from the different future scenarios.
Figure 4. Diagnosed AFs (since preindustrial) for the Historical, HistoricalGHG, and RCP8.5 scenarios. The historical scenarios give the AF for 2003 (2001–2005 average) and the RCP scenario for 2095 (2091–2099 average). AFs are given for the LW clear‐sky forcing, the SW clear‐sky forcing, the CRE‐derived cloud forcing, and the net forcing. Note that the CRE‐derived cloud forcing includes a component due to cloud masking effects. Error bars represent the standard deviation of the model range.
[11] AFs and temperature changes for the individual models in these years are given in Tables 2 and 3, respectively. In the historical simulations, the 2003 AF (2001–2005 average) was found to be 1.7 ± 0.9 W m−2 from the Historical simulation, 2.4 ± 0.8 W m−2 from the HistoricalGHG simulation, 0.1 ± 0.2 W m−2 from the HistoricalNat simulation, and −0.8 ± 0.9 W m−2 from the Historical‐nonGHG residual simulation.
This gives an anthropogenic (Historical minus HistoricalNat) AF of 1.6 ± 0.8 W m−2 in 2003. All errors represent the 5%–95% model range. Multimodel mean AFs for the RCP scenarios all depart from their expected RFs (Table 2 and Figure 2). RCP forcing estimates in 2095 are less than their targeted forcing, but agree very well with the forcing estimates derived from Integrated Assessment Modelling [Meinshausen et al., 2011]. When the different efficacies of the various forcing agents are accounted for, Meinshausen et al. [2011] find effective forcings in 2095 of 2.3, 3.9, 5.2, and 8.0 W m−2 for RCP2.6, RCP4.5, RCP6.0, and RCP8.5, respectively, within 10% of the CMIP5 model mean given in Table 2.
Table 2. AFs for Different Scenarios Given at 2003 (2001–2005 Average), 2010 (2008–2012 Average), and 2095 (2091 to 2099)
Adjusted Forcing (W m−2) for Scenario and Period. Columns: Hist 2003, HistGHG 2003, HistNat 2003, Hist NonGHG 2003, RCP4.5 2010, RCP2.6 2095, RCP4.5 2095, RCP6.0 2095, RCP8.5 2095.
ACCESS1‐0 1.1 1.4 3.3 6.2
bcc‐csm1‐1 2.2 2.0 0.1 0.0 2.0 2.5 3.3 4.5 7.0
bcc‐csm1‐1‐m 2.2 2.2 1.9 3.3 4.3 7.0
CanESM2 2.0 2.4 0.1 −0.5 2.2 2.9 4.3 8.4
CCSM4 2.5 2.3 0.1 0.1 2.7 2.8 4.3 5.4 8.3
CNRM‐CM5 1.5 2.2 0.1 −0.8 1.2 2.3 3.7 6.9
CSIRO‐Mk3‐6‐0 0.9 1.4 0.1 −0.6 1.0 1.9 2.8 3.4 5.7
FGOALS‐s2 2.3 2.8 2.5 4.3 6.5 10.0
GFDL‐CM3 1.1 2.9 0.5 −2.2 1.7 3.1 4.2 4.9 7.2
GFDL‐ESM2G 2.0 1.9 1.2 2.8 3.9 6.4
GFDL‐ESM2M 2.0 2.5 0.2 −0.7 2.2 2.5 3.5 4.9 7.3
GISS‐E2‐H 2.3 3.2 0.2 −1.0
GISS‐E2‐R 2.5 3.3 0.2 −0.9 2.5 2.6 4.7 5.9 8.6
HadGEM2‐ES 0.8 1.9 0.1 −1.1 1.0 1.7 2.9 4.0 5.9
inmcm4 1.7 1.9 3.8 7.3
IPSL‐CM5A‐LR 1.9 2.4 0.2 −0.7 1.8 2.2 3.5 4.3 7.1
IPSL‐CM5B‐LR 1.0
MIROC5 1.6 2.0 3.0 4.5 5.3 8.7
MIROC‐ESM 1.1 2.2 0.0 −1.0 1.5 2.8 4.0 5.1 8.2
MPI‐ESM‐LR 2.1 2.3 2.2 3.9 7.7
MPI‐ESM‐P 2.3
MRI‐CGCM3 1.2 2.1 0.2 −1.1 1.2 2.1 3.6 4.3 7.0
NorESM1‐M 1.4 2.3 0.0 −0.9 1.7 2.0 3.6 4.2 7.0
Multimodel mean 1.7 2.4 0.1 −0.8 1.9 2.3 3.7 4.7 7.4
90% uncertainty 0.9 0.8 0.2 0.9 0.9 0.8 0.9 1.3 1.8
[12] The 5%–95% uncertainty range
of AF in the HistoricalGHG simulation in 2003 is ± 0.8 W m−2, which is nearly as large as the spread associated with nongreenhouse gas AF (Table 2). The evolution of net AF and surface temperature shows considerable spread among models (Figures 2 and 3). The fractional spread of net AF tends to grow much more in the historical period than in the future (Figure 3). Figure 3a and Table 2 show that natural forcing differences contribute least to the fractional model spread, whereas greenhouse gas and nongreenhouse gas forcings contribute in roughly equal proportions. [13] Figure 4 examines the components of AF. The positive longwave (LW) clear‐sky forcing is associated with greenhouse gas changes and has the least spread between models. The cloud AF terms are calculated from anomalies in cloud radiative effect (CRE), where all‐sky and clear‐sky fluxes are differenced. Because radiative anomalies due to changes in forcing agents, water vapor, surface albedo, etc. are smaller in the presence of clouds than they would be in the absence of clouds, CRE‐derived cloud AF estimates include a component of cloud masking. Model differences in aerosol forcings, rapid adjustments, and/or cloud masking effects can all contribute to the CRE‐derived cloud AF spread [Zelinka et al., manuscript in revision, 2012]. A LW cloud masking effect of roughly +0.6 W m−2 is expected from a doubling of CO2 [Andrews and Forster, 2008; Soden et al., 2008; Colman and McAvaney, 2011]. We adopt the sign convention that the cloud masking effect represents an additional positive forcing that needs to be added to CRE‐derived terms. As the forcing from CO2 is currently around half of its doubled CO2 value, this suggests that around +0.3 W m−2 of cloud masking needs to be added to the Historical CRE‐derived cloud AF terms. The RCP8.5 CRE‐derived cloud AF would need to have a larger component of masking added, around +0.6 W m−2.
The shortwave (SW) clear‐sky AF and CRE‐derived cloud AF split would also be affected by cloud masking of sea‐ice changes. Nevertheless, a negative CRE‐derived cloud AF beyond that which is expected from cloud masking is seen in all the scenarios in Figure 4. [14] The Historical‐nonGHG AF shows a generally negative trend that turned weakly positive around 1990 in most models (Figures 2 and 3), although some models show a strongly negative AF and others have an AF near zero or slightly positive (Figure 3). Because of the multiple forcing agents represented in the Historical‐nonGHG scenario, the CMIP5 model spread in its AF of −0.8 ± 0.9 W m−2 in 2003 is difficult to interpret (see section 4).
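The residual and anthropogenic AFs quoted in this section follow from simple differences of the scenario AFs, assuming the forcings add linearly. The sketch below reproduces that arithmetic with the multimodel‐mean 2003 values from the text; the function names are hypothetical, and because the scenarios are not available for an identical set of models, the result should be read as a consistency check on rounded means rather than a recomputation of the analysis.

```python
# Multimodel-mean 2003 AFs quoted in the text (W m-2).
AF_HIST = 1.7      # Historical: all forcings
AF_GHG = 2.4       # HistoricalGHG: long-lived greenhouse gases only
AF_NAT = 0.1       # HistoricalNat: solar and volcanic only

def residual_non_ghg(hist, ghg, nat):
    """Historical-nonGHG residual, assuming forcings combine linearly."""
    return hist - ghg - nat

def anthropogenic(hist, nat):
    """Anthropogenic AF as Historical minus HistoricalNat."""
    return hist - nat

# Round to the one-decimal precision used in Table 2.
af_non_ghg = round(residual_non_ghg(AF_HIST, AF_GHG, AF_NAT), 1)
af_anthro = round(anthropogenic(AF_HIST, AF_NAT), 1)
print(af_non_ghg, af_anthro)
```

The two values agree with the −0.8 W m−2 residual and 1.6 W m−2 anthropogenic AF reported above.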

4 Comparing Forcing Definitions [15] In order to interpret the AFs given in section 3, it is important to understand their uncertainty. Here we test three aspects of the analysis: (1) limitations of the two‐step AF process, (2) representing cloud AF using CRE‐derived AFs, and (3) using the Historical‐nonGHG scenario as a proxy for aerosol AF. 4.1 Limitations of the Two‐step AF Process [16] FT06 found that forcings from the two‐step regression procedure agreed with offline RF calculations in two models. However, variation in climate sensitivity could in principle bias the AF estimates. While some bias cannot be ruled out, for a scenario with CO2 increasing at 1% per year, ensemble mean AF (derived using the FT06‐updated method) has been found to increase linearly with time (to within the precision set by internal variability), as expected if climate sensitivity were approximately constant [Good et al., 2012]. To test this further, we compared the FT06‐updated AF with an AF derived from transient experiments where SSTs are prescribed from observations [Held et al., 2010]. The SST‐derived method used two transient integrations, one with forcing agents and one without. The run with changes in forcing agents gives a heat balance described by equation (1), and the run without changes in forcing agents gives a heat balance described by (note the primes):
$$N' = F' - \alpha \Delta T' \tag{2}$$
where F′ = 0 by definition.
As SSTs are identically prescribed in both, ΔT ≈ ΔT′, and substituting equation (2) into equation (1) gives:
$$F = N - N' \tag{3}$$
AFs derived from these two definitions are compared in Figure 5. Although there is considerable variability in the FT06‐updated AF, it seems to agree very well with the prescribed SST‐derived AF from a 10‐ensemble member average in this one CMIP3 model. The AFs calculated from the two methods could diverge if the integration continued beyond 2000 out to 2100. Nevertheless, this comparison gives some confidence that the FT06‐updated AF is comparable to other AF estimates and is not affected by an error associated with possible climate sensitivity drift in the FT06‐updated methodology.
Figure 5. A comparison of two methods of calculating AF in the CMIP3 GFDL CM2.1 model. The black line is a calculation of AF that uses two prescribed SST integration experiments, with and without forcing agents, and compares TOA fluxes [Held et al., 2010]. The AF in the red line employs our FT06‐updated method in the same model.
4.2 Representing Cloud AF Using CRE‐derived AFs [17] To test the CRE‐derived AF estimates and examine if they arise from a rapid adjustment of cloud or from cloud masking, cloud‐induced radiation anomalies can be computed directly from cloud anomalies diagnosed by the ISCCP simulator [Klein and Jakob, 1999; Webb et al., 2001] in combination with cloud radiative kernels [Zelinka et al., 2012]. The kernels quantify the impact on TOA radiative fluxes of cloud fraction perturbations for each of the 49 different ISCCP simulator cloud types. Multiplying cloud fraction anomalies by the kernels yields TOA radiation anomalies that are purely a result of cloud changes and are free of any noncloud effects. Therefore, we refer to the cloud AFs and feedbacks that are computed from these cloud‐induced anomalies as “unmasked,” to be distinguished from those derived using CRE, which include masking effects.
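The kernel multiplication described above can be illustrated with a small sketch. It assumes the 49 cloud types are arranged as a 7 × 7 histogram and that the kernel is expressed in W m−2 per percent cloud fraction; the function name and the numbers are hypothetical, and the dependence of real kernels on location, season, and surface conditions is ignored here.

```python
import numpy as np

def cloud_induced_flux_anomaly(kernel, cloud_fraction_anomaly):
    """Sum kernel * cloud-fraction anomaly over all histogram bins.

    kernel: (7, 7) array, TOA flux change per unit cloud fraction change
            for each of the 49 cloud types (assumed W m-2 per %).
    cloud_fraction_anomaly: (7, 7) array of cloud fraction anomalies (%).
    Returns the cloud-induced TOA flux anomaly (W m-2).
    """
    kernel = np.asarray(kernel, dtype=float)
    dcf = np.asarray(cloud_fraction_anomaly, dtype=float)
    if kernel.shape != (7, 7) or dcf.shape != (7, 7):
        raise ValueError("expected 7 x 7 arrays (49 cloud types)")
    return float(np.sum(kernel * dcf))

# Illustrative inputs only: a uniform kernel and one perturbed bin.
kernel = np.full((7, 7), -1.0)
dcf = np.zeros((7, 7))
dcf[2, 3] = 0.5
anomaly = cloud_induced_flux_anomaly(kernel, dcf)
```

Because only cloud fraction anomalies enter the product, the resulting flux anomaly contains no contribution from clear‐sky changes, which is the sense in which it is unmasked.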
[18] To derive cloud AFs, we follow the same FT06‐updated procedure as described in section 2, but replace N in equation (1) with cloud‐induced radiative flux anomalies, so that α is the unmasked cloud feedback. The unmasked cloud feedback α terms are derived from the abrupt 4xCO2 runs in Zelinka et al. [manuscript under revision, 2012] for the five models that have archived the necessary diagnostics. The CRE‐derived and unmasked LW, SW, and net cloud AFs in 2003 for the Historical run are compared in Figure 6. As expected, the unmasked LW cloud AF is systematically more positive than the CRE‐derived value in every model (0.56 W m−2 larger on average), and the unmasked SW cloud AF is systematically less positive or more negative than the CRE‐derived value (0.32 W m−2 smaller on average). This brings the unmasked negative net cloud AF in 2003 closer to zero (−0.33 rather than −0.57 W m−2) and increases the spread in this quantity among the five models. That the unmasked net cloud AF is nonzero indicates that cloud rapid adjustments are physically occurring and are tending to reduce the effective climate forcing. The difference between the unmasked and CRE‐derived cloud AFs quantifies the amount of cloud masking in the section 3 estimates of AF. The net cloud masking effect at the end of the Historical run in these five models is systematically positive and averages to 0.24 W m−2. In agreement with expectations from section 3, this is roughly half of the value expected for doubling of CO2.
Figure 6. Multimodel mean and standard deviation of the global‐mean cloud AFs for the unmasked (i.e., cloud kernel‐derived) AF and CRE‐derived AF. Cloud AFs are given for LW, SW, and net variables for five GCMs averaged over years 2001–2005 of the Historical simulations. Unmasked minus CRE‐derived cloud AFs gives an estimate of the cloud masking of the forcing.
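The masking figures quoted above are mutually consistent, as a short worked calculation shows. The symbols M_LW, M_SW, and M_net are introduced here for illustration only and denote the five‐model mean of the unmasked minus CRE‐derived cloud AF; they are not notation used elsewhere in the analysis.

```latex
\begin{align}
M_{\mathrm{LW}} &= +0.56\ \mathrm{W\,m^{-2}}, \qquad
M_{\mathrm{SW}} = -0.32\ \mathrm{W\,m^{-2}}, \\
M_{\mathrm{net}} &= M_{\mathrm{LW}} + M_{\mathrm{SW}}
  = 0.56 - 0.32 = +0.24\ \mathrm{W\,m^{-2}}, \\
\mathrm{AF}^{\mathrm{unmasked}}_{\mathrm{net}}
  &= \mathrm{AF}^{\mathrm{CRE}}_{\mathrm{net}} + M_{\mathrm{net}}
  = -0.57 + 0.24 = -0.33\ \mathrm{W\,m^{-2}}.
\end{align}
```

The LW masking term alone (+0.56 W m−2) is larger than the net because the SW term partly offsets it.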
[19] The SW cloud AF dominates over the LW cloud AF in every model, in agreement with previous studies. However, Zelinka et al. [manuscript under revision, 2012] find a positive unmasked SW rapid adjustment cloud AF under 4xCO2 for all five models, which raises the question of why most (three out of these five) models give negative unmasked SW cloud AFs in 2003 given that CO2 is the dominant forcing agent in the latter part of the Historical run. This may be evidence that the non‐CO2 forcing agents (which are present in the Historical run but not in the idealized 4xCO2 runs) cause significant cloud adjustments, even if they are not the ones responsible for most of the unadjusted forcing (just as cloud feedbacks are responsible for most of the spread in climate feedback, whereas water vapor is responsible for most of the ensemble mean feedback). Previous studies have found large cloud forcing from rapid adjustments associated with perturbations to the solar constant, black carbon, and ozone [e.g., Hansen et al., 2005; Bala et al., 2009; Ban‐Weiss et al., 2011], but these cloud forcings vary considerably with the location and magnitude of the forcing agent and with the model. On the other hand, our diagnosed cloud AF could be an artefact of the assumptions inherent in the two‐step regression technique. 4.3 Using the Historical‐nonGHG Scenario as a Proxy for Aerosol AF [20] To test the aerosol AF estimate, we examined fixed‐SST experiments existing in the CMIP5 archive. In these experiments, individual forcing agents have been introduced; present‐day aerosol perturbation experiments exist for three models, and their AFs can be compared to the FT06‐updated AFs taken from the Historical‐nonGHG simulations. The fixed‐SST AFs are taken as the difference of TOA fluxes between a forced and a preindustrial control experiment (as in equation (3)). These AFs are given in Table 4, which also shows AFs from the FT06‐updated method, repeated from Table 2.
The AFs derived by the two methods are appreciably different, indicating that other nongreenhouse forcing agents, such as land‐use and ozone, as well as the aerosol signal, affect the Historical‐nonGHG simulations.
Table 3. Temperature Changes Since Preindustrial Times for Different Scenarios Given at 2003 (2001–2005 Average), 2010 (2008–2012 Average), and 2095 (2091 to 2099)
Temperature Change Since Preindustrial (K) for Scenario and Period. Columns: Hist 2003, HistGHG 2003, HistNat 2003, Hist NonGHG 2003, RCP4.5 2010, RCP2.6 2095, RCP4.5 2095, RCP6.0 2095, RCP8.5 2095.
ACCESS1‐0 0.6 0.8 2.7 4.8
bcc‐csm1‐1 1.2 1.4 0.1 −0.3 1.4 2.0 2.5 3.1 4.6
bcc‐csm1‐1‐m 1.7 1.8 2.0 2.7 3.2 4.8
CanESM2 1.0 1.6 −0.1 −0.4 1.2 2.3 3.2 5.5
CCSM4 1.3 1.3 0.0 −0.1 1.3 1.9 2.7 3.2 4.7
CNRM‐CM5 1.0 1.3 0.1 −0.4 1.1 1.8 2.7 4.5
CSIRO‐Mk3‐6‐0 0.7 1.2 0.2 −0.7 0.7 1.9 2.5 2.9 4.8
FGOALS‐s2 1.8 2.0 2.1 3.0 4.4 6.6
GFDL‐CM3 0.3 1.8 −0.1 −1.4 0.9 2.1 2.9 3.5 5.1
GFDL‐ESM2G 0.8 1.0 0.8 1.6 2.2 3.6
GFDL‐ESM2M 0.8 1.0 0.0 −0.2 0.8 1.3 1.8 2.3 3.5
GISS‐E2‐H 1.2 1.4 0.1 −0.3
GISS‐E2‐R 1.1 1.2 0.2 −0.3 1.1 1.4 2.2 2.6 3.7
HadGEM2‐ES 0.5 1.5 0.0 −1.0 0.7 1.7 2.8 3.6 5.2
inmcm4 0.8 0.9 2.0 3.5
IPSL‐CM5A‐LR 1.4 1.9 0.2 −0.7 1.5 2.3 3.3 3.8 5.8
IPSL‐CM5B‐LR 0.9
MIROC5 0.6 0.8 1.4 2.1 2.5 4.0
MIROC‐ESM 0.7 1.3 0.0 −0.6 1.0 2.3 3.1 3.7 5.5
MPI‐ESM‐LR 1.0 1.2 1.5 2.5 4.6
MPI‐ESM‐P 1.0
MRI‐CGCM3 0.6 1.1 0.1 −0.6 0.5 1.3 2.1 2.4 3.9
NorESM1‐M 0.7 1.2 −0.1 −0.4 1.0 1.4 2.2 2.5 4.0
Multimodel mean 1.0 1.4 0.1 −0.5 1.1 1.8 2.5 3.1 4.6
90% uncertainty 0.6 0.4 0.2 0.6 0.6 0.7 0.8 1.1 1.4
Table 4.
AFs Calculated for Aerosol‐only Perturbations in Fixed‐SST Experiments Compared to AFs for 2003 From the FT06‐updated Historical‐nonGHG Residual Scenario
Forcing (W m−2)
Method                  Model        Net    Clear Sky  Cloud: CRE Derived
Fixed SST               CanESM2      −0.86  −0.59      −0.28
Fixed SST               CSIRO‐Mk3    −1.41  −1.04      −0.37
Fixed SST               HadGEM2‐ES   −1.23  −0.35      −0.88
FT06‐updated residual   CanESM2      −0.51  −0.33      −0.18
FT06‐updated residual   CSIRO‐Mk3    −0.61  −0.59      −0.02
FT06‐updated residual   HadGEM2‐ES   −1.12  −0.66      −0.46
[21] This section has shown that it is not appropriate to represent aerosol AF by the Historical‐nonGHG residual scenario and that CRE‐derived cloud AFs may not be representative of actual AFs from rapid cloud adjustment. Nevertheless, the net AF does correctly capture both RF and cloud adjustment and could be expected to match other AF estimates over 1850–2100 simulations. It can therefore provide useful insights into the causes of global‐mean temperature change, examined next.

5 Intermodel Temperature Spread [22] This section uses the AFs diagnosed in section 3 to help understand the gross characteristics of the CMIP5 models' surface temperature response. In particular, we focus on how differences in forcing and climate sensitivity affect the intermodel spread of surface temperature change. [23] A model's historical temperature trend depends on forcing, climate sensitivity, and ocean heat uptake. As aerosol forcing and climate sensitivity are uncertain, modeling centers could be modifying these controlling factors to reproduce the observed globally averaged 20th century temperature trends as well as possible. There was some evidence of a trade‐off between climate sensitivity and forcing in CMIP3 and earlier generations of models [Kiehl, 2007; Knutti, 2008]. Figure 7 reproduces Figure 1 of Kiehl [2007] for CMIP5 models and finds considerably smaller correlation than in either the CMIP3 analysis of Knutti [2008] or the older model analysis of Kiehl [2007], which are reproduced as blue and red symbols, respectively. The R2 fit in CMIP5 models is slightly smaller than in CMIP3 models and is not significant. The green squares show a subset of the CMIP5 models that match the observed century‐scale linear temperature trends (0.56 to 0.92 K increase over 1906–2005, IPCC [2007]). This subset reproduces the Kiehl [2007] fit almost perfectly. The CMIP5 models that are not in this grouping tend to have a larger positive AF compared to those that match observations and thereby overestimate the observed temperature trend. Variation in the magnitude of the CO2 AF affects both the AF in 2003 and the equilibrium climate sensitivity (ECS). Figure 8 shows that both AF in 2003 and the 2xCO2 AF are positively correlated with α [see also Andrews et al., 2012b].
This means that models with smaller climate feedbacks (i.e., higher sensitivities) tend to also have smaller CO2 AFs, which would act to converge models towards similar Historical temperature responses.
Figure 7. The relationship between 2003 AF and ECS in CMIP5 and earlier generations of models. CMIP3 numbers are taken from Knutti [2008] and older models from Kiehl [2007]. The solid line fits are made using the inverse relationship between forcing and climate sensitivity postulated by Kiehl [2007]. Data are shown for all CMIP5 models as black diamonds, using the Historical simulation. A subset of CMIP5 models is shown by the green squares that are within the 90% uncertainty range of the observed 100 year linear temperature trend. These models have 1906–2005 linear trends between 0.56 K and 0.92 K, the IPCC [2007] 90% uncertainty range. R2 values are computed with respect to the nonlinear fit shown.
Figure 8. Scatterplots of (a) Historical 2003 AF against α and (b) 2xCO2 AF against α.
[24] The transient response of a model depends on ocean heat uptake as well as the ECS. If modelling groups are adjusting forcing to match the observed temperature trends, then one might expect the correlation between 2003 AF and the transient climate response (TCR) to be larger than the correlation between 2003 AF and ECS. However, these correlations are −0.11 and −0.41, respectively, and neither is significant at the 5% level.
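The significance statement above can be checked with the standard t test for a Pearson correlation coefficient. The sketch below is illustrative only: it assumes that all 23 models in Table 1 enter each correlation (the sample size is not stated explicitly), uses a two‐sided test, and hard‐codes the 5% critical value of Student's t for 21 degrees of freedom; the function names are hypothetical.

```python
import math

def t_statistic(r, n):
    """t statistic for a Pearson correlation r from n independent pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def is_significant(r, n, t_crit):
    """Two-sided test: significant if |t| exceeds the critical value."""
    return abs(t_statistic(r, n)) > t_crit

N_MODELS = 23             # assumption: every Table 1 model is included
T_CRIT_5PCT_DF21 = 2.080  # two-sided 5% critical t value, 21 d.o.f.

# The two correlations quoted in the text.
significant = [
    is_significant(r, N_MODELS, T_CRIT_5PCT_DF21) for r in (-0.11, -0.41)
]
```

Under these assumptions the larger correlation gives |t| of roughly 2.06, just below the critical value, so the conclusion of no significance at the 5% level is sensitive to the exact number of models included.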
[25] The causes of model spread can be further examined by using the approach of Gregory and Forster [2008], whereby the global‐mean temperature change ΔT under a scenario of continually increasing forcing F is: $\Delta T = F/\rho$ (4) where the climate resistance ρ = α + κ, κ being the ocean heat uptake efficiency. [26] The estimates of ρ and κ from the 1% per year CO2 increase simulations are given in Table 1. The α values used are derived from the 4xCO2 abrupt integration from section 3 and are also presented in Table 1. The α values derived from the 1% per year CO2 increase integration (not shown) were very similar to values diagnosed from the 4xCO2 abrupt integration [see also Kuhlbrodt and Gregory, 2012]. Figure 9 examines how AF in 2003, ρ, α, and κ influence the temperature change. As expected, AF / ρ (Figure 9a) explains most of the variation in temperature, and AF (Figure 9b) is by far the most important influence. Models with a Historical AF in 2003 that is more positive than about 2 W m−2 typically have a temperature change that is larger than observed. In contrast, ρ, α, and κ (Figures 9c, 9d, and 9e) show no systematic influence on temperature. For example, the HadGEM2‐ES and GFDL‐CM3 models exhibit two of the smallest temperature changes but also have two of the smallest α values (high ECS). Therefore, their small temperature change results primarily from a small forcing. These results suggest that AF in some models may be too positive to accurately reproduce historical temperature trends. Figure 9. Scatterplots of (a) AF/ρ, (b) AF, (c) ρ, (d) α, and (e) κ against the temperature change in 2003 from the Historical simulation. [27] Multiple linear regression was used to model the CMIP5 spread of temperatures using explanatory variables of AF, α, ρ, and κ from Tables 1 and 2. 
Of these, the strongest correlation was found between AF in 2003 and α at 0.62 (see Figure 8a). ρ and κ were also positively correlated with AF, but less strongly (0.45 and 0.02, respectively). These correlations mean that whilst models with larger AF generally have larger feedback parameters (smaller sensitivities) and more efficient ocean heat uptake (larger κ), no clear pattern of compensation emerges between climate model feedback parameters, or ocean heat uptake, and AF (see also Figure 9). [28] Figure 10 compares ρ derived from two RCPs with increasing forcing over 2000–2050, with ρ derived from the 1% per year CO2 increase simulation that is used to define TCR. Estimates of ρ are generally well correlated between the RCP scenarios and the 1% per year CO2 increase simulation. κ values are not shown but follow a similar pattern. The 1% per year run has a larger forcing increase than RCP8.5, and models have a consistently larger κ and ρ for this scenario than those derived from the other scenarios. Likewise, RCP8.5, compared to RCP4.5, has a larger forcing increase and larger κ and ρ over the period. A more rapid forcing increase would be better at maintaining stronger vertical temperature gradients within the ocean. These would be expected to be more efficient at transferring heat from the surface to the subsurface ocean, leading to a larger κ and, therefore, a larger ρ value. Figure 10. Resistance (ρ) derived from 2000–2050 trends in the RCP4.5 and RCP8.5 scenarios compared to those derived for the 1% per year CO2 increase scenario that is used to diagnose the TCR. The black line represents the 1:1 relationship. [29] Figure 11a shows how the standard deviation in AF and temperature change projections between models varies with time for the RCP8.5 scenario. 
Note the similarity of the two quantities, consistent with the expectation from equation (4) that temperature change is proportional to AF if climate resistance is constant. The coefficient of variation (standard deviation/mean) is largest for the present day (Figure 11b) because the standard deviation does not grow as rapidly as the model mean. Figure 11. The (a) standard deviation and (b) coefficient of variation (standard deviation/mean) between models for temperature (black) and AF (red) as a function of time for the RCP8.5 scenario. Note the different time scales on the x axis. [30] Multiple linear regression was performed on the model temperature change in 2010 and 2095, regressing the temperature change across models against their AF in the same year and α. This across‐model regression of temperature change simultaneously against α and AF gave a good fit to the data for both 2010 and 2095 (see Figure 12). In RCP4.5, this regression explained 72% of the variation in temperature change, and slope coefficients for both AF and α were statistically significant at the 0.1% significance level. For 2010 data, AF explained the largest proportion of variation in the temperature change (49%), with α improving the fit across the full range of temperature changes. In contrast, α explained the largest proportion of variation in the temperature change in the 2095 data (42%), with forcing improving the fit particularly for data points with more extreme (both large and small) temperature changes. Temperature change is much more sensitive to variations in α in the 2095 data than in the 2010 data, with a regression slope coefficient of 1.45 ± 0.22 for 2095 compared to 0.56 ± 0.21 for 2010. There was no significant difference in sensitivity to AF between 2010 and 2095. 
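The across‐model regression described in paragraph [30] can be sketched as an ordinary least squares fit with two explanatory variables. The paper does not state the exact regression form or software, so the intercept term, the normal‐equation solver, and the helper names `solve_linear_system` and `fit_two_predictors` are assumptions made for illustration; no model data from the paper are reproduced here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_two_predictors(af, alpha, dt):
    """Least squares fit of dt = b0 + b1*af + b2*alpha across models.

    Returns the coefficients [b0, b1, b2] and the fraction of variance
    explained (R^2).
    """
    rows = [[1.0, f, a] for f, a in zip(af, alpha)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, dt)) for i in range(3)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_dt = sum(dt) / len(dt)
    ss_res = sum((y - f) ** 2 for y, f in zip(dt, fitted))
    ss_tot = sum((y - mean_dt) ** 2 for y in dt)
    return beta, 1.0 - ss_res / ss_tot
```

Calling `fit_two_predictors(af, alpha, dt)` with one entry per model returns the slope coefficients and R², the quantity that the text reports as the percentage of variation explained.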
Figure 12. Modeled temperature changes for 2010 and 2095 for the RCP4.5 scenario, compared to fitted values from the linear regression. The red line represents the 1:1 relationship. The fitted values are for the linear regression with both α and AF included as explanatory variables. [31] This analysis shows that large forcing differences between models today give a large spread in model temperature change. This is partly due to the current strong aerosol forcing that varies considerably between models, but this aerosol forcing is projected to weaken. Any relationship between α and AF has little effect on model spread, and there is no indication of models herding towards similar 20th century temperature trends. In the future, the role of forcing remains important, and, therefore, differences in forcing will need to be considered when comparing model simulations within a given scenario.
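Equation (4) and the coefficient of variation used in Figure 11 can be illustrated with a short numerical sketch. All values below are hypothetical and are not taken from the paper; the use of the sample standard deviation is also an assumption, since the text does not specify which estimator was used. The sketch shows the special case noted in the text: if the climate resistance ρ is the same for every model, the relative spread in temperature change equals the relative spread in AF.

```python
import statistics


def temperature_change(forcing, alpha, kappa):
    """Equation (4): dT = F / rho, with climate resistance rho = alpha + kappa."""
    return forcing / (alpha + kappa)


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (sample standard deviation)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical forcings (W m-2) for five models; NOT values from the paper.
forcings = [1.2, 1.6, 1.9, 2.2, 2.6]
# Hypothetical feedback parameter and ocean heat uptake efficiency
# (W m-2 K-1), held identical across models so that only forcing differs.
alpha, kappa = 1.1, 0.7

temps = [temperature_change(f, alpha, kappa) for f in forcings]
cv_forcing = coefficient_of_variation(forcings)
cv_temperature = coefficient_of_variation(temps)
# With constant rho, temperature is a scaled copy of the forcing,
# so the two coefficients of variation coincide.
```

When ρ differs between models, the two coefficients of variation diverge, which is one way to see how feedback and ocean heat uptake differences add to the forcing‐driven spread.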

6 Discussion and Conclusions [32] The estimated anthropogenic AF of 1.6 ± 0.9 W m−2 and the estimated greenhouse gas AF of 2.4 ± 0.8 W m−2 in 2003 agree well with the last IPCC report and more recent estimates of RF, even though the definitions of the two forcings differ. For example, Forster et al. [2007] estimated a total anthropogenic forcing of 1.6 ± 1.0 W m−2 in 2005, and Skeie et al. [2011] estimated a year 2000 greenhouse gas RF of 2.5 W m−2. [33] The total AF from CMIP5 models, estimated to be 1.7 ± 0.9 W m−2 in 2003, grows to 1.9 ± 0.9 W m−2 in 2010. In contrast to the 2007 IPCC estimate, where the spread was principally attributed to aerosols, the spread found here comes from both nongreenhouse gas forcing agents and differences in the rapid adjustment of cloud to greenhouse gases. [34] The AF estimates made in this paper include a significant cloud component that acts to make the AF smaller than the expected RF. Because of this, the projected 2095 AFs are lower than the corresponding estimate of RF from the original RCP scenario. However, they agree well with the effective forcing estimate of the integrated assessment models [Meinshausen et al., 2011]. Consistent with a lower AF, Andrews et al. [2012b] found that CMIP5 models had a 4xCO2 AF that ranged between 5.6 and 8.5 W m−2 and was, on average, 0.4 W m−2 lower than the expected RF of 7.4 W m−2. Figures 1 and 3 in Andrews et al. [2012b] suggest that rapid adjustments within this framework are not necessarily an immediate physical cloud change but could also be associated, in some AOGCMs, with a nonlinear response in SW CRE principally found over oceans. This is further supported by Zelinka et al. [manuscript in preparation, 2012], who show that unmasked cloud AFs diagnosed using this linear framework (i.e., the linear regression line intercept) tend to be negatively biased with respect to those diagnosed in fixed SST and perturbed CO2 simulations. 
These caveats limit our ability to interpret RF and AF differences as a genuine cloud adjustment. [35] Generally, it would be useful to test the FT06‐updated approach under a wider set of models and scenarios to better quantify and understand its errors, quantify differences with other AF methodologies, and quantify the role of rapid adjustment. [36] Issues remain around the definitions of AF and the assumption of constant climate sensitivity within a transient forcing framework. The forcing/climate sensitivity concept developed essentially for slab‐ocean models at equilibrium obviously does not provide a complete picture of climate evolution in today's nonlinear AOGCMs. Nevertheless, we argue that forcings are useful for understanding why models differ in their gross behavior and forcings explain the spread of RCP projections rather well. Careful analysis of the Earth's energy budget examining climate response on multiple timescales is recommended.

Acknowledgments [37] PF was supported by EPSRC grant EP/I014721/1 and a Royal Society Wolfson Merit Award. We acknowledge the World Climate Research Program's Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modelling groups (listed in Table 1 of this paper) for producing and making available their model output. For CMIP, the U.S. Department of Energy's Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals. TA and JG were supported by the Joint DECC/Defra Met Office Hadley Center Climate Program (GA01101). We thank Isaac Held for providing the forcing data from Held et al. [2010]. The contribution of MDZ was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory (LLNL) under Contract DE‐AC52‐07NA27344 and was supported by the LLNL Institutional Postdoctoral Program. Very helpful review comments were provided by Jeff Kiehl and two anonymous reviewers.