Abstract Future global warming estimates have been similar across past assessments, but several climate models of the latest Sixth Coupled Model Intercomparison Project (CMIP6) simulate much stronger warming, apparently inconsistent with past assessments. Here, we show that projected future warming is correlated with the simulated warming trend during recent decades across CMIP5 and CMIP6 models, enabling us to constrain future warming based on consistency with the observed warming. These findings carry important policy-relevant implications: The observationally constrained CMIP6 median warming in high emissions and ambitious mitigation scenarios is over 16 and 14% lower by 2050 compared to the raw CMIP6 median, respectively, and over 14 and 8% lower by 2090, relative to 1995–2014. Observationally constrained CMIP6 warming is consistent with previous assessments based on CMIP5 models, and in an ambitious mitigation scenario, the likely range is consistent with reaching the Paris Agreement target.

INTRODUCTION Both international climate assessments [e.g., Intergovernmental Panel on Climate Change (IPCC) Assessment Reports (1)] and national climate scenarios rely heavily on results from multiple climate model simulations collected in model intercomparisons. Hence, the reliability of and confidence in these model intercomparisons have a wide-ranging influence on science and ultimately policy-targeted science communication. Model intercomparisons have always featured diverging model projections, for example, for the question of how much warming to expect for a doubling of global atmospheric CO₂ concentration. However, the spread across such ad hoc model ensembles of opportunity is challenging to interpret (2). This is because not all models are equally plausible (3), and the multimodel spread may be partly inconsistent with evidence from observations, theory, or process understanding. The range of models may be too wide when unrealistic models are included or too narrow when models underestimate uncertainties from processes that are not or poorly represented. The multimodel mean may be biased high or low when many models are biased in the same way or when near-duplicate models are included (4). It is therefore essential to relate and, when necessary, recalibrate (e.g., by reweighting models) the raw spread of such model ensembles, based on other constraints from process evidence, past trends, climatology, or probabilistic estimates from perturbed physics ensembles, to produce projections (including robust uncertainty estimates) of future climate that are consistent with our understanding and with observations of the current climate.
The long-term warming range of the Coupled Model Intercomparison Project Phase 5 (CMIP5) (5) models was interpreted in the IPCC Fifth Assessment Report (AR5) (1) to be unbiased in its raw mean, but the 5 to 95% ranges in global temperature projections were interpreted as “likely” (>66% probability) to account for structural model uncertainties. Phase 6 of the Coupled Model Intercomparison Project (CMIP6) will inform much of the physical science basis for the upcoming Sixth Assessment Report (AR6) of the IPCC (6). It includes the latest generation of comprehensive Earth system models (ESMs), driven by historical greenhouse gas concentrations, and followed by different future greenhouse gas and aerosol concentrations according to the Shared Socioeconomic Pathways (SSP) scenarios (7). The first models submitted to the archive suggest that CMIP6 will span a wider range of warming responses than CMIP5. Several ESMs submitted to CMIP6 have equilibrium climate sensitivity (ECS) values (table S1) higher than any of the CMIP5 models (8), and a third of CMIP6 models submitted to date (10 of 29 models; table S1) exceed the range of 1.5° to 4.5°C for ECS assessed as likely (17 to 83% range) in the IPCC AR5 report. Note that for simplicity we use the term “equilibrium climate sensitivity,” although the values are derived from nonequilibrium conditions and rather represent the “effective climate sensitivities” [i.e., a measure of the feedbacks during the transient regime that is extrapolated to equilibrium (9)]. As a result of higher climate sensitivity values, future climate projections from these models show stronger future global mean warming than the warming previously reported in AR5, although a direct comparison is challenging due to a novel generation of emission scenarios used to drive the models (10). Some models, for instance, project warming of 2.5° to 3°C for scenarios that were designed to be consistent with the Paris temperature target of well below 2°C (7). 
Therefore, the critical question arises whether projections of such models with high future warming are realistic. If they are, that would result in much higher risks and costs of future climate change than previously assessed and imply even faster mitigation to achieve climate targets. If the models, on the other hand, are biased high, that would imply that climate assessments need to recalibrate the raw ensemble. A more near-term (transient) global warming that arises after 70 years of a 1% per year increase in atmospheric CO₂ concentration is referred to as the transient climate response (TCR). TCR and ECS metrics are often used to develop and calibrate simple climate model emulators, which are used with integrated assessment models and provide policy-relevant information regarding emission pathways and related climate responses (11). Estimates of TCR also affect the allowed carbon emissions for the Paris Agreement climate target (12) and are important for climate projections and risk assessment (13), with substantial economic benefits resulting from narrowing down the TCR range (14). Therefore, consistency of the simulated TCR range with observational evidence is crucial, and potentially narrowing the spread of TCR benefits not only the climate science community but also many other sectors. Here, we make use of an emergent relationship between the simulated warming trend in recent decades and projected future warming in different emission scenarios, as well as between the simulated warming and the more idealized metric of future warming (TCR). On the basis of these correlations across models, we constrain the ranges of TCR and future warming projections.

DISCUSSION Our results show that most models with high climate sensitivity (outside the AR5 likely range) or high transient response overestimate recent warming trends, with differences that cannot be explained by internal variability. This probably leads to future warming projections being biased high. Thus, the raw ensemble median and spread of future warming in CMIP6 (and therefore most other variables that scale to first order with global mean temperature) are not representative of a distribution constrained by observed trends, even if some of those models show a more realistic representation of processes in individual components than their CMIP5 predecessors (20–22). Conversely, CMIP6 models with climate sensitivity values that are within the IPCC AR5 likely range show warming trends much more consistent with the observations. We demonstrate that the simulated recent warming trends from 1981–2014 and 1981–2017 (see the Supplementary Materials for sensitivity analysis) are highly correlated with TCR across CMIP6 as well as CMIP5. Given the theoretical background (18) and robust correlations across two generations of ESMs, we provide an estimate of the observationally constrained likely range for TCR based on CMIP6 models of 1.20° to 1.99°C (17 to 83% range). The constrained CMIP6 median TCR (1.60°C) is substantially lower than the raw CMIP6 median (1.95°C) and is consistent with other recently published TCR estimates (18, 38). We also show that the observational constraint on TCR remains robust, with high-TCR CMIP6 models being consistently different from the remainder of CMIP5 and CMIP6 models, even if only the spatial warming pattern is considered (with the global mean temperature trend removed). We emphasize that our goal is to provide a defensible constraint on future warming (i.e., TCR or future warming in SSP scenarios), acknowledging that additional predictors might yield an even more robust constraint [e.g., using ocean heat content (16, 39)].
Therefore, the past warming trend is only one of many possible ways of constraining future warming in climate models. The emergent constraints derived here may underrepresent uncertainty from the statistical assumption of interpreting the observed trend as a random sample from the same distribution as the simulated trends (40). Some processes that are not represented in CMIP6 models, but are present in reality, and potential systematic biases in the models, could therefore contribute to a wider uncertainty range (40). On the other hand, the estimated uncertainty may be too large if the relationship is weakened by models that are unrealistic in aspects unrelated to the constraint. The fact that the relationships between the past and future global mean warming (and TCR) hold over two generations of models and are supported by theoretical arguments provides evidence that the emergent constraints derived here are robust. Correlations are similarly high between the recent warming and future warming in the SSP scenarios, thus suggesting that future warming in the SSP scenarios simulated by models with high climate sensitivity is also likely to be biased high. Observationally constrained future warming in the SSP5-8.5 scenario, with respect to the 1995–2014 baseline, by the mid-century (years 2041–2060) is estimated at 1.01° to 1.90°C (5 to 95% range), and by the end of the century (years 2081–2100) is estimated at 2.26° to 4.60°C (5 to 95% range). The constrained median warming is 16% lower by mid-century and 14% lower by the end of the century than the unconstrained warming simulated by the CMIP6 ensemble (table S4). For comparison, the observationally constrained warming of the CMIP5 ensemble is essentially unchanged from its unconstrained warming, which justifies the use of the CMIP5 raw mean in AR5.
Despite the expectation that the constraint should be weaker in emission scenarios where non-CO₂ forcings such as aerosol reduction have a substantial contribution to the future temperature evolution, the SSP1-2.6 warming is also highly correlated with warming during the past decades. Constrained warming in SSP1-2.6, with respect to the 1850–1900 baseline consistent with the Paris Agreement (35), by mid-century (years 2041–2060) is estimated at 1.36° to 1.86°C (likely range), and by the end of the century (years 2081–2100) is estimated at 1.33° to 1.99°C (likely range). Our results thus suggest that this ambitious mitigation scenario is consistent with meeting the Paris Agreement target based on the observationally constrained CMIP6 models, while the Paris Agreement target would be exceeded by several high ECS models. Last, we show that the CMIP6 projections are consistent with the CMIP5 projections after observationally constraining the CMIP6 ensemble and accounting for scenario differences, in this case through a simple rescaling of CMIP5 warming by the ratio of the anthropogenic radiative forcings in the respective SSP and RCP scenarios. The difference of about 0.83°C between the raw CMIP5 RCP 8.5 and raw CMIP6 SSP5-8.5 warming by the end of the century (with respect to the 1995–2014 baseline) is primarily due to the higher TCR values in CMIP6. Given the constraint from past warming, the CMIP6 raw model ensemble is therefore likely biased high and is not representative of the constrained distribution, while the observationally constrained CMIP6 ensemble is generally consistent with the raw and constrained CMIP5 estimates.
The high ECS models that are outside of the observationally constrained range may still provide very useful information regarding Earth system behavior at high levels of warming, such as for exploring climate and carbon cycle feedbacks for large deviations from present-day climate, for estimating pattern scaling of extreme events (per degree of warming), or as a basis for storylines relevant for high impacts (13). It also remains important to improve our understanding of the regional responses to global warming across the full range of models. However, the clustering of models at the high end of global mean warming in the ensemble of opportunity needs to be accounted for (e.g., through model weighting or rescaling the ensemble) to avoid projections that are biased high.

MATERIALS AND METHODS We make use of available CMIP6 ESMs (6) (table S1) driven by historical forcings for the period 1850–2014 and extended by different SSP scenarios (SSP1-2.6 and SSP5-8.5 in the main text; and SSP2-4.5 and SSP3-7.0 in the Supplementary Materials) until the year 2100. We use the 1981–2014 period in Figs. 2 to 4 (for which more model simulations were available; table S1) and the 1981–2017 period for Figs. 5 and 6, which are based on fewer models that had SSP simulations available. These periods are chosen such that there is little trend in aerosol cooling (Fig. 1) and that they are only weakly influenced by known modes of internal variability (see below). For the simulated warming from 1981–2017, we extend the CMIP6 historical simulations by the SSP5-8.5 scenario and the CMIP5 (5) historical simulations by the RCP 8.5 scenario. The warming trend until the year 2017 should, however, be very similar across the scenarios (41). The CMIP5 scenarios also deviate slightly from observed changes (e.g., in stratospheric aerosol or solar variability) (42). As the CMIP6 models were forced with updated external drivers up to 2014, this is less of a concern for the CMIP6 ensemble. Both the CMIP5 and CMIP6 ensembles, however, lead to consistent constrained TCR estimates (table S3), suggesting that the results are not strongly influenced by the differences in radiative forcing. For the models’ output, we take ensemble means from models that provide multiple ensemble members, which reduces noise due to internal variability in the models. The observed warming trends are calculated as the mean of two spatially interpolated datasets: Cowtan and Way (27) v2 updated with HadSST4 (43) and GISTEMP (v4) (28, 29). We also examined the Berkeley Earth Surface Temperature (BEST) (31) dataset, but it shows nearly identical warming to the Cowtan and Way dataset over the two periods considered (fig. S3B).
We did not include the BEST dataset in the observational mean as it is structurally similar to the Cowtan and Way dataset, and both use SST datasets based on HadSST. In contrast, GISTEMP uses a more independent SST dataset. We quantify structural data uncertainty of the observed trend by the standard deviation (SD) across the 100 members of the Cowtan and Way v2 (with HadSST3) dataset. Some of the model-observation mismatches can be explained by the differences in global mean temperature definitions (44). The models’ output is the global mean near-surface air temperature (GSAT), while observation-based datasets report a blend of land near-surface air and sea surface temperatures [here referred to as global blended surface temperature (GBST)], which on average have been warming slightly slower than GSAT only (44). However, for future climate projections and impact assessments, the GSAT temperature metric is more relevant (35). To quantify the blending bias, we use data of (44) and compare simulated GSAT with simulated GBST (constructed from temperature anomalies). The difference from 1981 to 2017 (or 1981 to 2014) is an estimate of the blending bias in a model simulation during this period. To allow a like-for-like comparison among models and observations, we add an estimate of the blending effect (difference between GSAT and GBST) to the GBST observations to make them GSAT-like. We regress the simulated GBST increase over the examined period against the blending effect over the same period across the CMIP5 ensemble. Models that simulate greater warming also tend to show a larger blending effect. Using this relationship, we estimate the blending effect for GBST observations and use the prediction error of the linear fit as an estimate of uncertainty. For an observed warming trend of about 0.19°C per decade (for the period 1981–2017), the blending effect is estimated at 0.014° ± 0.005°C per decade (1σ).
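The blending-effect estimate described above can be sketched as a simple linear fit across models, followed by a prediction at the observed trend. This is a minimal illustration, not the authors' code: the per-model numbers are invented, the function names are hypothetical, the blending effect is assumed to be the dependent variable, and the prediction error is assumed to follow the standard formula for simple OLS.

```python
import numpy as np

def fit_with_residual_sd(x, y):
    """OLS fit y = a + b*x; returns (a, b, residual SD with n-2 degrees of freedom)."""
    b, a = np.polyfit(x, y, 1)
    resid = y - (a + b * x)
    s = np.sqrt(np.sum(resid ** 2) / (len(x) - 2))
    return a, b, s

def prediction_sd(x, x0, s):
    """Standard error of a new prediction at x0 for simple OLS."""
    n = len(x)
    sxx = np.sum((x - np.mean(x)) ** 2)
    return s * np.sqrt(1.0 + 1.0 / n + (x0 - np.mean(x)) ** 2 / sxx)

# Hypothetical per-model values in degC per decade (NOT real CMIP5 data).
gbst_trend = np.array([0.15, 0.18, 0.20, 0.22, 0.25, 0.28])
blending = 0.07 * gbst_trend + np.array(
    [0.001, -0.001, 0.0005, -0.0005, 0.001, -0.001])

a, b, resid_sd = fit_with_residual_sd(gbst_trend, blending)

observed_gbst_trend = 0.19  # approximate 1981-2017 value quoted in the text
blending_obs = a + b * observed_gbst_trend
blending_sd = prediction_sd(gbst_trend, observed_gbst_trend, resid_sd)

# GSAT-like observed trend: GBST trend plus the estimated blending effect.
gsat_like_trend = observed_gbst_trend + blending_obs
```

With real model output, `gbst_trend` and `blending` would hold one value per model, computed over the same period as the observed trend.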
For 1981–2014, the observed warming is slightly lower and, accordingly, so is the estimated blending effect (0.013° ± 0.005°C per decade). Both observational datasets considered [Cowtan and Way (27) and GISTEMP (28, 29)] are interpolated to near-full coverage, and we therefore compare them with the simulated temperature field averaged over the whole Earth. To quantify the contribution of unforced internal variability to a potential difference between observed and simulated trends, we make two independent estimates: one based on climate model simulations and one based on observed GBST. For the first estimate, we use a mean estimate of the SD across the warming trends for the period 1981–2014 in 12 large initial condition ensembles of CMIP5 and CMIP6 ESMs, resulting in a noise estimate of 0.035°C per decade due to internal variability (ranging from 0.023° to 0.049°C per decade between the models; table S2). Under the assumption that internal variability and the forced signal are independent, which is likely the case for relatively weak radiative forcing but may break down under larger climate change (45), we also estimate internal variability from 32 CMIP6 control simulations (from each simulation separately). The mean SD of 34-year-long trends, 0.037°C per decade, is similar to that from the smaller set of large ensembles. For the second estimate, we subtract both the raw and the scaled CMIP5 and CMIP6 GBST ensemble means from the observations from 1900 to 2018 [the multimodel means are scaled towards the observations (46)]. These residuals from different combinations of the simulated and observed GBST are an estimate of internal variability (46), but due to observational and forcing uncertainties (26), we interpret them as an upper estimate. Based on this, we estimate an SD of 0.038°C per decade for 34-year-long trends, slightly higher, but consistent with the model simulations in agreement with the findings in (46).
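As an illustration of the control-run estimate above, the SD of 34-year trends can be computed from an unforced temperature series as follows. This is a sketch under stated assumptions: the text does not say whether overlapping or non-overlapping segments were used (overlapping windows are assumed here), the control series is synthetic AR(1) noise rather than model output, and `trend_sd_per_decade` is a hypothetical helper name.

```python
import numpy as np

def trend_sd_per_decade(series, window=34):
    """SD of least-squares linear trends over all overlapping windows,
    in units of the series per decade (annual data assumed)."""
    t = np.arange(window)
    trends = [10.0 * np.polyfit(t, series[i:i + window], 1)[0]
              for i in range(len(series) - window + 1)]
    return np.std(trends, ddof=1)

# Synthetic 500-year "control run": AR(1) noise with invented parameters.
rng = np.random.default_rng(0)
n_years, phi, sigma = 500, 0.5, 0.1
control = np.zeros(n_years)
for i in range(1, n_years):
    control[i] = phi * control[i - 1] + rng.normal(0.0, sigma)

sd_trend = trend_sd_per_decade(control)  # degC per decade
```

Applied to each control simulation separately and then averaged, this kind of calculation would yield a single noise estimate comparable to the values quoted in the text.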
As a conservative choice, we use this last estimate throughout the paper for the 1981–2014 period. For the 1981–2017 period, the internal variability estimates are slightly lower (cf. table S2), and we again use a conservative estimate based on the difference between observed and simulated GBST of 0.035°C per decade for the analyses in that period. The overall observational uncertainty is calculated as the sum in quadrature of the above three effects: structural uncertainty, internal variability, and uncertainty of the blending effect. Uncertainty from internal variability dominates the trend uncertainty. The presence of internal variability in the observed GBST may bias the central value of the constrained climate response. We estimate the contribution of Pacific and Atlantic low-frequency variability to GBST using variability analogues (35). To quantify the influence of Pacific variability (fig. S1), we search for simulated 40-month-long periods from the CMIP5 and CMIP6 control simulations that follow the observed (ERSSTv5 and COBE-SST2) SST evolution in the tropical Pacific (15°N to 15°S, 180° to 90°W). In addition, we search for analogues that follow the observed (ERA5, MERRA2, and JRA55) wind stress evolution over the western tropical Pacific (150°E to 150°W, 10°S to 10°N). The observational datasets are introduced and described in (47–51). For the contribution of Atlantic variability, we smooth (with a 13-month-long running mean) the observed extratropical North Atlantic (30° to 60°N) SST before selecting 120-month-long analogues. Thereby, we remove some of the high-frequency variability and highlight the role of the Atlantic variability on a multidecadal time scale. Before selecting the best matching variability analogues (based on the root mean square deviation), we remove the CMIP5 and CMIP6 multimodel means from the observed tropical Pacific and extratropical North Atlantic SST to obtain estimates of the internal variability component in these regions.
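The quadrature sum of the three uncertainty terms described in this section can be sketched in a few lines. The internal variability and blending values are the 1981–2017 numbers quoted in the text; the structural uncertainty is not stated here, so the value below is a purely hypothetical placeholder.

```python
import math

def combine_in_quadrature(*sds):
    """Combine independent 1-sigma uncertainties as the root sum of squares."""
    return math.sqrt(sum(s * s for s in sds))

internal_variability = 0.035  # degC per decade, 1981-2017 (from the text)
blending = 0.005              # degC per decade (from the text)
structural = 0.010            # degC per decade, HYPOTHETICAL placeholder

total_sd = combine_in_quadrature(structural, internal_variability, blending)
```

With these inputs the total is roughly 0.037°C per decade, only slightly above the internal variability term alone, which illustrates why internal variability dominates the combined trend uncertainty.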
In addition, we estimate the forced signal in these two regions by scaling the CMIP6 multimodel mean GBST time series against the observations from 1900 to 2018 to reduce biases in the simulated warming and also remove these scaled multimodel means from the observations (46). The models do not simulate substantial trends in wind stress over the western tropical Pacific, and therefore, we directly use the observed wind stress variability. For the Pacific SST, we further estimate the forced signal with the method in (52). In contrast to the North Atlantic SST and tropical Pacific wind stress, we standardize the time series of equatorial Pacific SST before selecting analogues. Standardization favors models that under- or overestimate observed variability, but it has only a small influence on the results. We interpret the results from estimating the forced signal by scaling the GBST as a best estimate but show the range from the other approaches in fig. S1. Pacific variability has contributed a cooling over both examined periods, but less so over 1981–2017, consistent with other studies (fig. S1). As the Atlantic contribution is weak and similar in both periods, the observed 1981–2017 warming period is probably less influenced by internal variability. The central estimate of TCR constrained by 1981–2014 warming might therefore be slightly underestimated (see the “Constraints on the TCR” section). We use ordinary least squares (OLS) regression for the relationship between the recent simulated warming rate, which consists of a forced signal and noise (depending on the ensemble sizes, the noise is smaller or larger for individual models) and future warming or TCR. The presence of noise in the predictor biases the OLS regression slope toward zero, i.e., we underestimate the relationship between forced signal and future warming. Errors-in-variables regression models, such as total least squares (TLS), can account for this.
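To make the OLS versus TLS distinction concrete, the sketch below computes both slopes for synthetic data with noise in the predictor. It is an illustration only: the data are invented, the function names are hypothetical, and TLS is implemented as plain orthogonal regression via a singular value decomposition, which assumes equal error variance in both variables in their native units (the text does not specify the TLS variant used).

```python
import numpy as np

def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / np.sum(xc * xc)

def tls_slope(x, y):
    """Orthogonal (total least squares) slope: direction of the first
    principal axis of the centered data, obtained from an SVD."""
    data = np.column_stack([x - x.mean(), y - y.mean()])
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    direction = vt[0]
    return direction[1] / direction[0]

# Synthetic ensemble: a "forced" trend per model plus noise in the predictor.
rng = np.random.default_rng(1)
forced_trend = np.linspace(0.10, 0.35, 30)               # degC per decade
response = 0.5 + 6.0 * forced_trend + rng.normal(0.0, 0.1, 30)
noisy_trend = forced_trend + rng.normal(0.0, 0.03, 30)   # forced signal + noise

slope_ols = ols_slope(noisy_trend, response)
slope_tls = tls_slope(noisy_trend, response)
```

For any dataset with nonzero covariance, the orthogonal slope is at least as steep as the OLS slope in absolute value, which is the sense in which TLS counteracts the attenuation caused by a noisy predictor.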
Forcing the regression line to pass through the origin (0,0), as in fig. S3A, is based on an assumption of strict linearity between the simulated forced trend and the future warming (or TCR). This results in a TCR estimate of 1.45°C, similar to that obtained by TLS but slightly lower than the OLS estimate of 1.60°C without a fixed intercept (using 1981–2014; Fig. 2; constrained TCR; table S3). The assumption of strict linearity, however, is not satisfied in ESMs, due to imperfect representation of different feedbacks and simulated response to forcing, and due to the presence of internal variability. Because the observed trend is also influenced by internal variability as discussed above, we argue that we are rather interested in estimating the relationship between simulated warming and future warming than the relationship between forced warming and future warming, and OLS results in an unbiased estimate of the former. OLS is therefore generally accepted for predictive modeling (53). A concern, however, is that depending on the ensemble size, the amount of variability present in the models and in the observations differs greatly. An alternative approach would be to take one ensemble member per model, which, however, neglects a lot of the available data, or to use all ensemble members and weight them such that each model receives the same weight. This approach results in a slightly higher TCR estimate of 1.71°C (fig. S3D) than the OLS regression on ensemble means. Given the simplicity of OLS and that the results do not depend strongly on the regression approach, we use OLS for the main analysis. The dotted lines around the linear regressions in Figs. 2 and 5 show the prediction error for the fit.
The gray rectangles in Figs. 2 to 5 represent the observed GSAT trend (i.e., GBST with our estimate of the blending effect) and its combined uncertainty from internal variability, structural uncertainties in the observational datasets, and the blending effect as introduced above (shown are the ±1σ and ±2σ ranges; for Fig. 4, only the effect of internal variability is included). The blue rectangles in Figs. 2 and 5 represent the uncertainty (likely range; 17 to 83%) in the observationally constrained future warming, and the blue dashed lines show the 5 to 95% ranges. We obtain this uncertainty by randomly sampling from the distribution of observed warming (gray rectangle) and its associated future warming given by the linear regression and its prediction error. The ECS of each CMIP6 model is here estimated by regressing the top-of-atmosphere radiative imbalance against the GSAT change during the first 150 years in a CO₂-only simulation that quadruples the amount of atmospheric CO₂ (8). This estimate is scaled by a factor of 2 [we neglect that CO₂ forcing rises slightly faster than logarithmic (54)]. The so-obtained ECS is an effective sensitivity and underestimates the actual equilibrium climate response for most models (16), but it is consistent with the ECS values reported for the CMIP5 ensemble (8). TCR is calculated from the CO₂-only simulation, where the atmospheric CO₂ concentration increases at a rate of 1% per year, centered on the time of doubling of the atmospheric CO₂, which occurs during simulation year 70 (we use the mean of the years 61 to 80). Note that in the GISS-E2-1-G simulations, the CO₂ concentration only increases until year 70. Therefore, TCR of this model is slightly underestimated. To estimate the forced change in each idealized CO₂-only simulation, we subtract a linear fit to the corresponding segment of the unforced control simulation.
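The sampling procedure used to obtain the constrained ranges can be sketched as a small Monte Carlo calculation. This is a minimal illustration under explicit assumptions: both the observed trend and the regression prediction error are treated as Gaussian, the per-model values and the observed mean and SD are invented, and `constrained_range` is a hypothetical name rather than anything from the original analysis.

```python
import numpy as np

def constrained_range(x_models, y_models, obs_mean, obs_sd,
                      n_samples=100_000, seed=0):
    """Sample the observed trend, map it through the OLS fit, add the
    prediction error, and return the 5, 17, 50, 83, 95th percentiles."""
    rng = np.random.default_rng(seed)
    n = len(x_models)
    slope, intercept = np.polyfit(x_models, y_models, 1)
    resid = y_models - (intercept + slope * x_models)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))
    x_mean = x_models.mean()
    sxx = np.sum((x_models - x_mean) ** 2)

    # Draw plausible "true" observed trends (Gaussian assumption).
    x_obs = rng.normal(obs_mean, obs_sd, n_samples)
    # Prediction error of simple OLS, evaluated at each sampled trend.
    pred_sd = s * np.sqrt(1.0 + 1.0 / n + (x_obs - x_mean) ** 2 / sxx)
    y = intercept + slope * x_obs + rng.normal(0.0, 1.0, n_samples) * pred_sd
    return np.percentile(y, [5, 17, 50, 83, 95])

# Hypothetical ensemble: past trend (degC per decade) and TCR (degC) per model.
trend = np.array([0.15, 0.18, 0.20, 0.22, 0.24, 0.27, 0.30, 0.33])
tcr = 0.5 + 6.0 * trend + np.array(
    [0.05, -0.05, 0.03, -0.03, 0.05, -0.05, 0.02, -0.02])

# Illustrative observed GSAT-like trend and combined 1-sigma uncertainty.
p5, p17, p50, p83, p95 = constrained_range(trend, tcr, 0.20, 0.04)
```

Here `p17` to `p83` plays the role of the likely range and `p5` to `p95` the wider range shown by the dashed lines in the figures.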
For INM-CM5-0, no control simulation was available at the time of writing, and we therefore estimate TCR with respect to the first 5 years of its +1% CO₂ per year experiment. Its control experiment became available after the revisions, and estimating warming with respect to the control climate indicates a slightly higher TCR of 1.39°C instead of 1.31°C, which does not change our conclusions. The ECS and TCR values of the CMIP6 ensemble are reported in table S1. The ECS and TCR values for CMIP5 models can be found in table 1 of (8). CMIP6 models used in this paper are listed in table S1. (Note: Not all models had SSP data available. Also, simulations with CAMS-CSM1-0 run only to the year 2099, so instead of the change for the 2081–2100 period, the change for 2081–2099 was calculated in this model only.) We make use of the following CMIP5 models (historical scenario, followed by RCP 2.6 and RCP 8.5 scenario): ACCESS1-0, bcc-csm1-1, bcc-csm1-1-m, CCSM4, CNRM-CM5, CSIRO-Mk3-6-0, CanESM2, FGOALS-g2, GFDL-CM3, GFDL-ESM2G, GFDL-ESM2M, GISS-E2-H, GISS-E2-R, HadGEM2-ES, inmcm4, IPSL-CM5A-LR, IPSL-CM5B-LR, MIROC-ESM, MIROC5, MPI-ESM-LR, MRI-CGCM3, and NorESM1-M. For CMIP5 models, we use all available ensemble members in the “p1”-only variant.
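The ECS and TCR definitions given earlier in this section can be illustrated with two short functions. This is a sketch, not the authors' code: the function names are hypothetical, the inputs are synthetic annual global means, and the control and forced runs are assumed to cover the same simulation years.

```python
import numpy as np

def effective_ecs(delta_t, toa_imbalance):
    """Regress TOA imbalance on GSAT change in a 4xCO2 run; the warming at
    zero imbalance (x-intercept) is halved to represent a CO2 doubling."""
    slope, intercept = np.polyfit(delta_t, toa_imbalance, 1)
    return (-intercept / slope) / 2.0

def tcr_from_1pct(gsat_1pct, gsat_control):
    """Mean forced warming in simulation years 61 to 80 of the 1% per year
    run, relative to a linear fit of the matching control segment."""
    years = np.arange(len(gsat_control))
    control_fit = np.polyval(np.polyfit(years, gsat_control, 1), years)
    forced = gsat_1pct - control_fit
    return forced[60:80].mean()  # indices 60..79 are years 61..80

# Synthetic 4xCO2 run: N = F - lambda * dT with invented F and lambda.
dt_4x = np.linspace(1.0, 5.5, 150)
n_4x = 7.4 - 1.2 * dt_4x
ecs = effective_ecs(dt_4x, n_4x)

# Synthetic control run with a small drift, and a 1% run warming linearly.
yrs = np.arange(140)
control = 287.0 + 0.001 * yrs
one_pct = control + 0.025 * yrs
tcr = tcr_from_1pct(one_pct, control)
```

Subtracting the fitted control segment, rather than a fixed baseline, removes any model drift that is common to the control and forced simulations.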

SUPPLEMENTARY MATERIALS Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/12/eaaz9549/DC1
Fig. S1. Estimated contribution of Pacific and Atlantic internal variability to GSAT in °C per decade during 1981–2014 and 1981–2017.
Fig. S2. Correlation of the simulated warming trend for the period 1981–2017 with TCR.
Fig. S3. Correlation of the simulated warming trend for the period 1981–2014 with TCR, showing different types of regression and methods of estimating the uncertainty of the regression.
Fig. S4. Correlations of future warming in CMIP5 and CMIP6 models (with respect to 1995–2014 baseline), with the simulated past warming trend (1981–2017).
Fig. S5. Correlations of future warming in CMIP6 models (with respect to 1995–2014 baseline), with the simulated past warming trend (1981–2017).
Fig. S6. Correlations of TCR and ECS with future warming in CMIP6 and CMIP5 models.
Table S1. CMIP6 models used in this study with their TCR and ECS values.
Table S2. GSAT trends for the periods 1981–2017 and 1981–2014 and estimates of the effect of internal variability of CMIP5 and CMIP6 models.
Table S3. TCR ranges (constrained and unconstrained) in CMIP6 and CMIP5 models.
Table S4. Future warming (constrained and unconstrained) in CMIP6 models under different SSP scenarios, as labeled.
References (55, 56)

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.