We have a paper out in Nature titled “Greater future global warming inferred from Earth’s recent energy budget”.

The Carnegie press release can be found here and coverage from the Washington Post can be found here.

A video abstract summarizing the study is below.

The study addresses one of the key questions in climate science: How much global warming should we expect for a given increase in the atmospheric concentration of greenhouse gases?

One strategy for attempting to answer this question is to use mathematical models of the global climate system called global climate models. Basically, you can simulate an increase in greenhouse gas concentrations in a climate model and have it calculate, based on our best physical understanding of the climate system, how much the planet should warm. There are somewhere between 30 and 40 prominent global climate models and they all project different amounts of global warming for a given change in greenhouse gas concentrations. Different models project different amounts of warming primarily because there is not a consensus on how to best model many key aspects of the climate system.

To be more specific, if we were to assume that humans will continue to increase greenhouse gas emissions substantially throughout the 21st century (the RCP8.5 future emissions scenario), climate models tell us that we can expect anywhere from about 3.2°C to 5.9°C (5.8°F to 10.6°F) of global warming above pre-industrial levels by 2100. This means that for identical changes in greenhouse gas concentrations (more technically, identical changes in radiative forcing), climate models simulate a range of global warming that differs by almost a factor of 2.
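The arithmetic behind these numbers is easy to check. The short sketch below is only an illustration (the variable names are ours). Note that a temperature change converts to Fahrenheit with the 9/5 factor alone, without the 32-degree offset used for absolute temperatures.

```python
# Range of RCP8.5 warming by 2100 across models, as quoted in the text (deg C).
low_c, high_c = 3.2, 5.9

# A temperature *difference* converts with the 9/5 factor only (no +32 offset).
low_f = low_c * 9 / 5
high_f = high_c * 9 / 5

# How many times larger the warmest projection is than the coolest one.
ratio = high_c / low_c

print(f"{low_f:.1f} F to {high_f:.1f} F")
print(f"high/low ratio: {ratio:.2f}")
```

The ratio comes out a little above 1.8, which is the "almost a factor of 2" referred to above.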

The primary goal of our study was to narrow this range of model uncertainty and to assess whether the upper or lower end of the range is more likely. We utilize the idea that the models that are going to be the most skillful in their projections of future warming should also be the most skillful in other contexts like simulating the recent past. Thus, if there is a relationship between how well models simulate the recent past and how much warming models simulate in the future, then we should be able to use this relationship, along with observations of the recent past, to narrow the range of future warming projections (this general technique falls under the “emergent constraint” paradigm, see e.g., Hall and Qu [2006] or Klein and Hall [2015]). The principal theme here is that models and observations together give us a more complete picture of reality than models can give us alone.

So, what variables are most appropriate to use to evaluate climate models in this context? Global warming is fundamentally a result of a global energy imbalance at the top of the atmosphere so we chose to assess models in their ability to simulate various aspects of the Earth’s top-of-atmosphere energy budget. We used three variables in particular: reflected solar radiation, outgoing infrared radiation, and the net energy balance. Also, we used three attributes of these variables: their average (AKA climatological) values, the average magnitude of their seasonal variability and the average magnitude of their month-to-month variability. These three variables and three attributes combine to make nine features of the climate system that we used to evaluate the climate models (see below for more information on our decision to use these nine features).
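To make the three attributes concrete, here is a minimal sketch of how they could be computed from a single monthly time series. It is not the code used in the study: the study works with globally complete spatial fields, whereas this sketch uses one made-up global-mean series, and using the standard deviation as the measure of "magnitude" is our assumption for illustration. All names are hypothetical.

```python
import math
import statistics

def three_attributes(monthly):
    """Return (climatology, seasonal magnitude, month-to-month magnitude).

    `monthly` is a list of monthly values covering whole years,
    starting in January.
    """
    if len(monthly) % 12 != 0:
        raise ValueError("expected whole years of monthly data")
    # Attribute 1: the long-term average (climatological) value.
    climatology = statistics.fmean(monthly)
    # Mean annual cycle: average of all Januaries, all Februaries, and so on.
    annual_cycle = [statistics.fmean(monthly[m::12]) for m in range(12)]
    # Attribute 2: size of the seasonal cycle (assumed here: its std. dev.).
    seasonal = statistics.pstdev(annual_cycle)
    # Attribute 3: variability left after removing the mean annual cycle.
    anomalies = [v - annual_cycle[i % 12] for i, v in enumerate(monthly)]
    month_to_month = statistics.pstdev(anomalies)
    return climatology, seasonal, month_to_month

# Synthetic example: 10 years of a made-up flux with a pure seasonal cycle.
series = [100.0 + 5.0 * math.sin(2 * math.pi * (i % 12) / 12) for i in range(120)]
clim, seas, var = three_attributes(series)

# Three variables times three attributes gives the nine features.
variables = ["reflected_solar", "outgoing_infrared", "net_balance"]
attributes = ["climatology", "seasonal", "month_to_month"]
features = [(v, a) for v in variables for a in attributes]
print(len(features), clim, seas, var)
```

Because the synthetic series contains only a seasonal cycle, its month-to-month variability is essentially zero. A real series would have non-zero values for all three attributes.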

We found that there is indeed a relationship between the way that climate models simulate these nine features over the recent past and how much warming they simulate in the future. Importantly, models that match observations the best over the recent past tend to simulate more 21st-century warming than the average model. This indicates that we should expect greater warming than previously calculated for any given emissions scenario or, equivalently, that we need to reduce greenhouse gas emissions more than previously thought to achieve any given temperature stabilization target.

Using the steepest future emissions scenario as an example (the RCP8.5 emissions scenario), the figure below shows the comparison of the raw-model projections used by the Intergovernmental Panel on Climate Change, to our projections that incorporate information from observations.

It is also noteworthy that the observationally-informed best-estimate for end-of-21st-century warming under the RCP 4.5 scenario is approximately the same as the raw best estimate for the RCP 6.0 scenario. This indicates that even if society were to decarbonize at a rate consistent with the RCP 4.5 pathway (which equates to ~800 gigatonnes less cumulative CO2 emissions than the RCP 6.0 pathway), we should still expect global temperatures to approximately follow the trajectory previously associated with RCP 6.0.

So why do models with the most skill at simulating the recent past tend to project more future warming? It has long been known that the large spread in model-simulated global warming results mostly from uncertainty in the behavior of feedbacks in the climate system like the cloud feedback.

Clouds, for example, reflect the Sun’s energy back to space and this has a large cooling effect on the planet. As the Earth warms due to increases in greenhouse gases, some climate models simulate that this cooling effect from clouds will become stronger, canceling out some of the warming from increases in greenhouse gases. Other models, however, simulate that this cooling effect from clouds will become weaker, which would enhance the initial warming due to increases in greenhouse gases.

Our work is consistent with many previous studies that show that the models that warm the most do so mostly because they simulate a reduction in the cooling effect from clouds. Thus, our study indicates that models that simulate the Earth’s recent energy budget with the most fidelity also simulate a larger reduction in the cooling effect from clouds in the future and thus more future warming.

One point that is worth bringing up is that it is sometimes argued that climate model-projected global warming should be taken less seriously on the grounds that climate models are imperfect in their simulation of the current climate. Our study confirms important model-observation discrepancies and ample room for climate model improvement. However, we show that the models that simulate the current climate with the most skill tend to be models that project above-average global warming over the remainder of the 21st century. Thus, it makes little sense to dismiss the most severe global warming projections because of model deficiencies. On the contrary, our results suggest that model shortcomings can likely be used to dismiss the least severe projections.

Questions regarding specifics of the study

Below are answers to some specific questions that we anticipate interested readers might have. This discussion is more technical than the text above.

1) How exactly are the constrained projection ranges derived and how do you guard against over-fitting?

In order to assess the skill with which the statistical relationships identified in the study help inform future warming, we employ a technique called cross-validation. In the main text, we show results for ‘hold-one-out’ cross-validation and in the Extended Data, we show results for ‘4-fold’ cross-validation.

Under ‘hold-one-out’ cross-validation, each climate model takes a turn acting as a test model with the remaining models designated as training models. The test model is held out of the procedure and the training models are used to define the statistical relationship between the energy budget features (the predictor variables) and future warming (the predictand). Then, the test model is treated as if it were the observations in the sense that we use the statistical relationship from the training models as well as the test model’s simulated energy budget features to “predict” the amount of future warming that we would expect for the test model. Unlike the true observations, the amount of future warming for the test model is actually known. This means that we can quantify how well the statistical procedure did at predicting the precise amount of future warming for the test model.

We allow every model to act as the test model once so that we can obtain a distribution of errors between the magnitude of statistically-predicted and ‘actual’ future warming. This distribution is used to quantify the constrained projection spread. A visualization of this procedure is shown in the video below:
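The logic of the hold-one-out procedure can be sketched in a few lines of code. This is a simplified illustration under stated assumptions, not the study's implementation: it uses a single scalar predictor with an ordinary least-squares line in place of PLS regression on spatial fields, the data are synthetic, the "observed" value is made up, and taking two standard deviations of the errors as the spread is our assumption.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x for one predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def hold_one_out_errors(x, y):
    """Let each model act as the test model once; return prediction errors."""
    errors = []
    for i in range(len(x)):
        # Training models: everything except model i.
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        # Treat the held-out model as if it were the observations.
        predicted = a + b * x[i]
        errors.append(predicted - y[i])
    return errors

# Synthetic ensemble: one energy-budget feature and one warming value per model.
rng = random.Random(0)
n_models = 30
feature = [rng.gauss(0, 1) for _ in range(n_models)]
warming = [4.0 + 0.5 * f + rng.gauss(0, 0.2) for f in feature]

errors = hold_one_out_errors(feature, warming)
raw_spread = 2 * statistics.pstdev(warming)         # spread of the raw ensemble
constrained_spread = 2 * statistics.pstdev(errors)  # spread from the CV errors

# Final step: apply the relationship from all models to a (made-up) observation.
a, b = fit_line(feature, warming)
observed_feature = 0.5  # hypothetical observed value of the feature
best_estimate = a + b * observed_feature
print(raw_spread, constrained_spread, best_estimate)
```

The key design choice is that the error distribution is built only from predictions for models that were excluded from the fit, so a relationship that merely over-fits the training models would show up as large errors and a wide constrained spread.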

2) Does your constrained spread represent the full range of uncertainty for future warming?

No.

First, it is important to note that most of the uncertainty associated with the magnitude of future warming is attributable to uncertainty in the amount of greenhouse gases humans will actually emit in the future. Our study does not address this uncertainty and instead focuses on the range of warming that we should expect for a given change in radiative forcing.

Secondly, the range of modeled global warming projections for a given change in radiative forcing does not represent the true full uncertainty. This is because there are a finite number of models, they are not comprehensive, and they do not sample the full uncertainty space of various physical processes. For example, a rapid nonlinear melting of the Greenland and Antarctic ice sheets has some plausibility (e.g., Hansen et al. 2016) but this is not represented in any of the models studied here and thus it has an effective probability of zero in both the raw unconstrained and observationally-informed projections. Because of considerations like this, the raw model spread is best thought of as a lower bound on total uncertainty (Caldwell et al., 2016) and thus our observationally-informed spread represents a reduction in this lower bound rather than a reduction in the upper bound.

3) Why did you use these particular energy budget features as your predictor variables?

Overall, we chose predictor variables that were as fundamental and comprehensive as possible while still offering the potential for a straightforward physical connection to the magnitude of future warming. In particular, we did not want to ‘data mine’ in an effort to find any variable with a high across-model correlation between its contemporary value and the magnitude of future warming. Doing so would very likely have produced spurious relationships (e.g., Caldwell et al., 2014).

Additionally, we chose to emphasize broad and fundamental predictor variables in order to avoid the ceteris paribus (all else being equal) assumptions that are often invoked when more specific predictor variables are used. For example, it might be the case that models with larger mean surface ice albedo in a given location have larger positive surface ice albedo feedbacks in that location. This would indicate that, ceteris paribus, these models should produce more warming. However, it might be the case that there is some across-model compensation from another climate process such that the ceteris paribus assumption is not satisfied and these models do not actually produce more warming than average. For example, maybe models with more mean surface ice albedo in the given location tend to have less mean cloud albedo and less positive cloud albedo feedbacks with warming. Practically speaking, it is the net energy flux that is going to matter for the change in temperature, not the precise partitioning of the net flux into its individual subcomponents. Thus, in an attempt to account for potential compensation across space and across processes, we used globally complete, aggregate measures of the Earth’s energy budget as our predictor variables. In the context of the above example, this would mean using total reflected shortwave radiation as the predictor variable rather than the reflected shortwave radiation from only one subcomponent of the energy budget like surface ice albedo.

To be more specific, we had five primary objectives in mind when we chose the features that we used to serve as predictor variables to inform future warming projections.

Objective 1: The features should have a relatively straight-forward connection to physical processes that will influence the magnitude of projected global warming.

The central premise that underlies our study is that climate models that are going to be the most skillful in their projections of future warming should also be the most skillful in other contexts like simulating the recent past. However, it should be relatively apparent why there would be a relationship between how well a model simulates a given variable over the recent past and how much warming that model simulates in the future.

Uncertainty in model-projected global warming originates primarily from differences in how models simulate the Earth’s top-of-atmosphere radiative energy budget and its adjustment to warming. Thus, we specifically chose to use the Earth’s net top-of-atmosphere energy budget and its two most fundamental components (its reflected shortwave radiation and outgoing longwave radiation) as predictor variables. This made it much easier to assess why relationships might emerge between the predictor variables and future warming than it would have been if we were open to using any variable for which some positive across-model correlation with future warming could be found.

Objective 2: The features should represent processes as fundamental to the climate system as possible.

We used three attributes of the energy budget variables: the mean climatology, the magnitude of the seasonal cycle, and the magnitude of monthly variability.

We chose these attributes in order to keep the predictors as simple and fundamental as possible. We did not want to ‘data mine’ for more specific features that might have more apparent predictive power because it would be likely that such predictive power would be illusory (e.g., Caldwell et al., 2014). Furthermore, these choices were informed by previous studies which have indicated that seasonal and monthly variability in properties of Earth’s climate system can be useful as predictors of future warming because behavior on these timescales is relatable to the behavior of long-term radiative feedbacks. Average (or climatological) predictors were used because the mean state of the climate system can affect the strength of radiative feedbacks, primarily because the mean state influences how much potential there is for change.

Objective 3: The features should have global spatial coverage.

The climate system is dynamically linked through horizontal energy transport so modeled processes at a given location inevitably influence modeled processes elsewhere. This means that there may be compensation in a given energy budget field across space. For example, suppose that models with greater mean (climatological) albedo have larger albedo feedbacks. Further, suppose that models with greater mean albedo over location X tend to have less mean albedo over location Y. If we were to restrict our attention to location X, we would be tempted to say that models with more mean albedo at location X should warm more in response to forcing. However, this would only be the case if the ceteris paribus assumption holds.

Since the magnitude of global warming will depend on the global change in albedo, it is important to account for any potential compensation across space. Thus, we required that the features that we used as predictor variables be globally complete fields so that any spatial compensation between processes could be accounted for.

Objective 4: The features should represent the net influence of many processes simultaneously.

In addition to considering compensation within a given process in space, it was also a goal of ours to consider possible compensation amongst processes at a given location. For example, suppose again that models with greater mean (climatological) albedo have larger albedo feedbacks. Further, suppose that models with greater mean cloud albedo tend to have less mean surface snow/ice albedo. If we were to restrict our attention to cloud albedo, we would be tempted to say that models with more mean cloud albedo should warm more in response to forcing. Again, this would only be the case if the ceteris paribus assumption holds.

Since the magnitude of global warming will depend on the net influence of many processes on the energy budget, it is important to account for any potential compensation across processes. Thus, rather than looking at specific subcomponents of the energy budget (e.g., cloud albedo), we use the net energy imbalance and only its two most fundamental components (its shortwave and longwave components) as predictor variables.

Objective 5: The features should be measurable with sufficiently small observational uncertainty.

Our procedure required that the observational uncertainty in the predictor variables be smaller than the across-model spreads. This was essential so that it would be possible to use observations to discriminate between well-performing and poorly performing models. This objective was met for the top-of-atmosphere energy flux variables that we used from the CERES EBAF satellite product but it would not have been met for, e.g., surface heat fluxes over a large portion of the planet.

4) Why did you choose global temperature response and not a more specific physical metric as your predictand?

Our ultimate goal was to constrain the magnitude of future warming. Others have argued that it is easier to draw physical connections if the predictand is something more specific than global temperature like an aspect of the magnitude of the cloud feedback (e.g., Klein and Hall, 2015). For example, it is more straight-forward to relate the magnitude of the seasonal cycle in cloud albedo at some location to the magnitude of long-term cloud albedo feedback in that location, than it is to relate the magnitude of the seasonal cycle in cloud albedo to the magnitude of global warming. We agree with this. However, models with more-positive cloud albedo feedbacks in a given location will be the models that warm more only if the ceteris paribus assumption holds. It could be the case that models with more-positive cloud albedo feedbacks in a given location tend to have less-positive cloud albedo feedbacks elsewhere or tend to have more-negative feedbacks in other processes.

Thus, it should be recognized that using a specific predictand like the magnitude of the local cloud albedo feedback can make it easier to draw a physical connection between predictor and predictand but this can come at the cost of actually being able to constrain the ultimate variable of interest. Since our goal was to constrain global temperature change, we felt that it was most practical to use global temperature change as our predictand even if this made drawing a physical connection less straightforward.

5) Are your results sensitive to the use of alternative predictors or predictands?

One of the more striking aspects of our study is the qualitative insensitivity of the results to the use of differing predictors and predictands. Our findings of generally reduced spreads and increased mean warming are robust to which of the nine predictor fields is used (or whether they are used simultaneously) and robust to which of the ten predictands is targeted (mean warming over the years 2046-2065 and 2081-2100 for RCP 2.6, RCP 4.5, RCP 6.0 and RCP 8.5, as well as equilibrium climate sensitivity, and net feedback strength).

6) Why did you use the statistical technique that you used?

We used Partial Least Squares (PLS) regression to relate simulated features of the Earth’s energy budget over the recent past to the magnitude of model-simulated future warming. PLS regression is applicable to partial correlation problems analogously to the more widely used Multiple Linear Regression (MLR). As discussed above, we wanted to relate globally complete energy budget fields (our predictor matrices) to the magnitude of future warming (our predictand vector). Because of the high degree of spatial autocorrelation in the energy budget fields, the columns in the predictor matrix end up being highly collinear, which makes MLR inappropriate to the problem. PLS, however, offers a solution to this issue by creating linear combinations of the columns in the predictor matrix (PLS components) that represent a large portion of the predictor matrix’s variability. The procedure is similar to Principal Component Analysis (PCA), which is common in climate science, but instead of seeking components that explain the maximum variability in some matrix itself, PLS seeks components in the predictor matrix that explain the covariability between the predictor matrix and the predictand vector.
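For readers who want to see the idea in code, below is a minimal sketch of PLS regression with a single predictand and a single component, written in plain Python. It is an illustration only and not the configuration used in the study (the number of components, the preprocessing and the data here are all our assumptions). The synthetic predictor matrix has more columns than models and highly collinear columns, which is the situation where MLR breaks down.

```python
import random

def pls1_one_component(X, y):
    """Fit a one-component PLS regression; return a prediction function."""
    n, p = len(X), len(X[0])
    # Center every predictor column and the predictand.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weights: proportional to the covariance of each column with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores: each model's projection onto the PLS component.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the predictand on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(x_new):
        score = sum((x_new[j] - x_mean[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return predict

# Synthetic data: 20 "models", 50 collinear "grid points" driven by one factor.
rng = random.Random(1)
n_models, n_gridpoints = 20, 50
latent = [rng.gauss(0, 1) for _ in range(n_models)]
loading = [rng.gauss(0, 1) for _ in range(n_gridpoints)]
X = [[z * load + rng.gauss(0, 0.1) for load in loading] for z in latent]
y = [4.0 + 0.5 * z + rng.gauss(0, 0.05) for z in latent]

predict = pls1_one_component(X, y)
fitted = [predict(row) for row in X]
```

The fitted values here are in-sample, so they say nothing about predictive skill on their own. That is what the cross-validation described under question 1 is for.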

7) How do you know that the statistical procedure itself isn’t producing the results that you are seeing?

In conjunction with cross-validation, we perform three additional experiments designed to expose any systematic biases in our methodology. These three experiments involve supplying the statistical procedure with data that should not produce any constraint on the magnitude of future global warming. In one experiment, we substitute the described energy budget features with global surface air temperature (SAT) annual anomalies for each model. Since annual SAT anomaly fields are dominated by chaotic unforced variability, the across-model relationship of these patterns for any given year is unlikely to be related to the magnitude of future warming.

In a second experiment, we substitute the original global warming predictand vector with versions of the vector that have had their values randomly reordered (scrambled). Thus, these scrambled predictand vectors have the same statistical moments as the original vector but any real across-model relationship between predictors and predictands should be eliminated on average.

Finally, in a third experiment, we use both the SAT anomaly fields and the scrambled predictand vectors as the predictors and predictands respectively.

In contrast to the main results between the energy budget predictor fields and the magnitude of future global warming, the three experiments described above all demonstrate no skill in constraining future warming. This indicates that the findings reported in our study are a result of real underlying relationships between the predictors and predictands and are not an artifact of the statistical procedure itself.
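The scrambling idea from the second experiment can be illustrated with a small sketch. It is a simplified stand-in for what is described above: it uses a single synthetic predictor and the squared correlation as the measure of skill, rather than the PLS procedure with cross-validation, and all names and numbers are made up.

```python
import random
import statistics

def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Synthetic ensemble with a real relationship between feature and warming.
rng = random.Random(42)
feature = [rng.gauss(0, 1) for _ in range(30)]
warming = [4.0 + 0.5 * f + rng.gauss(0, 0.2) for f in feature]

real_r2 = correlation(feature, warming) ** 2

# Scramble the predictand many times; the values (and hence the moments)
# are unchanged, but the pairing with the predictor is destroyed.
scrambled_r2 = []
for _ in range(1000):
    shuffled = warming[:]
    rng.shuffle(shuffled)
    scrambled_r2.append(correlation(feature, shuffled) ** 2)

# Fraction of scrambled cases that look at least as skillful as the real one.
fraction_as_good = sum(r2 >= real_r2 for r2 in scrambled_r2) / len(scrambled_r2)
print(real_r2, statistics.fmean(scrambled_r2), fraction_as_good)
```

If a procedure showed comparable skill on the scrambled vectors, that would suggest the skill was being manufactured by the procedure rather than by any real relationship in the data.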

8) Why are the values in Table 1 of the paper slightly different from those implied in Figure 1?

Our results generally show warmer central estimates and smaller ranges in the projections of future warming. However, there are multiple ways to compare our results with raw/previous model results. One way would be to compare our results with what is reported in the last IPCC report (Chapter 12 in Assessment Report 5). This is probably the most useful from the perspective of a casual observer and it is the comparison shown in our Table 1. One issue with this comparison is that it is not a perfect apples-to-apples comparison because we used a slightly different suite of climate models than those used in the IPCC report (see our Supplementary Table 1 and IPCC AR5 Chapter 12). Since many casual observers will read the abstract and look at Table 1, we wanted the numerical values in these two places to match. So, the numerical values in the abstract (the ones reported in degrees Celsius) correspond to our results compared to what was reported previously in IPCC AR5.

It is also useful to make the apples-to-apples comparison where our observationally-informed results are compared to raw model results using the exact same models. This is what is done using the “spread ratio” and “prediction ratio” discussed in the paper’s text and shown in Figure 1. These dimensionless values (the ones reported in percent changes) also appear in our abstract. This was done so that the spread ratio and prediction ratio numbers in the abstract would be consistent with those seen in Fig. 1.

So to expand/clarify the numbers reported in the abstract, there are two sets of comparisons that are relevant:

Under the comparison where our results were compared directly with those from IPCC AR5, the observationally informed warming projection for the end of the twenty-first century for the RCP8.5 scenario is about 12 percent warmer (+~0.5 degrees Celsius) with a reduction of about 43 percent in the two standard deviation spread (-~1.2 degrees Celsius) relative to the raw model projections.

Under the comparison where the exact same models are used, the observationally informed warming projection for the end of the twenty-first century for the RCP8.5 scenario is about 15 percent warmer (+~0.6 degrees Celsius) with a reduction of about a third in the two standard deviation spread (-~0.8 degrees Celsius) relative to the raw model projections.
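For readers who want to see how percentages like these follow from a pair of projections, here is a small sketch. It assumes that the prediction ratio and spread ratio are simply the constrained value divided by the raw value (see the paper for the exact definitions), and the input numbers are made up so that the arithmetic lands near the percentages quoted above. They are not values taken from the paper.

```python
def ratios(raw_mean, raw_spread, constrained_mean, constrained_spread):
    """Return (prediction ratio, spread ratio), assumed as constrained/raw."""
    prediction_ratio = constrained_mean / raw_mean
    spread_ratio = constrained_spread / raw_spread
    return prediction_ratio, spread_ratio

# Hypothetical inputs in degrees Celsius (illustrative only).
prediction_ratio, spread_ratio = ratios(
    raw_mean=4.0, raw_spread=2.4, constrained_mean=4.6, constrained_spread=1.6
)

# Express the ratios as the percent changes used in the text.
percent_warmer = (prediction_ratio - 1) * 100
percent_narrower = (1 - spread_ratio) * 100
print(f"{percent_warmer:.0f}% warmer, {percent_narrower:.0f}% narrower spread")
```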