Climate sensitivity characterises the response of the climate to changes in radiative forcing and can be measured in many different ways. However, estimates derived from observations of historical global temperatures have tended to be lower than those suggested by state-of-the-art climate simulators. Are the models too sensitive?

A new study largely explains the difference: the comparison has not been done 'like-with-like'.

The implications for understanding historical global temperature change are also significant. It is suggested that changes in global air temperature are actually ~24% larger than measured by the HadCRUT4 global temperature dataset.

Post based on Richardson et al. (2016, Nature Climate Change) [code & data]

Historical weather observations include air temperatures measured over land areas and sea surface temperatures in ocean regions, but only where there were existing thermometers (& ships). The observed change in ‘global mean temperature’ is produced by blending together these sparse, differently measured changes of temperature over time (e.g. for HadCRUT4).

However, when considering global mean temperature from climate models, we tend to use simulated air temperatures everywhere, mainly because this is much easier to calculate. Does this difference matter?

An earlier study by Cowtan et al. demonstrated that these subtle differences in producing estimates of global temperature can make a significant difference to conclusions drawn from comparisons of observations and simulations.

As we cannot travel back in time to take additional historical observations, the only way to perform a like-with-like comparison is by treating the models differently, in terms of the spatial coverage and observation type. Figure 1 highlights the dual effects of ‘masking’ the simulations to have the same spatial coverage as the observations and ‘blending’ the simulated air and sea temperatures together, instead of using air temperature everywhere.
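The masking and blending steps can be illustrated with a toy calculation. This is a minimal sketch, not the paper's code: all arrays are invented data on a tiny grid, and real analyses would use area (cosine-of-latitude) weights and monthly fields.

```python
import numpy as np

# Toy sketch of 'blending' and 'masking' a simulated temperature field.
# tas = simulated near-surface air temperature anomaly (K)
# tos = simulated sea surface temperature anomaly (K)
# All values here are invented for illustration.
tas = np.array([[1.2, 1.1, 1.0],
                [0.9, 0.8, 0.7]])
tos = np.array([[1.0, 0.9, 0.8],
                [0.8, 0.7, 0.6]])
land_frac = np.array([[1.0, 0.0, 0.5],
                      [0.0, 1.0, 0.0]])    # fraction of each cell that is land
observed = np.array([[True, True, False],
                     [True, False, True]]) # cells with historical observations

# Blending: weight air temperature by land fraction, SST by ocean fraction.
blended = land_frac * tas + (1.0 - land_frac) * tos

# Masking: average only over cells where observations exist.
area_wts = np.ones_like(blended)           # real code would use cos(latitude)
masked_mean = np.average(blended[observed], weights=area_wts[observed])

# For comparison: the model-world global mean using air temperature everywhere.
global_tas_mean = np.average(tas, weights=area_wts)

print(global_tas_mean, masked_mean)  # 0.95 vs 0.875 in this toy case
```

In this contrived example the masked-and-blended mean (0.875 K) is lower than the air-temperature-everywhere mean (0.95 K), mimicking the direction of the effect described above, because SSTs warm less than the air above them and some warm cells are unobserved.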

When using simulated air temperatures everywhere (Fig. 1a, red line), the models tend to show more warming than the observations (HadCRUT4, grey). However, when the comparison is performed fairly, this difference disappears (blue). Roughly half of the difference is due to masking and half due to blending.

The size of the effect is not trivial. According to the CMIP5 simulations, more than 0.2°C of global air temperature change has been ‘hidden’ due to our incomplete observations and use of historical sea surface temperatures (Fig. 1b).

But what effect does this have on estimates of climate sensitivity?

Otto et al. (2013) used global temperature observations and a simple energy-budget approach to estimate the transient climate response (TCR) as 1.3K (range, 0.9-2.0K), which is at the lower end of the IPCC AR5 assessed range (1.0-2.5K).
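The energy-budget approach boils down to a one-line calculation, TCR = F_2x × ΔT / ΔF, where ΔT and ΔF are the changes in global temperature and radiative forcing between a base period and a recent period, and F_2x is the forcing from doubling CO2. The numbers below are illustrative round values of the kind used in such studies, not the paper's exact inputs.

```python
# Energy-budget TCR estimate in the style of Otto et al. (2013).
# Illustrative round numbers, not the study's exact inputs:
F_2x = 3.44   # forcing from doubled CO2 (W m^-2)
dT   = 0.75   # change in global mean temperature between periods (K)
dF   = 1.95   # change in radiative forcing between periods (W m^-2)

TCR = F_2x * dT / dF
print(f"TCR = {TCR:.2f} K")  # ~1.3 K, close to the central estimate quoted above
```

The point of the new study is that if ΔT comes from sparse, blended observations while the benchmark is model air temperature everywhere, this formula systematically underestimates the 'true' air-temperature TCR.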

A new study, led by Mark Richardson, repeated the analysis of Otto et al. but used updated observations, improved uncertainty estimates of aerosol radiative forcing and, critically, considered the blending and masking effects described above.

Figure 2 shows how TCR estimates from different CMIP5 models depend on the assumptions made about observation availability. The pink circles highlight that the blending procedure lowers the TCR when compared to using air temperature (tas) everywhere (dotted black line). The blue triangles demonstrate that the sparse availability of observations further reduces the TCR estimates.

So, according to the CMIP5 simulations, the TCR estimated from our available observations will always be lower than if we had full spatial coverage of global near-surface air temperature.

To summarise the effect of the differences in analysis methodology, Figure 3 shows various estimates of TCR. The top red bar shows the raw estimates of TCR from the CMIP5 simulations and the bottom blue bar is the original result from Otto et al. based on observations. The various bars in between reconcile the difference between Otto et al. and CMIP5.

For example, if the models are treated in the same way as the observations, then the estimated TCR is the top blue bar, in much better agreement with Otto et al. There is then no discrepancy between the observation and simulation estimates of TCR when they are treated the same way. (The second blue bar shows the impact of updating the uncertainty estimates of aerosol radiative forcing on the Otto et al. result, which is a separate issue.)

However, we can also reverse the procedure above and estimate by how much we need to scale the TCR estimated from the observations to reproduce what would be derived if we had air temperatures everywhere. This is the second red bar in Figure 3, which overlaps the CMIP5 simulation range and has a best estimate of 1.7K (range, 1.0-3.3K).

Richardson et al. conclude that previous analyses which reported observation-based estimates of TCR toward the low end of the model range did so largely because of inconsistencies between the temperature reconstruction methods in models and observations.

As observational coverage improves, the masking effect will diminish in importance, but it will remain for the historical period unless additional, currently undigitised weather observations can be rescued. The blending issue is here to stay unless estimates of changes in air temperature can be produced over the ocean regions. The physical mechanisms behind the different simulated warming rates of ocean and air temperatures also need further exploration.

Finally, if the reported air-ocean warming and masking differences are robust, then which global mean temperature is relevant for informing policy? As observed? Or what those observations imply for ‘true’ global near-surface air temperature change? If it is decided that climate targets refer to the latter, then the warming is actually 24% (9-40%) larger than reported by HadCRUT4.
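The arithmetic of that rescaling is simple. As a hedged sketch: the observed warming value below is an illustrative round number, not a quoted HadCRUT4 figure; only the 24% (9-40%) scaling comes from the study.

```python
# Scaling observed (blended, masked) warming to the implied 'true' global
# near-surface air temperature change, using the 24% (9-40%) result.
observed_warming = 0.8              # K, illustrative round number only
best, low, high = 1.24, 1.09, 1.40  # scaling factors from 24% (9-40%)

implied = observed_warming * best
print(f"implied air-temperature warming = {implied:.2f} K "
      f"(range {observed_warming * low:.2f}-{observed_warming * high:.2f} K)")
```

So a target defined in 'true' air temperature is reached noticeably sooner than one defined in the blended, masked record.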

And that is a big difference, especially when considering lower global temperature targets.