Satellite atmospheric temperature data

We used satellite estimates of atmospheric temperature produced by RSS (ref. 4), STAR (ref. 5), and UAH (ref. 7). All three groups provide satellite measurements of the temperatures of the mid- to upper troposphere (TMT) and the lower stratosphere (TLS). Our focus here is on assessing the significance of observed trends in TMT. TLS is needed to correct TMT for the contribution it receives from stratospheric cooling.

Each group provides the most recent version and the previous version of their datasets. The versions available are: 3.3 and 4.0 (RSS), 3.0 and 4.0 (STAR), and 5.6 and 6.0 (UAH). Satellite datasets are in the form of monthly means on 2.5° × 2.5° latitude/longitude grids. At the time this analysis was performed, temperature data were available for the 456-month period from January 1979 to December 2016.

There are differences in the spatial coverage of the satellite temperature data produced by the three groups. The UAH TLS and TMT datasets have global coverage, whereas the STAR datasets exclude areas poleward of 87.5° and the RSS datasets exclude areas poleward of 82.5°. To avoid any impact of spatial coverage differences on trend comparisons, we calculated all near-global averages of actual and synthetic satellite temperatures over the area of common coverage in the RSS, UAH, and STAR datasets (82.5°N to 82.5°S).
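The near-global averaging over the area of common coverage can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `near_global_mean`, the use of `None` for missing values, and the cosine-of-latitude area weighting (a common approximation on regular latitude/longitude grids) are all assumptions.

```python
import math

def near_global_mean(field, lats, lat_limit=82.5):
    """Area-weighted mean of a latitude x longitude field over |lat| <= lat_limit.

    field: list of rows (one per latitude), each a list of values per longitude.
    lats:  grid-cell centre latitudes in degrees, same length as field.
    Missing values are represented as None and are skipped.
    Weighting by cos(latitude) is an assumed approximation of cell area.
    """
    total = 0.0
    weight_sum = 0.0
    for lat, row in zip(lats, field):
        if abs(lat) > lat_limit:
            continue  # outside the area of common coverage
        w = math.cos(math.radians(lat))
        for value in row:
            if value is None:
                continue
            total += w * value
            weight_sum += w
    return total / weight_sum

# Cell centres of a 2.5-degree grid: -88.75, -86.25, ..., 88.75
lats = [-88.75 + 2.5 * i for i in range(72)]
```

On such a grid, the rows centred at ±83.75°, ±86.25°, and ±88.75° are excluded, so values poleward of 82.5° do not influence the average.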

Method used for correcting TMT data

Trends in TMT estimated from microwave sounders receive a substantial contribution from the cooling of the lower stratosphere (refs 8–11). In ref. 8, a regression-based approach was developed for removing the bulk of this stratospheric cooling component of TMT. This method has been validated with both observed and model atmospheric temperature data (refs 9, 36, 37). Correction was performed at each observational and model grid-point. Corrected grid-point data were then spatially averaged over 82.5°N–82.5°S. Further details of the correction method are provided in the Supplementary Information.
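As a rough illustration of a correction of this kind, the sketch below forms a linear combination of TMT and TLS at a single grid-point. Both the functional form and the coefficient value `a24 = 1.1` are assumptions made here for illustration; the source defers the actual details to the Supplementary Information, and the function name `correct_tmt` is hypothetical.

```python
def correct_tmt(tmt, tls, a24=1.1):
    """Reduce the stratospheric contribution to a TMT time series.

    Assumed form (illustrative only): corrected = a24 * TMT + (1 - a24) * TLS.
    tmt, tls: sequences of equal length (e.g. monthly anomalies at one grid-point).
    a24:      assumed regression coefficient; not taken from the source text.
    """
    return [a24 * m + (1.0 - a24) * s for m, s in zip(tmt, tls)]

# With a24 > 1, a cooling stratosphere (negative TLS) raises the corrected value.
corrected = correct_tmt([0.1], [-0.3])
```

Because the correction is applied before spatial averaging, a function like this would be called once per grid-point, and the corrected fields averaged afterwards.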

Details of model output

We used model output from phase 5 of the Coupled Model Intercomparison Project (CMIP5; ref. 38). The simulations analyzed here were contributed by 18 different research groups (see Supplementary Table S1). Our focus was on pre-industrial control runs with no changes in external influences on climate, which provide estimates of the natural internal variability of the climate system (see Supplementary Table S2).

To compare satellite-derived atmospheric temperature trends with model estimates of trends arising from natural internal variability, we calculate synthetic TMT and TLS from CMIP5 control runs. This calculation relies on a local weighting function method developed at RSS. At each model grid-point, simulated temperature profiles were convolved with local weighting functions. Local weights depend on the grid-point surface pressure, the surface type (land or ocean), and the selected layer-average temperature (TMT or TLS).
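The idea of a weighted vertical average at one grid-point can be sketched as below. This is a simplified stand-in, not the RSS method itself: the weights are passed in directly, whereas the actual local weights depend on surface pressure, surface type, and the chosen layer. The function name and the separate surface term are assumptions.

```python
def synthetic_layer_temperature(level_temps, level_weights,
                                surface_temp, surface_weight):
    """Weighted average of a simulated temperature profile at one grid-point.

    level_temps:    temperatures on model pressure levels (K).
    level_weights:  assumed weighting-function values for those levels.
    surface_temp:   surface temperature (K).
    surface_weight: assumed weight of the surface contribution.
    Weights are normalised here so that they sum to one.
    """
    total_weight = sum(level_weights) + surface_weight
    weighted = sum(w * t for w, t in zip(level_weights, level_temps))
    weighted += surface_weight * surface_temp
    return weighted / total_weight

# Two levels with equal weight and no surface contribution
value = synthetic_layer_temperature([200.0, 300.0], [1.0, 1.0], 288.0, 0.0)
```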

Statistical analysis

We use model estimates of natural internal variability to evaluate the statistical significance of trends in the observed temperature time series \(T_o(k,t)\), where \(k\) and \(t\) are (respectively) indices over the satellite TMT datasets and over time in months. Internal variability estimates are obtained from CMIP5 control runs. Rather than focusing on one specific period, we analyze maximally overlapping 20-year trends in \(T_o(k,t)\). “Maximally overlapping” indicates that a 20-year sliding window is used for trend calculations. This window advances in increments of one month until the end of the current window reaches the final month of the satellite or control run TMT time series.
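The sliding-window trend calculation described above can be sketched as follows. The use of an ordinary least-squares slope is an assumption (the text does not name the trend estimator), and the function names are hypothetical.

```python
def linear_trend(values):
    """Ordinary least-squares slope of values against time steps 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2.0
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def overlapping_trends(series, window=240):
    """Trends for every window of `window` months, advancing one month at a time."""
    n_windows = len(series) - window + 1
    return [linear_trend(series[s:s + window]) for s in range(n_windows)]

# A 456-month record with a 240-month (20-year) window yields 456 - 240 + 1 windows.
series = [0.01 * t for t in range(456)]
trends = overlapping_trends(series)
```

For a 456-month series this produces 217 trends, in units of the series per month; the same function would apply to control-run series of any length of at least 240 months.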

Anomalies in the satellite observations are defined relative to climatological monthly means calculated over the 38-year period from January 1979 to December 2016. Control run anomalies are with respect to climatological monthly means over the full length of each model’s control integration.
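A minimal sketch of this anomaly definition is given below: the mean of each calendar month over the full series is subtracted from every value for that month. The function name and the `start_month` parameter are hypothetical.

```python
def monthly_anomalies(series, start_month=1):
    """Departures from climatological monthly means over the whole series.

    series:      monthly values in time order.
    start_month: calendar month (1-12) of the first value.
    """
    sums = [0.0] * 12
    counts = [0] * 12
    for idx, value in enumerate(series):
        month = (start_month - 1 + idx) % 12
        sums[month] += value
        counts[month] += 1
    # Calendar months that never occur in a short series get a mean of 0.0
    climatology = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [value - climatology[(start_month - 1 + idx) % 12]
            for idx, value in enumerate(series)]

# A pure seasonal cycle repeated for 38 years has zero anomalies.
seasonal = [float(m) for _ in range(38) for m in range(12)]
anomalies = monthly_anomalies(seasonal)
```

For the satellite data the series would span January 1979 to December 2016; for a control run it would span the full length of that model's integration.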

We assess trend significance using weighted p-values, which account for inter-model differences in control run length (ref. 3).

The weighted p-value, \(\overline{{p}_{c}}(i,k)^{\prime} \), is defined as:

$$\overline{{p}_{c}}(i,k)^{\prime} = \frac{1}{N_{model}}\sum_{j=1}^{N_{model}} p_{c}(i,j,k) \tag{1}$$

$$i = 1,\ldots,N_{o};\quad j = 1,\ldots,N_{model};\quad k = 1,\ldots,N_{sat}$$

where the index \(i\) runs over \(N_{o}\), the number of maximally overlapping 20-year trends in \(T_o(k,t)\), and the index \(j\) spans \(N_{model}\), the number of model control runs (36 here). \(N_{sat}\) is the total number of satellite datasets. Here, \(N_{sat} = 6\) (two versions from each of the three groups), and \(N_{o} = 217\) for 20-year trends (a 240-month window fits into the 456-month record 456 − 240 + 1 = 217 times).

The individual \(p_{c}(i,j,k)\) values for each model pre-industrial control run are calculated as follows:

$$p_{c}(i,j,k) = \frac{K_{c}(i,j,k)}{N_{c}(j)} \tag{2}$$

$$i = 1,\ldots,N_{o};\quad j = 1,\ldots,N_{model};\quad k = 1,\ldots,N_{sat}$$

where \(K_{c}(i,j,k)\) is the number of 20-year trends in the \(j\)th control run that are larger than \(b_{o}(i,k)\), the current 20-year trend in \(T_o(k,t)\). The sample size \(N_{c}(j)\) is the number of maximally overlapping 20-year trends in the \(j\)th control run. Further information on the statistical notation and analysis is given in the Supplementary Information.
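Equations (1) and (2) can be illustrated for a single observed trend with the sketch below. The function names and the toy trend values are hypothetical; only the two formulas are taken from the text.

```python
def control_p_value(observed_trend, control_trends):
    """Equation (2): p_c = K_c / N_c for one control run.

    K_c counts the control-run trends larger than the observed trend;
    N_c is the number of overlapping trends in that control run.
    """
    k_c = sum(1 for b in control_trends if b > observed_trend)
    return k_c / len(control_trends)

def averaged_p_value(observed_trend, control_trends_by_model):
    """Equation (1): mean of the per-model p_c values over all control runs."""
    p_values = [control_p_value(observed_trend, trends)
                for trends in control_trends_by_model]
    return sum(p_values) / len(p_values)

# Toy example: two control runs of different length
run_a = [0.0, 0.1, 0.2, 0.3]   # N_c = 4
run_b = [0.0, 0.3]             # N_c = 2
p_bar = averaged_p_value(0.25, [run_a, run_b])  # (1/4 + 1/2) / 2
```

In this toy case the averaged value is 0.375, whereas pooling all six trends would give 2/6. Averaging per-model p-values gives each control run the same influence on the result regardless of its length.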

Sensitivity of results to model variability errors

The credibility of our trend significance results rests on the assumption that model control runs yield reliable estimates of internal variability on the timescales considered here (20 years in Fig. 1C and D, 38 years in Fig. 1E, and 18 years in Supplementary Fig. 1C and D). On these multi-decadal timescales, it is not feasible to use the single realization of the observed 38-year satellite TMT record to evaluate how reliably models capture “observed” internal variability. The primary difficulty is that observed temperature records are simultaneously influenced by both internal variability (operating on a wide range of spatial and temporal scales) and multiple external forcings. Unambiguous partitioning of observational temperature records into internally generated and externally forced components is an aspirational goal, but is not attainable in practice. All model-versus-observed internal variability comparisons are affected by the uncertainties involved in isolating multi-decadal internal variability from observational climate records (ref. 27).

Other approaches must therefore be employed to enhance confidence in the reliability of model variability on 18- to 38-year timescales, such as variability comparisons involving longer SST and land + ocean surface temperature records (refs 12, 28). These comparisons show no evidence that models systematically underestimate observed variability on multi-decadal timescales (see, e.g., Fig. 4 in ref. 28). The same applies to model-versus-data variability comparisons on shorter timescales of roughly 10 years (ref. 27).