We find that some persistent changes in the distribution of temperature extremes occurred as early as the 1960s (i.e., about one and a half decades from the baseline period) (Fig. 1a,b for TN90p and TX90p and Fig. S2a,b for TN10p and TX10p). By the year 2000, persistent changes to the distributions of TN90p and TN10p had occurred over the majority of land covered by the HadEX2 observations (65% for TN90p and 70% for TN10p), while changes in the distributions of TX90p and TX10p had occurred for a substantial fraction (22% for TX90p and 32% for TX10p). The TOE was as early as the 1960s for nighttime temperature extremes in some regions, including parts of Eurasia, the Asia-Pacific region, and Australia. We obtain very similar results using a 5-year block-bootstrap K-S test, which minimizes the influence of autocorrelation of temperature extremes on the original K-S test (Fig. S3; see the description for the block-bootstrap K-S test in the figure caption). Considering the relatively higher signal-to-noise ratio for temperature extremes in the tropics, where there are no observational data, we suspect that persistent changes in the distribution of temperature extremes may have occurred in the tropics as well. Further, we find that persistent changes to temperature extremes, especially the nighttime temperature extremes, tend to emerge earlier and are more widespread than persistent changes to annual mean temperature (Fig. S4). This is probably because a small shift in the distribution of daily temperature may substantially affect the occurrence probability of extreme hot/cold days and nights21. The use of different observational datasets for deriving the percentile-based extreme temperature indices and for annual mean temperature might also play a role.
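To make the notion of a time of emergence concrete, the following is a minimal sketch of a K-S-based detection of persistent change at a single grid cell. It is an illustration under stated assumptions, not the procedure given in Methods: the baseline length, the moving-window length, the asymptotic 5% critical value, and the convention of dating emergence by the last year of the window are all hypothetical choices, and the sketch ignores both the direction of change and the autocorrelation that the block-bootstrap variant addresses.

```python
import math

def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_differs(a, b, c_alpha=1.358):
    """Asymptotic two-sample test; c_alpha = 1.358 corresponds to roughly the 5% level."""
    n, m = len(a), len(b)
    return ks_statistic(a, b) > c_alpha * math.sqrt((n + m) / (n * m))

def time_of_emergence(years, values, baseline_len=30, window_len=20):
    """Return the year from which the moving window differs from the baseline
    and keeps differing through the end of the record, or None.

    baseline_len and window_len are illustrative assumptions.
    """
    baseline = values[:baseline_len]
    toe = None
    for start in range(baseline_len, len(values) - window_len + 1):
        window = values[start:start + window_len]
        if ks_differs(baseline, window):
            if toe is None:
                # Assumption: date the emergence by the last year of the window.
                toe = years[start + window_len - 1]
        else:
            toe = None  # the change did not persist, so start over
    return toe
```

Resetting the candidate year whenever a later window is indistinguishable from the baseline is what makes the detected change "persistent" rather than temporary in this sketch.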

Figure 1 Persistent changes to TN90p (hot nights) and TX90p (hot days) have already occurred over large parts of the Earth and climate models underestimate these persistent changes. Top panels show time of emergence (TOE) of persistent changes to TN90p (a) and TX90p (b) derived from HadEX2 observations. Warm (cool) color marks regions where the emergence of persistent changes occurs in the direction consistent with warming (cooling). Gray color marks regions for which there is no emergence in HadEX2 observations by the year 2000. White regions have no data. See Fig. S7 for CMIP5 results corresponding to these panels. Bottom panels show the fraction of CMIP5 ‘Historical’ simulations that exhibit a delay of emergence of persistent changes to TN90p (c) and TX90p (d) or show emergence in a direction opposite to that observed (see Methods). It is noted that simulated emergence in the opposite direction to observed is restricted mainly to the ‘warming hole’ in southeast/central USA and to a few ensemble members (see Fig. S8). See Fig. S2 for TN10p (cold nights) and TX10p (cold days). The map is produced using R version 3.0.3 software (https://www.r-project.org/).

The observed persistent changes to temperature extremes represent a shift towards more hot days and nights, and fewer cold days and nights, consistent with first-order expectations for a warming world. There are some exceptions to the direction of these shifts (grid cells shaded with cool colors in Fig. 1a,b, and stippled in Fig. 2), such as parts of the ‘warming hole’ in the southeast/central USA20, where cold days and nights have increased while hot days and nights have decreased. Several factors could potentially contribute to the ‘warming hole’, such as anthropogenic aerosol emissions, land cover change, unforced internal climate variability (e.g., North Atlantic Oscillation and Pacific Decadal Oscillation) or a combination thereof22,23,24. With the continued emissions of greenhouse gases and the reduction of anthropogenic aerosol emissions, warming in this region is expected to increase in future decades22.

Figure 2 Underestimated emergence of persistent changes to TN90p (hot nights) and TX90p (hot days) in CMIP5 ‘Historical’ simulations is linked to a combination of biases in the simulated change (‘signal’) and the simulated variability (‘noise’). Panels show the fraction of CMIP5 ‘Historical’ simulations with signal (a,b), noise (c,d) and signal-to-noise ratio (e,f) of TN90p (left panels) and TX90p (right panels) that would result in a delay of emergence or produce emergence in the opposite direction to observed (see Methods). Signal is approximated as the absolute total linear trend in temperature extremes over 1921–2005 and noise as the standard deviation of residuals after removing this linear trend. Stippling indicates where the linear trend in HadEX2 observations is consistent with cooling rather than warming (i.e., a negative trend for TN90p and TX90p). Gray color marks regions for which there is no emergence in HadEX2 observations by the year 2000. White regions have no data. See Fig. S10 for TN10p (cold nights) and TX10p (cold days). The map is produced using R version 3.0.3 software (https://www.r-project.org/).

Consistent with some earlier studies25,26, we find that the changes in the distribution of temperature extremes are primarily the result of a significant shift in the center of the distribution of the examined temperature extremes, rather than a change in the variability of temperature extremes, as reflected by the nearly identical patterns of emergence of persistent changes in the mean of the distribution of temperature extremes (Fig. S5), and the lack of emergence of persistent changes in the variance of the distribution of temperature extremes (Fig. S6). (See captions of Figs S5,S6 for the approaches used for the detection of persistent changes in the mean and variance of the distribution of temperature extremes, respectively.)

We find that climate models largely underestimate how quickly persistent changes in the distribution of temperature extremes have emerged in the direction consistent with warming during the historical period, or fail to represent emergence consistent with cooling (such as the emergence in TX90p over the U.S. ‘warming hole’) (Fig. 1c,d for TN90p and TX90p and Fig. S2c,d for TN10p and TX10p). The ensemble median TOE calculated from the CMIP5 ‘Historical’ simulations fails to show emergence by the year 2000 over almost all land area covered by the HadEX2 observations (Fig. S7). In fact, on average across the ‘Historical’ ensemble members, only 18–30% of the land with data coverage exhibits emergence of persistent changes consistent with warming (28% for TN90p, 18% for TX90p, 30% for TN10p, and 19% for TX10p). These percentages are considerably smaller than those in the observations, with the exception of TX90p (55% for TN90p, 16% for TX90p, 67% for TN10p, and 28% for TX10p). Furthermore, over 74–92% of the land where persistent changes have already occurred in HadEX2 (74% for TN90p, 92% for TX90p, 83% for TN10p, and 91% for TX10p), more than 80% of the ensemble members in the ‘Historical’ simulations either exhibit a delay of emergence or show emergence in a direction opposite to that observed (Fig. 1c,d and Fig. S2c,d). It is noted that over the regions where persistent changes have already occurred, the simulated emergence that is opposite to the observed direction occurs in a few ensemble members only, and mainly in the U.S. ‘warming hole’ (Fig. S8). Similar underestimation of persistent changes is also found in annual mean temperature (Fig. S4).

We find that the discrepancy between the observed and simulated TOE is unlikely to be caused by internal climate variability (Fig. S9; see the descriptions in the figure caption for the approaches used for statistical tests of this sort). Rather, it primarily results from the joint effects of biases in the externally forced response (signal) and internal climate variability (noise) (Fig. 2 for TN90p and TX90p and Fig. S10 for TN10p and TX10p). When comparing the observed signal to the simulated ‘Historical’ signal (each approximated by the absolute total linear trend in temperature extremes over the period 1921–2005), we find a marked underestimation over a large majority of land where persistent changes have already occurred (Fig. 2a,b and Fig. S10a,b). Comparing observed noise to simulated noise (each approximated by the standard deviation of residuals after removing a linear trend) reveals a consistent overestimation throughout the emerged domain, with more than 90% of the ensemble members exhibiting excessive noise over almost all land area covered by the HadEX2 observations (Fig. 2c,d and Fig. S10c,d), which could be due to overly strong land-atmosphere feedbacks in climate models27. Taken together, the signal-to-noise ratio is underestimated (Fig. 2e,f and Fig. S10e,f), implying that the simulated temperature extremes require a longer time to exceed the internal variability than is seen in observations. Over the U.S. ‘warming hole’, where the observed emergence has occurred toward cooling, we find that fewer than 10% of the ensemble members have a cooling signal that is as strong as or stronger than observed. As a result, models are unable to capture the observed emergence in this region either.
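The signal and noise definitions used above can be sketched for a single grid-cell time series as follows. This is an illustrative implementation only: the function name, the use of ordinary least squares for the linear trend, and the population (rather than sample) standard deviation are assumptions not specified in the text.

```python
from statistics import mean, pstdev

def signal_and_noise(years, values):
    """Return (signal, noise, signal_to_noise) for one time series.

    signal: absolute total linear trend over the period (|slope| * span in years)
    noise:  standard deviation of the residuals after removing that linear trend
    """
    y_bar, v_bar = mean(years), mean(values)
    sxx = sum((y - y_bar) ** 2 for y in years)
    sxy = sum((y - y_bar) * (v - v_bar) for y, v in zip(years, values))
    slope = sxy / sxx                      # ordinary least-squares slope (assumed)
    intercept = v_bar - slope * y_bar
    residuals = [v - (intercept + slope * y) for y, v in zip(years, values)]
    signal = abs(slope * (years[-1] - years[0]))
    noise = pstdev(residuals)              # population std; sample std is another option
    ratio = signal / noise if noise > 0 else float("inf")
    return signal, noise, ratio

# Synthetic example: a trend of 0.1 per year over 1921-2005 plus alternating +/-1 noise.
years = list(range(1921, 2006))
values = [10 + 0.1 * (y - 1921) + (1 if i % 2 == 0 else -1) for i, y in enumerate(years)]
signal, noise, ratio = signal_and_noise(years, values)
```

Under these definitions, a model that simulates a weaker trend, larger residual variability, or both yields a smaller ratio than observed, which is the combination described above.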

Although the CMIP5 ensemble exhibits biases in both the simulated signal and the simulated noise, biases in the simulated signal appear to play a larger role in delaying the TOE (Figs S11,S12). Based on the ‘Historical’ simulations, only 17–36% of the observed emergence area (36% for TN90p, 17% for TX90p, 28% for TN10p, and 18% for TX10p) is captured by the simulations (i.e., the observed TOE falls in the 16–84% range of the simulated TOE, which is equivalent to ±σ for a Gaussian distribution but is more suitable for measuring the dispersion of a non-Gaussian distribution, such as the right-truncated distribution of TOE at 2000, where σ is the standard deviation of the Gaussian distribution19). This percentage increases to 51–75% (75% for TN90p, 51% for TX90p, 63% for TN10p, and 67% for TX10p) when correcting for the bias in signal (Fig. S11a,d), but only increases to 32–67% (67% for TN90p, 44% for TX90p, 48% for TN10p, and 32% for TX10p) when correcting for the bias in noise (Fig. S12a,d; see Methods). Furthermore, we estimate that over the land where more than 84% of the ensemble members in the ‘Historical’ simulations exhibit a delay of emergence (i.e., fail to show emergence by the year 2000 or exhibit a later TOE than observed), biases in the signal have delayed the emergence by ~1–2 decades (Fig. S11e,f). In contrast, biases in the noise have delayed the emergence by <1 decade (Fig. S12e,f). These results imply that improvement in the simulation of the externally forced response is likely to yield the greatest improvement in prediction of the TOE, although the role of internal variability should not be neglected, especially for regions where the externally forced response is relatively weak. These results also suggest the potential benefits of bias correction procedures that can reduce uncertainties in the projected TOE of future persistent changes to temperature extremes.
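The 16–84% criterion can be illustrated with a short sketch for one grid cell. The nearest-rank percentile and the convention of representing ensemble members that show no emergence by 2000 as infinity are assumptions made here to handle the right-truncated distribution; the function names are hypothetical.

```python
import math

def nearest_rank_percentile(sorted_vals, pct):
    """Nearest-rank percentile; avoids interpolating between finite and infinite values."""
    rank = max(1, math.ceil(pct * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]

def observed_toe_captured(obs_toe, sim_toes, lo=16, hi=84):
    """True if the observed TOE lies within the lo-hi percentile range of the simulated TOEs."""
    vals = sorted(sim_toes)
    return nearest_rank_percentile(vals, lo) <= obs_toe <= nearest_rank_percentile(vals, hi)

# Ten hypothetical ensemble members; four show no emergence by 2000 (math.inf).
NO_EMERGENCE = math.inf
sim_toes = [1970, 1975, 1980, 1985, 1990, 1995] + [NO_EMERGENCE] * 4
captured_1980 = observed_toe_captured(1980, sim_toes)
captured_1965 = observed_toe_captured(1965, sim_toes)
```

In this example an observed TOE of 1980 falls inside the simulated range, whereas an observed TOE of 1965 is earlier than the 16th percentile of the ensemble, i.e., the simulations show delayed emergence.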

Although internal climate variability may delay or accelerate the emergence of persistent changes to a climate variable4,19, internal variability alone is unlikely to have caused the observed persistent changes in the distribution of daily-scale temperature extremes (Fig. 3 for TN90p and TX90p and Fig. S13 for TN10p and TX10p). To test the role of internal variability in creating persistent changes, we implement our analysis on an ensemble of 540 85-year time series of temperature extremes, drawn from the bias-corrected ‘piControl’ simulations using a block-bootstrap approach (see Methods). The 85-year block-bootstrap is designed to mimic the length of the 1921–2005 historical period, and to maintain the spatial-temporal correlations of temperature extreme fields. Since the overestimation of internal variability (Fig. 2c,d and Fig. S10c,d) may lead to an underestimation of the chance of temporary emergence induced by internal variability, a bias correction procedure is implemented to adjust the simulated internal variability to be consistent in magnitude with the HadEX2 observations. It is found that the likelihood of spurious emergence due to internal variability alone is less than 5% (Fig. 3 and Fig. S13), implying that the observed changes are unlikely to have arisen from internal variability alone.
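A minimal sketch of the two ingredients described above, drawing 85-year blocks from a control run and rescaling the simulated variability toward the observed magnitude, is given below. It is not the procedure from Methods: the function names, the uniform choice of block start years, and the simple standard-deviation rescaling are illustrative assumptions, and how the 540 series are distributed across models is not reproduced here.

```python
import random
from statistics import mean, pstdev

def draw_blocks(control, block_len=85, n_draws=540, seed=0):
    """Draw contiguous blocks from a control-run series.

    Each block keeps the original year-to-year ordering. Applying the same start
    index to every grid cell (not shown) would also keep the spatial correlations.
    """
    last_start = len(control) - block_len
    if last_start < 0:
        raise ValueError("control run is shorter than one block")
    rng = random.Random(seed)
    starts = [rng.randint(0, last_start) for _ in range(n_draws)]
    return [control[s:s + block_len] for s in starts]

def rescale_variability(series, target_std):
    """Scale anomalies about the mean so the standard deviation equals target_std."""
    m, s = mean(series), pstdev(series)
    if s == 0:
        return list(series)
    return [m + (x - m) * target_std / s for x in series]

# Hypothetical 500-year control series (values are just year indices here).
control = list(range(500))
blocks = draw_blocks(control)
corrected = rescale_variability([1.0, 2.0, 3.0, 4.0, 5.0], target_std=2.0)
```

The fraction of such bootstrapped series that show emergence by the end of the block then estimates the likelihood of spurious emergence from internal variability alone.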

Figure 3 Emergence of persistent changes to TN90p (hot nights) and TX90p (hot days) is unlikely to be explained by internal variability alone. Panels show the fraction of simulations exhibiting emergence consistent with warming by the year 2000 in an ensemble of 540 85-year time series of TN90p (a) and TX90p (b) drawn from the bias-corrected ‘piControl’ simulations using a block-bootstrap approach to mimic the length of the 1921–2005 historical period (see Methods). A bias correction is implemented to adjust the simulated internal variability to be consistent in magnitude with the HadEX2 observations. White regions have no data. See Fig. S13 for TN10p (cold nights) and TX10p (cold days). The map is produced using R version 3.0.3 software (https://www.r-project.org/).

Similarly, persistent emergence is not detected by the end of the historical period in the ‘HistoricalNat’ simulations, which lack anthropogenic forcings (Fig. 4a,b for TN90p and TX90p and Fig. S14a,b for TN10p and TX10p). In contrast, the ‘HistoricalGHG’ simulations, which are forced only by anthropogenic increases in greenhouse gas concentrations, result in >50% of ensemble members showing emergence consistent with warming over most of the areas of observed emergence, including >90% of such ensemble members over large areas of observed emergence in nighttime extremes (Fig. 4e,f and Fig. S14e,f). Combined with our results that internal variability alone is unlikely to have caused the observed emergence (Fig. 3 and Fig. S13), these qualitative comparisons indicate that the historical emergence of persistent changes in temperature extremes consistent with warming is more likely to be anthropogenic than natural in origin. Our results are in agreement with the optimal fingerprinting-based attribution studies of absolute changes in the examined extreme temperature indices (e.g., trend)28,29, as well as extreme event attribution studies based on a similar emergence analysis30, thus adding to an increasing body of evidence of the anthropogenic influence on temperature extremes. Based on the present analysis, however, we could not ascertain whether the detected emergence in the U.S. ‘warming hole’ is due to anthropogenic activity.

Figure 4 Emergence of persistent changes to TN90p (hot nights) and TX90p (hot days) cannot be explained by natural external forcing, but is likely due to anthropogenic influence, especially anthropogenic emissions of greenhouse gases. Panels show the fraction of simulations exhibiting emergence consistent with warming by the year 2000 in ‘HistoricalNat’ simulations (a,b), ‘Historical’ simulations (c,d) and ‘HistoricalGHG’ simulations (e,f) of TN90p (left panels) and TX90p (right panels). White regions have no data. See Fig. S14 for TN10p (cold nights) and TX10p (cold days). The map is produced using R version 3.0.3 software (https://www.r-project.org/).

Our model assessment focuses mainly on regions where persistent changes had already occurred by the year 2000. In regions where persistent changes had not emerged by the year 2000, we find that over half of the ensemble members of the ‘Historical’ simulations also do not show emergence (Fig. S7). On the other hand, we find that the ratios of the observed 1921–2005 linear trends to the noise of climate variability are generally consistent with the 16–84% range of the simulations over most parts of these regions (Fig. S15). We therefore expect that future emergence of persistent changes to temperature extremes in these regions is likely to be reasonably simulated by climate models. Moreover, due to the lack of sufficient observational data in the tropics and over large portions of the southern hemisphere, at present we cannot assess observed TOE and model performance in these regions. In this sense, our analysis is restricted by the observational data availability to the northern hemispheric land areas with relatively lower signal-to-noise ratio for temperature extremes.

We note that the reported underestimation of signal in temperature extremes over many parts of the land where persistent changes have already occurred is not inconsistent with existing studies20,28,29,31,32,33, which report that in general climate models reproduce the observed temperature extremes reasonably well at the global scale, but may underestimate changes in different extreme temperature indices in some sub-continental regions.