Here, using the rigorous “fingerprint” method and HadISDH observational records for surface humidity and temperature, we first examine whether anthropogenic influence is detectable in the historical summer WBGT changes by comparing the evolution of WBGT in the observations and in model simulations under different external forcings over northern hemispheric land areas with sufficient observations. We then assess the extent to which anthropogenic influence has altered the likelihood of record high summer WBGT during 1973–2012 in different land regions. Results from the “fingerprint” detection analysis are used to constrain projections of the future as observation‐constrained projections give us better confidence compared to native model projections (e.g., Allen et al., 2000 ; Stott & Kettleborough, 2002 ).

To determine whether anthropogenic influence has had a detectable impact on environmental conditions conducive to heat stress, Knutson and Ploshay ( 2016 ) compared observed trends in global and regional mean summer WBGT since 1973 with those simulated by climate models. They found that the observed trends are consistent with model simulations only if anthropogenic forcing is included. Nevertheless, a rigorous statistical comparison of models and observations (e.g., a “fingerprint” analysis) is still needed to reinforce our confidence in the complex causes of WBGT change. WBGT and its variants have also been used to project future changes in heat stress environmental conditions (e.g., Diffenabugh et al., 2007 ; Dunne et al., 2013 ; Fisher & Knutti, 2012 ; Pal & Eltahir, 2015 ; Willet & Sherwood, 2012 ; Zhao et al., 2015 ). Most of these studies rely primarily on simulations with little reference to observations, due primarily to the paucity of homogenous long‐term humidity records. As climate models tend to underestimate typical heat stress indices (e.g., WBGT) over many parts of the world such as in the humid tropics and subtropics (Zhao et al., 2015 ), projections of heat stress indices constrained by past observations would potentially be more reliable for future adaptation planning. The recently compiled HadISDH observational records for surface humidity and temperature (Willett et al., 2014 ) enable us to disentangle the complex causes of WBGT change through a statistically rigorous “fingerprint” analysis, and to explore future WBGT change constrained by observations.

Summer mean air temperature can be an important indicator of environmental conditions conducive to heat stress. Sun et al. ( 2014 ) showed that higher summer temperatures are directly connected to a greater number of heatwave days defined as daily maximum temperature greater than 35°C and an earlier start and later end of heatwave seasons in Eastern China. Air temperature alone is not necessarily the best indicator of environmental conditions conducive to heat stress on human health, however. Humans maintain a body core temperature near 37°C through convection, conduction, radiation, and evaporation. A human body core temperature above 39°C will create health risks (Parsons, 2003 ). Hot temperatures with high humidity reduce the ability of the human body to dissipate metabolic heat, leading to increased body core temperature. It is therefore the combination of heat and humidity that really matters (Sherwood & Huber, 2010 ). Wet bulb globe temperature (WBGT), an indicator that takes temperature and humidity into account, is often used in the management of physical human workloads in direct sunlight, such as endured by athletes, outside workers, and military personnel.

It is now well recognized that the likelihood of extreme hot summers has significantly increased due to anthropogenically induced warming. Stott et al. ( 2004 ) showed that human influence had more than doubled the likelihood of the extremely hot European summer of 2003 at the time, and human influence has further increased the likelihood after 10 years with additional warming (Christidis et al., 2014 ). Sun et al. ( 2014 ) estimated that anthropogenic influence has increased the likelihood of the historically hot summer of 2013 in Eastern China to more than 60 times its preindustrial level. Building on Sun et al. ( 2014 ) and Mueller et al. ( 2016 ) systematically examined the occurrence of record hot summers since 1950 in many land regions. They found that human influence has made record high summer temperature at least 10 times as likely and projected that the historically hottest summers would become the norm for more than half of the world's population within 20 years.

2 Materials and Methods

2.1 Wet Bulb Globe Temperature Among the various heat stress indictors (Buzan et al., 2015), WBGT is the ISO standard for quantifying human thermal comfort (ISO, 1989). It has well validated thresholds that relate directly to levels of industrial and military labor capacity (Willet & Sherwood, 2012). For example, the U.S. military recommends a cycle of 30 min work and 30 min rest for heat‐acclimated individuals to conduct heavy labor (350–500 kcal/h) under an environment with WBGT between 31.1 and 32.2°C (88–90°F). Different methods have been used to compute WBGT (Lemke & Kjellstrom, 2012). Two general types of WBGT have been calculated to respectively represent outdoor and indoor conditions. WBGT for indoor conditions involves the use of dry bulb air temperature as well as humidity. The calculation of outdoor WBGT additionally involves the use of wind and some measure of direct sunlight exposure. Due to the lack of reliable wind and sunlight data, we compute WBGT (°C) as a weighted sum of wet bulb temperature (WBT, °C) and air temperature (TA, °C) such that WBGT = 0.7 × WBT + 0.3 × TA (Liljegren et al., 2008). The resulting temperature represents a lower limit for WBGT in well shaded or indoor conditions, similar to that studied by Dunne et al. (2013) and Knutson and Ploshay (2016). We compute WBT (and thus WBGT) based on TA and relative humidity (RH) following the method of Stull (2011). This method has been shown to be reliable for TA between −20 and 50°C and for 5–100% RH.

2.2 Observational Data and Climate Model Output We use observed monthly mean surface TA and RH for 1973–2012 from the recently compiled HadISDH2.0.1.2015p 5° × 5° gridded dataset (Willett et al., 2014). We use simulated monthly mean surface TA and RH from climate models participating in the Coupled Model Intercomparison Project Phase 5 (CMIP5) to estimate climate responses to external forcing and natural internal variability. Simulations from a particular model are selected if there are at least three ensemble members under the same external forcing and if historical simulations are either available or can be extended to the year 2012. These include 78 simulations from 14 models with historical variations in all forcing agents (ALL), including greenhouse gases, aerosol, ozone, land cover, and natural external forcing, 30 simulations from 6 models with historical greenhouse‐gas changes (GHG), and 42 simulations from 7 models with historical natural forcing only (NAT). Many CMIP5 ALL simulations end in the year 2005. We extend them to 2012 with either extended ALL simulations from the historicalExt experiment, or if not available the corresponding projections under the RCP8.5 emissions scenario (Representative Concentration Pathway emissions scenario) with approximate total radiative forcing in 2100 relative to 1750 of 8.5 W m−2. The use of other RCPs for data extension makes little difference because radiative forcing differs negligibly between RCPs over the period 2006–2012 (Intergovernmental Panel on Climate Change [IPCC], 2013). We also use over 18,000 model‐years of preindustrial control simulations from 35 models for estimating internal variability. Projection of future heat stress is based on 49 RCP8.5 simulations from 12 climate models. See Table S1, Supporting information for details on the climate models and simulations.

2.3 Data Processing We calculate the observation‐based estimates of summer WBGT by averaging monthly values over three summer months if all monthly values are available; the estimate is marked as missing otherwise. Limited data availability restricts our analysis to the region between 10 and 60°N (Figure 1). As such, the three summer months are June, July, and August. Summer WBGT anomalies relative to a 30‐year baseline period 1973–2002 are then calculated for grid cells with 27 or more years of data during the baseline period, and are otherwise flagged as missing. Grid cells with at least 36 years of anomalies data during 1973–2012 are analyzed. Figure 1 Open in figure viewer PowerPoint Summer wet bulb globe temperature (WBGT) has increased during 1973–2012. Panels show the 1973–2012 trends (°C) in summer (Jun–Jul–Aug) WBGT for HadISDH observations (a), for CMIP5 model responses to ALL (b), GHG (c), OANT (d), and NAT (e). Regions without sufficient observations are marked as white. ALL, GHG, OANT, and NAT denote respectively to modeled responses to historical natural and anthropogenic forcings, historical greenhouse‐gas changes, historical anthropogenic forcings other than greenhouse gases, and historical natural forcings. Modeled responses to ALL, GHG, and NAT are estimated as the multi‐model ensemble means of the corresponding native model simulations. The model simulated OANT response is estimated by subtracting the sum of responses to GHG and NAT from ALL. We compute monthly values of WBGT with monthly TA and RH on the native grids of the models to preserve internal consistency between TA and RH. These WBGT data are then regridded to the 5° × 5° HadISDH grid using the four nearest grid cell values and inverse distance weighting. Model ocean grid cells are excluded from the interpolation. The regridded ALL, GHG, and NAT simulations are masked (set to missing) to mimic the availability of observations. We divide preindustrial control simulations into 474 nonoverlapping 40‐year segments to match the 1973–2012 historical period, and then similarly interpolate and mask each segment. Summer WBGT anomalies relative to 1973–2002 are then calculated for all simulations in the same way as for the observations. We consider changes of summer WBGT in 10 SREX land regions (Seneviratne et al., 2012) between 10 and 60°N with sufficient observations (Figure 2a). These regions include WNA (West North America), CNA (Central North America), ENA (East North America), NEU (North Europe), MED (Mediterranean), NAS (North Asia), CAS (Central Asia), TIB (Tibetan Plateau), EAS (East Asia), and SAS (South Asia). Figure 2 Open in figure viewer PowerPoint The observed increase of summer wet bulb globe temperature (WBGT) can be reproduced only when anthropogenic forcing is included. (a) Geographic boundaries of the 10 SREX regions between 10 and 60° N (WNA, Western North America; CAN, Central North America; ENA, Eastern North America; NEU, Northern Europe; MED, Mediterranean Basin; NAS, North Asia; CAS, Central Asia; TIB, Tibet; SAS, South Asia; EAS, East Asia). (b) Observed and simulated 5‐year mean summer WBGT. To plot this figure, we reconstituted full WBGT values from the observed and simulated anomalies. We added the observed 1973–2002 summer WBGT climatology to the observed anomalies to obtain the full observed summer WBGT values, added the observed climatology to the model‐simulated anomalies under ALL forcings to obtain climatologically‐bias‐adjusted summer WBGT values under ALL, added the observed climatology minus the modeled climatological difference between ALL and NAT to the NAT anomalies to obtain climatologically bias‐adjusted values under NAT, and added the observed climatology minus the modeled climatological difference between ALL and GHG to the GHG anomalies to obtain climatologically‐bias‐adjusted values under GHG. Solid lines show the time series for HadISDH observations (black), adjusted CMIP5 model responses to ALL (yellow), GHG (purple), and NAT (green) forcings. Shaded bands represent the 5–95% uncertainty ranges of the corresponding adjusted simulations. (c) Best estimates of scaling factors (white horizontal lines) and their 5–95% uncertainty ranges (bars) for the global spatiotemporal fingerprint analyses. (d) The observed trend in land average summer WBGT (solid black line), and the best estimates (white horizontal lines) and 5–95% ranges (bars) of the attributable trends to ANT (pink) and NAT (green) from the 2‐signal analysis, and to GHG (purple), OANT (blue) and NAT (green) from a three‐signal analysis.

2.4 The Optimal Detection Methodology 2003 2013 Y is the vector of observations whose elements are the 5‐year mean summer WBGT anomalies of different spatial regions (as will be further explained); Y* is the climate response to all external forcings; ϵ Y represents the unforced internal variability in the observations; X* is the matrix of true but unknown modeled responses to different external forcings, with each column corresponding to the response to a particular forcing; X is the matrix of estimates of modeled responses to different external forcings based on the available simulations; ϵ X represent the effects of internal variability on the estimates of modeled responses that cannot be averaged out due to limited ensemble size; β is the vector of scaling factors to be estimated. The estimates of modeled responses are multi‐model ensemble means. The optimal analysis requires two independent estimates of the covariance matrix of the internal variability, which are obtained using a regularized estimator (Ribes et al., 2013 We compare observed and modeled summer WBGT using the total least squares based optimal fingerprinting method (Allen & Stott,) as implemented by Ribes et al. (). The method represents the observations as follows:whereis the vector of observations whose elements are the 5‐year mean summer WBGT anomalies of different spatial regions (as will be further explained);is the climate response to all external forcings;represents the unforced internal variability in the observations;is the matrix of true but unknown modeled responses to different external forcings, with each column corresponding to the response to a particular forcing;is the matrix of estimates of modeled responses to different external forcings based on the available simulations;represent the effects of internal variability on the estimates of modeled responses that cannot be averaged out due to limited ensemble size;is the vector of scaling factors to be estimated. The estimates of modeled responses are multi‐model ensemble means. The optimal analysis requires two independent estimates of the covariance matrix of the internal variability, which are obtained using a regularized estimator (Ribes et al.,) from both preindustrial control simulations and interensemble differences of historical simulations (see Supporting Information). Detection of the influence of a particular external forcing agent while accounting for the influence of other external forcings enhances our confidence in detection and attribution. To this end, we conduct a two‐signal analysis that uses ALL and NAT responses as predictors in the regression to derive scaling factors for anthropogenic forcing (ANT) and NAT. We also conduct a three‐signal analysis in which ALL, GHG, and NAT responses are predictors in the regression to derive scaling factors for GHG, other anthropogenic forcing agents consisting of aerosols, ozone, and land use change (OANT), and NAT. The online supporting information provides details for the derivation of the relevant scaling factors. Our analyses are conducted on space–time series of summer WBGT consisting of four spatial dimensions and 8 – 1 = 7 temporal dimensions. That is the observations vector Y has 4 × 7 = 28 dimensions. The 8 – 1 = 7 temporal dimensions correspond to nonoverlapping 5‐year means of regional anomalies during 1973–2012 relative to the 1973–2002 baseline period (one temporal degree of freedom is lost in each region because all data involved in the regression analysis are expressed as anomalies). The four spatial dimensions are represented by area weighted averages over four continental regions, with North America consisting of WNA, CNA, and ENA regions, Europe consisting of NEU and MED regions, Northwest Asia consisting of NAS, CAS, and TIB regions, and Southeast Asia consisting of EAS and SAS regions. We also implemented the analyses on space–time series of WBGT consisting of 10 spatial dimensions corresponding to the 10 SREX regions and the same 7 temporal periods (i.e., vector Y has 10 × 7 = 70 elements). Note that all else being equal, a smaller spatiotemporal dimension is preferred because covariance matrix estimation uncertainty is reduced. We obtained qualitatively similar results for the 4 and 10 spatial dimension cases (Figure S1). The scaling factors and their confidence intervals are used to infer whether the modeled responses to external forcings are detectable in the observations and whether the modeled responses are consistent with the observations. A scaling factor with a 90% uncertainty range above zero indicates the detection of the corresponding response in the observations at the 5% significance level. If the confidence interval also includes one, the modeled response is considered to be consistent with the observations. Reliable attribution of observed changes to particular forcings also requires that other competing explanations have been eliminated, that model simulated variability is comparable to those in the observations, and that the mechanisms leading to the changes are understood. As our analyses are based on modeled responses to the most important known external forcings, we use the regression analyses to estimate attributable warming.

2.5 Long‐Term Trend Attribution Trends along with their 90% confidence intervals are estimated using the nonparametric Sen's slope estimator (Sen, 1968). Warming attributable to external forcings is estimated based on the detection results. Choices have to be made when estimating attributable warming at regional scales. These include a choice between the use of results from regional detection analysis for individual regions (e.g., Christidis et al., 2014; Mueller et al., 2016; Stott et al., 2004; Sun et al., 2014), or from a global detection analysis that includes individual regions (e.g. Christidis et al., 2010, 2012) as well as choice about the way in which the detection results are used when computing attributable warming and the associated confidence interval. Estimation based on regional analysis has the advantage of fitting the modeled responses to observations for individual regions, but it also has the disadvantage of having larger uncertainty in regional detection due to the reduced signal‐to‐noise ratio at regional scales. Estimation based on a spatiotemporal global analysis may be a better compromise between the reduction in regional details and the increase of internal variability at the regional scale. In this study, we estimate attributable warming based on detection results of such a global analysis. While the widely used method of estimating attributable warming is to first compute the linear trends in the modeled responses and then multiply the trends by the corresponding scaling factors and their confidence intervals, here we use observation‐constrained reconstructions to estimate attributable warming, based on the method of Christidis et al. (2010). Noise‐contaminated responses. To consider internal variability in the modeled responses, we produce multiple realizations of noise‐contaminated responses by adding a sample of noise to the modeled responses to ANT or NAT forcing. Noise is obtained by scaling the 40‐year segments of preindustrial control simulations by , where ne is the effective ensemble size used in estimating the modeled responses. For the NAT response, the effective ensemble size ne NAT is , where N is the number of climate models and N i is the number of ensemble members for model i. The ANT response is estimated as the difference between ALL and NAT responses, and thus, the effective ensemble size for ANT ne ANT is ne ALL ne NAT /(ne ALL + ne NAT ) , where ne ALL and ne NAT are respectively the effective ensemble sizes in estimating the ALL and NAT responses. As there are 474 segments of preindustrial control simulations available, we can produce 474 samples of the responses to either ANT or NAT forcing with an internal variability component that reflects the variation in multi‐model responses that would be expected in independent replications. Observation‐constrained reconstructions. We generate a sample of 500 plausible estimates of each scaling factor for, for example, the two‐signal analysis (see the Supporting information for the generation procedure). We then apply each estimate of the scaling factors to the 474 samples of noise‐contaminated responses, producing 474 realizations of observation‐constrained responses to either ANT or NAT forcing. Observation‐constrained responses to ALL are obtained by adding the observation‐constrained ANT and NAT responses. These 474 observation‐constrained responses to ANT, NAT, or ALL are then added respectively to the 474 40‐year segments of preindustrial control simulations to produce 474 samples of observation‐constrained reconstructions of 5‐year mean anomalies of summer WBGT during 1973–2012 under the corresponding forcing. Repeating the process across the 500 estimates of scaling factors, we obtain 500 × 474 samples of observation‐constrained reconstructions under ANT, NAT, and ALL forcing. Observation‐constrained reconstructions under GHG, and OANT are obtained similarly based on a three‐signal analysis. Attributable warming and its confidence interval. We use the Sen's slope estimator (Sen, 1968 Our method for calculating attributable trends and the uncertainties involves the following steps:

2.6 Event Attribution for Record WBGT P (1975, NAT) as the probability for WBGT to be at least as large as the record WBGT in the climate around the year 1975 (represented by 1973–1977) in the naturally forced world, and P (1975, ALL) and P (2010, ALL) as the corresponding probabilities in the climates around 1975 (P (1975, ALL) ) and around 2010 (represented by 2008–2012; P (2010, ALL) ) in the world that is influenced by both natural and anthropogenic forcings. We define three risk ratios to assess anthropogenic influence on the occurrence of the record WBGT: The summer with the highest mean WBGT during 1973–2012 is defined as the record hot summer and the corresponding summer mean WBGT as the record WBGT. Table S2 lists the year of occurrence of the record hot summer and the associated record WBGT value for the entire land area and for individual SREX regions. We defineas the probability for WBGT to be at least as large as the record WBGT in the climate around the year 1975 (represented by 1973–1977) in the naturally forced world, andandas the corresponding probabilities in the climates around 1975 () and around 2010 (represented by 2008–2012;) in the world that is influenced by both natural and anthropogenic forcings. We define three risk ratios to assess anthropogenic influence on the occurrence of the record WBGT: These risk ratios quantify how many times as likely the occurrence of the record WBGT is under one climate scenario compared to under another. In particular, RR 1 reflects the impact of anthropogenic influence as of the year around 1975; RR 2 reflects the further impact of changes in external forcing up to the year around 2010, which is expected to be mainly anthropogenic in origin; RR 3 measures the total impact of anthropogenic influence relative to the naturally forced world, assuming only minor differences due to the impact of natural forcing between the year around 1975 (represented by 1973–1977) and the year around 2010 (represented by 2008–2012). We estimate the probabilities and thus the risk ratios as follows. From step (ii) in Section ‘Long‐Term Trend Attribution’, we have obtained 500 × 474 samples of observation‐constrained reconstructions of 5‐year mean anomalies of summer WBGT under NAT or ALL forcing, each consisting of eight values corresponding to the eight nonoverlapping 5‐year mean anomalies during 1973–2012. We use the first and last data values to represent respectively the expected summer WBGT anomalies around the years 1975 and 2010 under the relevant forcing. To represent the year‐to‐year variability in the three epochs (1973–1977 under NAT, 1973–1977 under ALL, and 2008–2012 under ALL), the first data value in each of the reconstructions under NAT, the first under ALL and the last under ALL are then added respectively to the 474 40‐year segments of control simulations of annual summer WBGT anomalies, resulting in samples of 474 × 40 plausible annual anomalies in these three epochs. We transform the resulted anomalies to absolute summer WBGT values by adding the observed 1973–2002 WBGT climatology to the anomalies under ALL, and adding the observed climatology minus the modeled climatological difference between ALL and NAT to the anomalies under NAT. Empirical probabilities P (1975, NAT) , P (1975, ALL) , and P (2010, ALL) are subsequently computed and are then used to compute the defined risk ratios. This process is repeated across the 500 × 474 reconstructions, leading to 500 × 474 estimates of each risk ratio. All estimates are used to derive the median and the 90% confidence intervals of the three defined risk ratios.