We also analyzed two additional well‐known atmospheric indices based on the observed and simulated sea level pressure (SLP) data using, once again, 20CR to define the observed SLP indices. These indices were the North Atlantic Oscillation index (NAO) [ Hurrell , 1995 ; Hurrell and Deser , 2009 ] and an analog of the Aleutian Low Pressure index ALPI [ Beamish et al., 1997 ]. We computed the NAO and ALPI as the leading principal components of monthly wintertime (DJFM) SLP over the Atlantic (15°N–75°N, 90°W–10°W) and Pacific (15°N–75°N, 130°E–120°W) regions, respectively (see Kravtsov et al. [ 2014 ] for the resulting loading patterns). In addition, we used the station‐based NAO index ( https://climatedataguide.ucar.edu ) as an alternative NAO estimate. We normalized the observed and simulated monthly NAO and ALPI indices to have the unit standard deviation and then formed and analyzed their DJFM mean 1880–2005 time series.

Following Steinman et al. [ 2015a ] and KC201 7 , we computed the observed SST indices by averaging over three SST products, namely the Hadley Centre Global Sea Ice and Sea Surface Temperature (HadISST) [ Rayner et al. , 2003 ], National Oceanic and Atmospheric Administration (NOAA) Extended Reconstructed Sea Surface Temperature (ERSST) [ Xue et al. , 2003 ; Smith et al. , 2008 ], and Kaplan SSTs [ Kaplan et al. , 1998 ; Parker et al. , 1994 ; Reynolds and Smith , 1994 ]. Steinman et al. [ 2015a ] and KC2017 used Goddard Institute for Space Studies (GISS) Surface Temperature (GISTEMP) [ Hansen et al. , 2010 ] Northern Hemisphere mean as their estimate for the observed NMO. This estimate may, however, be slightly different from the one computed using the original NMO definition, which only involves the data from 0°N to 60°N. To alleviate this possible discrepancy, we here used instead the gridded surface temperature product from NOAA's twentieth century reanalysis (20CR) [ Compo et al. , 2011 ] and estimated the observed NMO using the exact recipe above. We must note that our main results are largely insensitive to the version of the observed NMO index used.

Three of the indices considered were identical to the ones used in Steinman et al. [ 2015a ] and KC2017. They represent the Atlantic Multidecadal Oscillation (AMO) [ Kerr, 2000 ; Enfield et al., 2001 ], Pacific Multidecadal Oscillation (PMO) [ Steinman et al., 2015a ], and Northern Hemisphere Multidecadal Oscillation (NMO) [ Steinman et al., 2015a ]. The AMO and PMO indices were computed as the sea surface temperature (SST) area averages over the 0°N–60°N, 80°W–0° and 0°N–60°N, 120°E–100°W regions, respectively. The NMO index was formed by computing the 0°N–60°N mean surface atmospheric temperature (SAT) over ocean and over land separately, and then adding them together with the weights of 0.61 and 0.39 reflecting the ocean/land fractional composition of the entire Northern Hemisphere. Prior to further analysis, we computed the annual mean values for all indices and formed anomalies by subtracting their 1880–2005 climatology.

We analyzed observed and simulated variability in a set of climate indices (1880–2005) subjectively chosen to sample the Northern Hemisphere's large‐scale climate dynamics. We computed these indices in historical CMIP5 simulations encompassing the all‐forcing (historicalALL), greenhouse gas (historicalGHG) forcing, and natural forcing (historicalNat) runs, as well as in the preindustrial control runs (Table 1 ). To process control runs, we further created 1000 snippets—126 year long each, to match in length the duration of historical runs—randomly pulled from the multimodel ensemble of long preindustrial simulations available.

2.2 Methodology

Following KC2017, we first estimated forced signals for each of the 17 individual models considered as the 5 year low‐pass filtered ensemble mean over the set of a given model's twentieth century simulations (for each index), using the data adaptive filter of Mann [2008]. Subtracting the forced signals of a given model from this model's individual simulations provides multiple realizations of this model's simulated internal variability. The resulting estimates of internal variability are, however, necessarily characterized by smaller variance relative to that of the true internal variability, since the forced signal used to define them in fact contains some of the true internal variability due to insufficient averaging over a small number of individual model runs [Steinman et al., 2015b; Frankcombe et al., 2015]. To alleviate this problem, we derived model‐dependent and frequency‐dependent inflation factors (Figure S1 of the supporting information), which allowed us to obtain variance‐bias‐corrected estimates of the simulated internal variability for all of the 111 model simulations considered (see section S1 of the supporting information). These inflation factors were computed from synthetic stochastic versions of the individual model simulations that mimic their statistical characteristics (section S2 of the supporting information) [Kravtsov et al., 2005, 2010; Mann et al., 2016].

As a byproduct of the latter procedure, we also obtained 100 estimates of the synthetic forced signals for each of the 17 models available, totaling 1700 forced‐signal estimates. Following Steinman et al. [2015a, 2015b], Frankcombe et al. [2015], Kravtsov et al. [2015], and KC2017, we then rescaled each of these forced‐signal estimates via least squares to best match the full observed time series of the climate index considered; this rescaling is meant to correct for climate sensitivity biases of individual models. Figure S2 of the supporting information displays the multimodel ensemble mean forced signals so estimated and their uncertainty—defined as the spread among the 1700 available estimates of the forced signal—along with the original (raw) observed indices. Subtracting these forced signals from the raw indices provides 1700 estimates of the internal variability in each of the five observed indices considered. Following Frankcombe et al. [2015], we also computed three additional estimates of the forced signal and the residual internal variability in observations based on rescaled multimodel ensembles means (MMEMs) from CMIP5's historicalALL, historicalGHG, and historicalNat simulations using one‐, two‐, and three‐factor scaling methods (see section S3 of the supporting information for details).

We then analyzed the resulting 111 (variance‐bias‐corrected) time series of the simulated internal variability, as well as 1700 (KC2017 synthetic ensemble) + 3 (MMEM scaling approaches) estimates of the internal component of the observed internal variability for our network of five climate indices using the multichannel version of the Singular Spectrum Analysis (SSA) Broomhead and King, 1986; Elsner and Tsonis, 1996] called M‐SSA [Moron et al., 1998; Ghil et al., 2002]. M‐SSA is an extended variant of a widely used Empirical Orthogonal Function (EOF) analysis technique [Monahan et al., 2009], which looks for the space‐time patterns that maximize lagged covariance for a given multivariate time series within a range of M lags; M is referred to as the embedding dimension of M‐SSA. The original raw time series can be fully recovered as the sum, over all modes, of the so‐called reconstructed components (RCs) associated with each M‐SSA mode. Prior to M‐SSA analysis, we smoothed the internal components of the NAO and ALPI indices using the 7 year running mean boxcar filter to focus on multidecadal time scales. In addition, each index of the five‐index network—whether observed or simulated—was normalized to have the unit standard deviation. We performed M‐SSA analysis using embedding dimensions from M = 20 to 40; all of the results reported below are robust with respect to the choice of M.

We verified the above KC2017 methodology for isolating the forced and internal components of variability using the simulations from the Community Earth System Model (CESM) Large Ensemble Project (LENS) [Kay et al., 2015]: http://www.cesm.ucar.edu/projects/community‐projects/LENS/. In particular, we showed that our methodology provides estimates of the two components consistent with those based on averaging realizations over the entire LENS ensemble (section S5 and Figures S4 and S5 in the supporting information).