An “offline” approach to data assimilation (DA) is used, where static ensemble samples are drawn from existing CMIP climate‐model simulations to serve as the prior estimate of climate variables. We use linear, univariate forward models (“proxy system models,” PSMs) that map climate variables to proxy measurements by fitting proxy data to 2 m air temperature from gridded instrumental temperature data; the linear PSMs are then used to predict proxy values from the prior estimate. Results for the Last Millennium Reanalysis (LMR) are compared against six gridded instrumental temperature data sets, and 25% of the proxy records are withheld from assimilation for independent verification. Results show broad agreement with previous reconstructions of Northern Hemisphere mean 2 m air temperature, with millennial‐scale cooling, a multicentennial warm period around 1000 C.E., and a cold period coincident with the Little Ice Age (circa 1450–1800 C.E.). Verification against gridded instrumental data sets during 1880–2000 C.E. reveals greatest skill in the tropics and lowest skill over Northern Hemisphere land areas. Verification against independent proxy records indicates substantial improvement relative to the model (prior) data without proxy assimilation. As an illustrative example, we present multivariate reconstructed fields for a single event, the 1808/1809 “mystery” volcanic eruption, which reveal global cooling that is strongly enhanced locally due to the presence of the Pacific‐North America wave pattern in the 500 hPa geopotential height field.

1 Introduction The paleoclimate record provides abundant evidence for significant decadal to centennial variability in a host of climate variables [e.g., Mann et al., 1998; Mayewski et al., 2004; Jones et al., 2009; Masson‐Delmotte et al., 2013], but the instrumental record is too short to adequately characterize this variability. Yet most applications—notably, the evaluation of decadal‐to‐century scale variability simulated by general circulation models (GCMs)—are facilitated by spatially gridded estimates of climate fields such as surface air temperature, sea level pressure, geopotential height, and precipitation. Consequently, there is a pressing need for methods that can accurately estimate such fields from the sparse, noisy, and indirect observations of past climates obtained from climate proxy records. Since the high‐profile work of Mann et al. [1998, 1999], many investigators have applied a range of methods to infer climate fields from proxy observations, a process known as climate field reconstruction (CFR) [Tingley et al., 2012, and references therein]. These methods fall into three main categories: Regression Models. The most traditional approach employs multivariate regression of climate variables onto the proxy records, in which the relationship between proxies and the fields of interest is obtained from a well‐observed “training” period and is assumed to be stationary [e.g., Mann et al., 1998, 1999; Evans et al., 2002; Luterbacher et al., 2004; Cook et al., 2004; Rutherford et al., 2005; Mann et al., 2009; Cook et al., 2010; Barriopedro et al., 2011]. A principal challenge is that the grid size often greatly exceeds the number of samples available for calibration. In this underdetermined setting, the sample covariance matrix used to quantify the relationship between proxies and climate fields is poorly constrained and must be regularized to yield meaningful estimates of past climates.
A popular regularization approach consists of truncating the canonical expansion of the covariance matrix using singular value decomposition. Other approaches include the expectation‐maximization algorithm [Dempster et al., 1977; Schneider, 2001; Little and Rubin, 2002] regularized by ridge regression [Hoerl and Kennard, 1970a, 1970b; Tikhonov and Arsenin, 1977], and truncated total least squares (TTLS) [Van Huffel and Vandewalle, 1991; Fierro et al., 1997]. Other approaches employ canonical correlation analysis [Smerdon et al., 2010] and Markov random fields [Guillot et al., 2015]. Since regularization is not unique, these various approaches yield significantly different estimates of past climate fields [Smerdon et al., 2011; Wang et al., 2014, 2015; Smerdon et al., 2015]. Another difficulty with these regression models derives from the fact that they express climate fields as a function of proxy values, whereas reality works in the opposite direction (e.g., tree growth responds to climate variability; climate does not respond to tree growth). von Storch et al. [2004] point out that this functional assignment leads to reconstructions that overly damp the amplitude of past variability. Inverse regression [e.g., Christiansen, 2010] proposes to alleviate this issue but suffers from the opposite problem: potentially unbounded estimates of past variability [Tingley and Li, 2012]. Bayesian Hierarchical Models. Another major challenge to climate field reconstruction concerns the relationship between proxies and climate variables, which can be nonlinear, thresholded, and multivariate [Evans et al., 2013, 2014; Tingley and Huybers, 2010a, 2010b]. As a consequence, it can be useful to include knowledge of these aspects in the inference process at the data level of a Bayesian hierarchical model. Bayes's theorem offers a natural way to invert the proxy‐climate relation, yielding a fully probabilistic analysis of the predicted climate field that incorporates all known sources of error. The resulting algorithm, the Bayesian Algorithm for Reconstructing Climate Anomalies in Space and Time (BARCAST) [Tingley and Huybers, 2010a, 2010b], has been used to reconstruct hemispheric‐scale surface temperature over the past 600 years [Tingley and Huybers, 2013]. Regularization of the covariance estimation problem comes from imposing analytical covariance functions as priors, whose parameters are estimated from the data. There are, however, limitations to this approach: inference is made about only one climate field (temperature, in this case), and generalizations to multiple dependent fields are difficult; moreover, the inference is computationally demanding, even for the simplest hierarchical models.
Furthermore, while the BARCAST framework of Tingley and Huybers [2010a, 2010b] allows many possible extensions, it has so far been applied only to covariance functions that vary with distance, which cannot easily represent the highly anisotropic character of many atmospheric and oceanic features (e.g., mountain ranges, coastlines, jets, and currents). Data Assimilation. Paleoclimate data assimilation (PDA) [see, e.g., Dirren and Hakim, 2005; Annan et al., 2005; Goosse et al., 2006; Ridgwell et al., 2007; Widmann et al., 2010; Bhend et al., 2012; Steiger et al., 2014] provides a dynamically motivated alternative to the CFR problem. PDA can infer multiple climate fields simultaneously: while specifying a statistical model that links several fields together is possible in principle, the physical basis of GCMs provides a more natural framework to represent such relationships. In general, PDA does not assume stationary teleconnections, unlike most purely statistical approaches; PDA uses dynamical models to infer spatial relationships within and between climate fields, and the fields so reconstructed are, ipso facto, dynamically consistent.
PDA also allows flexibility in accounting for the dependence of proxies on multiple state parameters (e.g., temperature and moisture controls on tree‐ring width, a common climate proxy) and therefore can better constrain multivariate states, and it admits proxies having different timescales (e.g., annual and decadal resolution) in a single reconstruction, without interpolation or smoothing [Steiger and Hakim, 2015]. PDA uses model‐simulated climate states to measure the novel information in proxy data and to spread that information spatially through all climate variables, subject to the dynamical constraints of the climate model. PDA is conveniently posed within a Bayesian framework [Wikle and Berliner, 2007] and offers the opportunity to inject more information at the process level (via GCMs) and at the data level (via proxy system models (PSMs) [Evans et al., 2013; Dee et al., 2015]) than other approaches. PDA may be understood as a limiting case of a Bayesian hierarchical model like BARCAST, using climate models as a covariance estimation device; a caveat is that the resulting prior inherits some potentially large biases from climate models [Flato et al., 2013]. This paper describes the PDA‐based framework for the Last Millennium climate Reanalysis (LMR) project. In the lineage of DA‐based reconstructions of meteorological and oceanographic fields [see, e.g., Kalnay et al., 1996; Compo et al., 2011; Dee et al., 2014; Carton and Giese, 2008; Köhl, 2015], the LMR aims to extend the realm of regularly gridded climate fields beyond the instrumental record by making use of the information contained in paleoclimate proxies of the past 1000 years or more.
The two main objectives of this paper are the following: first, to document the essential elements of the PDA methodology for paleoclimate reconstruction, which will be used (and improved) in future applications to the archives of existing paleoclimate data, and second, to present results that illustrate and validate the method. With regard to the second objective, the paper focuses on a detailed comparison to existing reconstructions, reanalyses, and instrumental temperature data sets. In this initial effort, we consider a relatively limited subset of proxies and climate models and a simplified representation of proxy–climate relationships. Despite these simplifications, we show that this initial PDA CFR performs on par with reanalysis products based on instrumental observations. Furthermore, the results identify a climate anomaly associated with a previously unconfirmed volcanic event in 1808/1809, which is examined to succinctly illustrate the advantages of a multivariate, dynamically constrained reconstruction. The remainder of the paper is organized as follows. In section 2 we present the PDA methodology as implemented by the LMR. Section 3 presents preliminary results on two selected climate fields: 2 m air temperature and 500 hPa geopotential height. The sensitivity of the results to various aspects of the method is investigated in section 4, followed by a case study documenting the dynamical context of the “mystery” volcanic eruption of 1808/1809 [Guevara‐Murua et al., 2014], which illustrates the advantages of the multivariate aspect of the PDA approach. We conclude with a discussion of the merits and limitations of the assimilation system and provide perspective on opportunities to improve estimates of historical climate using a synthesis of advanced models and comprehensive collections of high‐quality proxy data.

2 Method Our experimental design consists of paleoclimate data assimilation applied to the PAGES2K data set described in PAGES2K Consortium [2013]. There are five aspects discussed subsequently: the data assimilation method, the proxy data, the prior data, modeling the proxy data from the prior data, and error analysis. The PDA framework is graphically summarized in Figure 1.

Figure 1. Conceptual framework for the Last Millennium Reanalysis, outlining our paleoassimilation approach. Starting from the prior (a collection of simulated climate states) from which random draws are pulled, the states are mapped to proxy space via a proxy system model (PSM). These predictions, ŷ, are compared to the actual proxy measurements, y, to compute the innovation, y − ŷ. These innovations are then used to update the prior via the Kalman filter equations, which also update the error covariance. The cycle is repeated many (10^4) times to sample the distribution of the prior ensemble.

2.1 Paleoclimate Data Assimilation In general, data assimilation starts with some prior estimate of the field(s) to be analyzed. This may be an “uninformed” prior, such as a random sample from a model climatology, or a highly informed prior, such as a short‐term forecast from a very accurate weather analysis. In either case, the prior provides an independent estimate of the true value, and the difference between these two estimates of the true value (from the proxy and the prior) is used to update the prior.
In order to estimate the observation, a forward model is used to map from the climate variables to the measurement (e.g., linearly interpolating a regularly gridded temperature field to a measurement taken at a specific location). The method we employ here is similar to that used in pseudoproxy experiments by Steiger et al. [2014] but for real proxies and a number of different calibration and prior data sets. Specifically, we use an “offline” (no cycling) data assimilation approach [see, e.g., Oke et al., 2002; Evensen, 2003; Bhend et al., 2012; Steiger et al., 2014; Matsikaris et al., 2015] and a fixed prior to solve the update equation of the Kalman filter [see, e.g., Kalnay, 2003]:

x_a = x_p + K(y − ŷ).   (1)

Here x_p and x_a are the prior and analysis state vectors, respectively, which contain the climate variables of interest averaged to a particular time period [Huntley and Hakim, 2010], including scalars and grid point data for spatial fields. Vector y contains the proxy data, and ŷ = H(x_p) is a vector estimate of the proxies based on the proxy system model H. The innovation, y − ŷ, represents the new information from the proxies not already contained in the prior. This new information is weighted against the prior by the Kalman gain matrix

K = BHᵀ (HBHᵀ + R)⁻¹,   (2)

where B is the covariance matrix for the prior data, R is the error covariance matrix for the proxy data, and H is the linearization of H about the mean value of the prior. We employ an ensemble square root approach [Whitaker and Hamill, 2002], which uses a sample estimate for the prior covariance and solves for the ensemble mean and perturbations about the ensemble mean separately. This approach treats R as a diagonal matrix, and the diagonal values represent the error variance for each proxy (defined in section 2.4). A 100‐member ensemble is used, and results are insensitive to this choice provided that the ensemble has at least 50 members.
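A minimal numerical sketch of this update for the case of a single proxy may help fix ideas. The values, variable names, and PSM coefficients below are synthetic illustrations of ours, not LMR data or code; only the ensemble mean is updated here, with the square root filter's separate perturbation update omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                                      # ensemble size

# Synthetic prior ensemble: 2 m temperature at the proxy site and
# 500 hPa height at one grid point, correlated by construction.
t2m = rng.normal(0.0, 1.0, n)                # temperature anomalies (K)
z500 = 8.0 * t2m + rng.normal(0.0, 5.0, n)   # height anomalies (m)

# A linear PSM maps temperature to an ensemble of proxy estimates.
beta0, beta1 = 0.0, 1.5                      # assumed PSM coefficients
ye = beta0 + beta1 * t2m
r = 0.5                                      # proxy error variance (R, a scalar here)

y = 2.0                                      # the observed proxy value
innovation = y - ye.mean()

# Scalar Kalman gain for the height point: cov(x, ye) / (var(ye) + r).
gain = np.cov(z500, ye)[0, 1] / (ye.var(ddof=1) + r)

# Update of the ensemble mean: the analysis height is the prior height
# plus the gain (a regression coefficient) times the innovation.
z500_mean_analysis = z500.mean() + gain * innovation
```

The gain is the regression coefficient of the height ensemble on the ensemble proxy estimate, which is how information in proxy units is transferred to a remote dynamical field.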
As an explicit illustration of how multivariate fields are recovered, consider a single tree ring width proxy measurement for 1 year and assume a PSM that depends only on 2 m air temperature. We take the prior to consist of 2 m air temperature at the location of the proxy and a globally gridded 500 hPa geopotential height field. The prior fields consist of an ensemble of annual‐mean values, randomly drawn from a long climate simulation. From this, we should expect that the prior estimate of the proxy, ŷ = H(x_p), will differ considerably from the proxy value; i.e., the innovation will be large. How much weight the innovation gets in equation (1) depends on K (see equation (2)), where R in this example is simply a scalar variance, r. Given the innovation, which in this case is a scalar value, K determines the weight and transfers information from the innovation (in units of tree ring width) to the 500 hPa geopotential height field. Consider the 500 hPa geopotential height at a single point in the global grid and call it x. From (2),

BHᵀ = x̃ ỹ_eᵀ / (n − 1),   (3)

where n is the ensemble size, x̃ is a row vector containing the ensemble of n values of 500 hPa geopotential heights (x) at the point, with the ensemble mean removed, and ỹ_e is the analogous row vector for the ensemble estimate of the proxy. The right side of equation (3) represents the covariance between the 500 hPa height at the point and the prior estimate of the proxy, cov(x, y_e). In the denominator of K, HBHᵀ is a scalar, representing the variance of the ensemble estimate of the proxy, var(y_e) (directly comparable to r). Therefore, we update the ensemble mean 500 hPa geopotential height at the point by

x̄_a = x̄_p + [cov(x, y_e) / (var(y_e) + r)] (y − ȳ_e),   (4)

where y_e = H(x_p) is the prior estimated proxy; i.e., this equation says that the analysis 500 hPa geopotential height at the point is determined by linearly regressing the prior estimate against the innovation.

2.2 Paleoclimate Proxy Data For proxy data we use the multiproxy database developed by the PAGES2K Consortium [2013]. Only time series labeled annually resolved have been considered, representing a maximum of 465 time series out of the original 508 in the database.
Advantages of this network over previous compilations are its emphasis on global coverage, long time series, and multiple proxies (in the subset used here: tree ring width, maximum latewood density, ice core oxygen isotope ratio, ice core hydrogen isotope ratio, coral oxygen isotopes, coral luminescence, lake sediment varve thickness, marine sediment magnesium to calcium ratio, and speleothem laminae thickness). The PAGES network selection benefits from expert knowledge, where regional groups “identified the proxy climate records that they found were best suited for reconstructing annual or warm‐season temperature variability within their region, using a priori established criteria” [PAGES2K Consortium, 2013]. Potential shortcomings of the database include the still‐incomplete geographic coverage (Africa is not represented, and the oceans are poorly sampled). Moreover, some of the proxies have multiple influences in addition to temperature, and the proxies differ in their ability to capture low‐frequency variability. The PAGES database consists of time series in their native units (e.g., ring width and isotopic ratio), and to provide the most direct comparison to the PAGES results, we use exactly the same time series to develop the univariate linear proxy system models (see section 2.4). A revised PAGES2k database, with many improvements including expanded ocean coverage, is under development (http://www.pages-igbp.org/ini/wg/2k-network/data) and will be used in future iterations of the LMR project. 2.3 Prior Data The data forming the prior state vector are taken from various sources, which serve as a basis for sensitivity experiments described in section 4. These sources, summarized in Table 1, range from existing long preindustrial simulations from the Coupled Model Intercomparison Project Phase 5 (CMIP5) [Taylor et al., 2012] to Twentieth Century reanalysis data sets [Compo et al., 2011; Dee et al., 2014].
The 100‐member ensemble used in the data assimilation solver is drawn randomly from a chosen prior source, and the same sample is used for every year. Recall that this sampling strategy reflects an ensemble forecast having no skill over the model climatology and thus represents a random sample from the model climate. For our control reconstruction, we use the CCSM4 last millennium (CCSM4‐LM) simulation, which is a coupled atmosphere‐ocean‐sea ice simulation covering the 850–1850 C.E. time period. It includes variable long‐lived greenhouse gases and stratospheric aerosols reflecting known volcanic eruptions. Incoming solar variability is also included, as well as prescribed anthropogenic changes to global land cover. For more details, see Landrum et al. [2013]. Monthly data are averaged to a calendar year, and the temporal mean over the entire data set is removed. The resulting annual‐mean anomalies are then spatially truncated in spherical harmonic space to T42. For the last millennium simulations, this leaves 1000 truncated spatial fields from which we draw the 100‐member ensemble random sample.

Table 1. Gridded Data Sets Used to Form a Prior in the Data Assimilation
Data Set | Acronym | References
Community Climate System Model version 4, CMIP5 last millennium simulation | CCSM4 | Taylor et al. [2012]; Landrum et al. [2013]
Max‐Planck‐Institute Earth System Model paleomode (MPI‐ESM‐P), CMIP5 last millennium simulation | MPI | Taylor et al. [2012]
NOAA‐CIRES twentieth century reanalysis, version 2c | 20CR‐V2 | Compo et al. [2011]
ECMWF reanalysis of the twentieth century | ERA‐20C | Dee et al. [2014]

Using the offline PDA approach provides several advantages over an online approach. First, running a 100‐member ensemble of millennial‐scale simulations is barely computationally feasible at the current time, which means that, at best, only a single reconstruction could be performed.
Since current models have little predictive skill on the annual timescale of the proxies considered here, there is little benefit to making such forecasts; the prior ensembles in this case are well approximated by the “climate” prior we use by randomly sampling a long simulation of the same model. An additional computational advantage derives from the fact that since each year is reconstructed independently, the solver can be efficiently parallelized. A second advantage is that ensembles from different models can be used as a way to assess the sensitivity of solutions to the source of the prior. Multimodel ensembles, which consist of samples drawn from multiple models, are not considered here. Since we use the same prior ensemble for every year, a third advantage is that the prior has no temporal variability; all trends and temporal structure in the reconstructions derive from the proxies. Finally, we note that since a forecast model does not need to be initialized, we may be selective about the climate variables we reconstruct (e.g., 500 hPa geopotential height), which form a subspace of that required for model initialization.

2.4 Proxy System Models An essential feature of assimilation is the direct comparison between the measurement (proxy value) and an estimate of the measurement based on the prior. For PDA, this comparison requires a mapping from the climate state variables (e.g., temperature and precipitation) to the quantity that forms the proxy record (e.g., tree ring width).
This mapping is achieved by forward modeling the proxy, which is the reverse of the commonly used inverse approach where climate variables are estimated from the proxies. This proxy system model (PSM) can range in sophistication from a simple statistical relationship to more complex mechanistically based models. Indeed, PSMs [Evans et al., 2013; Dee et al., 2015] are analogous to the “forward models” used in the data assimilation and estimation literature. In the interest of establishing a baseline measure of performance, in this paper we use a simple linear approach, where each proxy is linearly regressed against a temperature calibration data set during the instrumental period. Such linear, univariate PSMs take the form

y = β_0 + β_1 T + ε,   (5)

where T is the annual‐mean 2 m air temperature anomaly from a calibration data set, β_0 and β_1 are the intercept and slope, and ε is a Gaussian random variable with zero mean and variance σ². Calibration temperatures concurrent with available proxy data are taken at the grid point nearest the proxy location, and the ordinary linear least squares solution determines parameters β_0, β_1, and σ. Table 2 summarizes the various gridded temperature data sets used for calibration in this study.

Table 2. Gridded Temperature Anomaly Data Sets Used for the Calibration of Proxy System Models and for the Verification of Reconstructions
Data Set | Acronym | Reference
NASA Goddard Institute for Space Studies Surface Temperature Analysis | GISTEMP | Hansen et al. [2010]
Met Office Hadley Centre Climatic Research Unit Temperature, version 4.4.0.0 | HadCRUT4 | Morice et al. [2012]
Berkeley Earth Surface Temperatures | BE | Rohde et al. [2013]
NOAA Merged Land‐Ocean Surface Temperature Analysis, version 3.5.4 | MLOST | Smith et al. [2008]

A critically important aspect of this approach is that the variance of the regression residuals, σ², is used to define the diagonal elements of matrix R in equation (2). This ensures that the proper weight is applied to the innovation through the Kalman gain.
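As a concrete sketch of this calibration step for a hypothetical single proxy (synthetic data and our own variable names, not LMR code; the LMR performs the equivalent fit for each proxy record against the nearest calibration grid point):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic calibration data: annual-mean 2 m temperature anomalies at the
# grid point nearest the proxy, with concurrent proxy measurements.
temp = rng.normal(0.0, 0.8, 120)                       # e.g., 120 calibration years
proxy = 0.3 + 2.0 * temp + rng.normal(0.0, 0.5, 120)   # synthetic proxy series

# Ordinary least squares fit of the linear PSM: proxy = b0 + b1 * T + eps.
A = np.column_stack([np.ones_like(temp), temp])
(b0, b1), *_ = np.linalg.lstsq(A, proxy, rcond=None)

# The residual variance defines this proxy's diagonal entry of R, so
# proxies that fit poorly are automatically down-weighted by the Kalman gain.
resid = proxy - (b0 + b1 * temp)
sigma2 = resid.var(ddof=2)    # ddof=2 accounts for the two fitted parameters
```

In the assimilation, b0 and b1 predict the proxy from the prior temperature, and sigma2 enters the gain's denominator, which is the mechanism by which noisy proxies receive relatively less weight.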
Proxies for which the fit to equation (5) is poor have large residual variance and get relatively less weight. As a result, this approach allows for the use of all available proxy data, without having to prefilter the proxies to select only those having a significant correlation with 2 m air temperature. In fact, we find similar results in both cases (not shown), but in the interest of simplicity and transparency, we supply all proxies to the PDA solver. It would, of course, be better to fit the proxy data to the climate fields to which they are sensitive (e.g., soil moisture for many tree ring width chronologies), but calibration data for these other fields are not as readily available as for temperature. We leave this refinement for future work, along with tests involving more sophisticated PSMs. We take the opportunity here to emphasize that the role of the PSM in the PDA algorithm is to predict the proxy value from prior data (i.e., the randomly drawn states from the climate simulation). It is essential to appreciate that the data on which the PSM is calibrated are independent of the prior data. Specifically, the PSM is fit on data from the instrumental period, for which there is a trend, but the resulting linear model is used to forecast proxy data with a predictor that comes from millennial‐scale preindustrial simulations that have little or no trend. We will return to this point in section 4. 2.5 Error Analysis 2.5.1 Proxy Sampling Ensemble PDA provides a natural source for uncertainty quantification in terms of the perturbations about the ensemble mean. Nevertheless, we take additional steps in both the construction of the experiments and the verification of the results to account for uncertainty. In the solution method, we randomly sample from the proxies 100 times to yield a sample of 100 reconstructions; we shall refer to these as Monte Carlo realizations.
Specifically, we randomly sample 75% of the proxies to assimilate for the entire 2000 years for a given prior. This has two advantages. First, it provides a measure of uncertainty due to the proxies that participate in the reconstruction. This is rarely, if ever, done in reanalysis projects and is made possible here by the offline PDA approach. Second, the remaining 25% of the proxies provide a source for independent verification, which is particularly useful during the preinstrumental time period. We note that because we have 100 different Monte Carlo realizations, each proxy chronology may participate as both an assimilated and an independent proxy. Details of this error analysis are provided in section 2.5.3.

2.5.2 Measures of Skill We verify results for 2 m air temperature and 500 hPa geopotential height against both the calibration and reanalysis data sets summarized in sections 2.3 and 2.4 using two primary verification metrics: correlation and coefficient of efficiency. Given a time series of n values of a climate variable, with reconstruction values x and verification values v, the correlation is defined by

r = [ Σ_{i=1}^{n} (x_i − x̄)(v_i − v̄) ] / [ (n − 1) σ_x σ_v ],   (6)

and the coefficient of efficiency (CE) by [Nash and Sutcliffe, 1970]

CE = 1 − [ Σ_{i=1}^{n} (v_i − x_i)² ] / [ Σ_{i=1}^{n} (v_i − v̄)² ].   (7)

Here an overbar represents a mean value and σ the standard deviation. We note that the correlation measures signal timing and is not affected by errors in signal amplitude or bias. CE is affected by these factors, and as such, it is a useful measure for these aspects of the reconstruction. CE sensitivity to bias depends on the definition of a reference time period to define anomalies, which may be nonoverlapping between proxies, calibration data, and prior data.
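These two metrics are straightforward to compute; the following is a minimal sketch using our own helper functions (not LMR code), illustrating how a constant bias affects CE but not the correlation:

```python
import numpy as np

def correlation(x, v):
    """Pearson correlation between reconstruction x and verification v."""
    return np.corrcoef(np.asarray(x, float), np.asarray(v, float))[0, 1]

def coefficient_of_efficiency(x, v):
    """Nash-Sutcliffe coefficient of efficiency of x relative to v."""
    x = np.asarray(x, float)
    v = np.asarray(v, float)
    return 1.0 - np.sum((v - x) ** 2) / np.sum((v - v.mean()) ** 2)

# Illustration: a constant bias leaves the correlation at 1 but drives CE
# strongly negative, since CE penalizes amplitude and bias errors.
v = np.array([0.1, -0.2, 0.3, 0.0, -0.1])
assert coefficient_of_efficiency(v, v) == 1.0   # perfect reconstruction
biased = v + 1.0
print(correlation(biased, v))                   # ~1.0: timing unaffected
print(coefficient_of_efficiency(biased, v))     # large negative value
```

A perfect reconstruction scores CE = 1, and a constant reconstruction equal to the verification mean scores CE = 0, which is why CE is reported relative to the prior's own reference value in what follows.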
A known issue with CE is that if the means over the verification period for the proxies and the verification data are different, then CE can be negative even when the reconstruction is skillful [e.g., Cook et al., 1999]. For example, returning to equation (7) in the case x_i = a, a constant, we find

CE = − n (v̄ − a)² / Σ_{i=1}^{n} (v_i − v̄)²,   (8)

which is negative semidefinite. For a = 0, as in the case for our millennial‐scale simulation priors with respect to their time mean, CE < 0 when verified against proxies, which in general have a nonzero time mean relative to the same baseline. As a result, for the reconstructed fields, the basis for evaluation should be the reference CE values for the prior; negative values may still reflect improvement upon the prior. When discussing the results, we therefore include both the CE measure and a ΔCE, which is the difference between CE and its value for the prior. We note that having ensembles admits a wide range of probabilistic verification measures, but with a large range of topics to consider, we opt to leave this to a future report, with one exception. Ensemble calibration is an important consideration, as it provides a measure of the performance of the PDA technique. Ensemble calibration is defined in observation space by the ratio of the mean square error of the ensemble mean to the average ensemble variance [e.g., Houtekamer et al., 2005].
In the limit of a large ensemble, this ratio should be unity, which indicates that the ensemble is drawn from the true distribution. Typically, this ratio is larger than one because the ensemble has too little spread for the given ensemble mean error [Hamill, 2001]. We find that the LMR ensembles are well calibrated in both the prior and the reconstructed values. Verified against proxies during 1880–2000, the distribution of ensemble calibration ratios has a median value of 1.03 for both the prior and reconstructed fields; for 0–1879 C.E., the ratio for both is 1.17.

2.5.3 Independent Proxy Data As mentioned above, 25% of the proxies in each experiment are withheld from data assimilation. After the reconstruction is complete, the linear PSM (equation (5)) is applied to the reconstructed temperature field to predict the values of the independent proxies. The predicted values can then be verified against the proxy chronologies themselves using the correlation and CE performance measures.
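The ensemble calibration ratio described above can be sketched as follows; this is a minimal illustration with synthetic ensembles (function and variable names are ours, not LMR code):

```python
import numpy as np

def calibration_ratio(ensemble, truth):
    """Mean square error of the ensemble mean divided by the average
    ensemble variance; ensemble has shape (time, members)."""
    mse = np.mean((ensemble.mean(axis=1) - truth) ** 2)
    return mse / np.mean(ensemble.var(axis=1, ddof=1))

rng = np.random.default_rng(2)
signal = rng.normal(0.0, 1.0, 500)             # predictable part of the state
truth = signal + rng.normal(0.0, 1.0, 500)     # truth = signal + unpredictable part

# Members drawn from the same distribution as the truth: ratio near one.
ens = signal[:, None] + rng.normal(0.0, 1.0, (500, 100))
print(calibration_ratio(ens, truth))           # close to 1 (well calibrated)

# An overconfident ensemble (too little spread for its mean error)
# yields a ratio well above one.
tight = signal[:, None] + rng.normal(0.0, 0.3, (500, 100))
print(calibration_ratio(tight, truth))         # well above 1
```

The second case corresponds to the under-dispersive behavior noted above, where the ensemble spread is too small for the given ensemble mean error.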

4 Sensitivity Analysis We turn now to the sensitivity of the results to each component of the PDA framework: PSM, prior, and proxies. To summarize the results effectively, we consider only global‐mean air temperature and note that qualitatively similar findings emerge when other dynamical fields are considered. We combine the sensitivity analysis for the PSM calibration and prior data into a set of 16 experiments, each with 100 Monte Carlo realizations for 100‐member ensembles (160,000 reconstructions in all), for all combinations of calibration and prior data (Figure 12). For each calibration data set, the linear PSM equation 5 is fit and used to predict the proxy values from the prior data for that experiment. For example, the MLOST/MPI experiment uses PSMs calibrated on MLOST 2 m air temperature data and the MPI last millennium simulation as the prior; i.e., proxy values are predicted from the MPI prior using the MLOST‐calibrated PSMs. Figure 12 Sensitivity of global‐mean temperature to calibration and prior data sources. Legend entries for each experiment are denoted by calibration/prior data source. Each curve denotes the mean over 100 realizations that randomly sample 75% of the proxy data using a 100‐member ensemble for each. CE verification against the consensus mean is shown in the legend. Results show little sensitivity and close correspondence to the control reconstruction verified in section 3.1. In all cases, verification is against the consensus global‐mean temperature shown in Figure 4. There is, however, an apparent correspondence between calibration and prior data and skill in the reconstruction. Skill for the 20CR‐V2 and ERA‐20C priors is clearly lower than for the other priors. This is potentially surprising, since the other priors pertain to preindustrial simulations whereas 20CR‐V2 and ERA‐20C apply to the verification time period.
We suspect that the source of this lower skill relates to the lower skill in annual‐mean 2 m air temperature over continents (compare Figure 7 (top row) and Figure 7 (bottom row), and see supporting information), because these reanalysis products rely primarily on surface pressure measurements rather than surface temperature data. The MPI prior has the highest skill for three of the four calibration data sets considered. Skill for each calibration data set, averaged over all prior data sets, is 0.75, 0.73, 0.67, and 0.62 for MLOST, GISTEMP, HadCRUT4, and BE, respectively. The reconstruction with the highest skill, MLOST/MPI, is the only one with higher skill than the grand ensemble average over all 160,000 experiments. Sensitivity to proxy data has already been considered by randomly sampling the entire network. Here we consider the sensitivity to proxy type. Since proxy data from trees dominate the PAGES2k data set, we perform two experiments: one with all proxies other than trees (27 chronologies total), and one with only tree data (19 tree ring density chronologies and 401 tree ring width chronologies; 420 chronologies total). In both cases, all of the available proxy data were used (i.e., we did not withhold 25% as in all other experiments), and we use the MLOST PSM calibration and the CCSM4 prior, as for the control. We find that the reconstruction with only trees captures most of the signal in the case with all proxies (Figure 13). Despite having only 27 chronologies, the reconstruction without trees still has positive skill, but with smaller amplitude in the main signals (trend and multidecadal variability). Figure 13 Sensitivity of global‐mean temperature to proxy source (trees only, no trees) and the effect of eliminating the trend from the PSM calibration. Finally, the blue curve in Figure 13 gives the result when the proxy PSMs are fit on detrended data.
In this case, both the proxy and the calibration data are detrended over the entire calibration period, and the proxy error variance, diagonal elements of R in equation 2, is defined relative to the detrended time series. Since the signal decreases relative to the noise in this PSM construction, the proxies have less influence, resulting in a shallower trend and less variability in the reconstruction. Our interpretation of this result is that the twentieth century trend provides a large signal on which to calibrate the regression‐based PSM, increasing signal over noise, which is particularly large on interannual timescales. Moreover, the fact that the control experiment, with PSMs calibrated with the trend, captures variability independent of the trend (Figure 6) indicates that the PSM is not simply tuned to capture the trend. Furthermore, we remind the reader that the PSM predicts the proxies from the prior data, which is not only out of sample but is also entirely external to the calibration data set since it comes from a simulation from a climate model for a different period of time. Therefore, there is no a priori training of the PSMs that can artificially induce “known” trends.
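The effect of detrending on the PSM calibration can be illustrated with a minimal sketch. All data and numbers below are synthetic and purely illustrative, not the LMR's actual calibration; the point is that removing the trend shrinks the signal variance while leaving the noise variance R unchanged, lowering the signal-to-noise ratio:

```python
import numpy as np

rng = np.random.default_rng(1)
nyears = 120                                    # hypothetical calibration period
trend = np.linspace(0.0, 1.0, nyears)           # imposed warming trend (K)
temperature = trend + 0.3 * rng.standard_normal(nyears)
proxy = 2.0 * temperature + rng.standard_normal(nyears)  # linear response + noise

def fit_linear_psm(x, y):
    """Least-squares fit y = a*x + b; return slope and residual variance R."""
    a, b = np.polyfit(x, y, 1)
    residual = y - (a * x + b)
    return a, residual.var(ddof=2)

def detrend(z):
    t = np.arange(z.size)
    return z - np.polyval(np.polyfit(t, z, 1), t)

# Calibrate with and without the trend; compare signal-to-noise ratios,
# defined here as variance explained by the PSM over the error variance R.
a_full, R_full = fit_linear_psm(temperature, proxy)
a_detr, R_detr = fit_linear_psm(detrend(temperature), detrend(proxy))
snr_full = a_full**2 * temperature.var() / R_full
snr_detr = a_detr**2 * detrend(temperature).var() / R_detr
# snr_detr < snr_full: with the trend removed the proxies carry relatively
# larger errors, so they receive less weight in the assimilation.
```

This mirrors the interpretation above: the twentieth century trend provides a large calibration signal, and removing it reduces the proxies' influence on the reconstruction.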

5 Multivariate Reconstruction Example A fundamental advantage of the PDA approach is that it enables dynamically consistent reconstructions of multivariate states from the proxy record. We demonstrate this by examining multivariate fields for a specific historical event: the mystery volcanic eruption of 1808/1809 [Guevara‐Murua et al., 2014]. The global 2 m air temperature and 500 hPa geopotential height fields are shown in Figure 14 for 1808–1813, with anomalies relative to the 10 year mean preceding 1808. Warm conditions are evident during 1808, particularly in the eastern tropical Pacific Ocean and northern Europe, with mainly weak 500 hPa geopotential height anomalies except near Antarctica. Global cooling takes place in 1809, with locally stronger cooling over Canada, where 500 hPa geopotential heights are anomalously low. Cooling intensifies in 1810, particularly over the northern portion of Northern Hemisphere land areas. Eastern North America appears largely unaffected on these timescales, with the largest anomalies over western North America. The 500 hPa geopotential height field is dominated by an extratropical wave train closely resembling the Pacific‐North America (PNA) pattern [Wallace and Gutzler, 1981], which is known to respond to thermal forcing in the tropical Pacific Ocean [e.g., Hoerling and Kumar, 2002]. This pattern persists and slowly evolves through 1813, with locally intensified cooling over most land areas by this time. Figure 14 Example of multifield reconstruction for a specific event: the volcanic eruption of late 1808. Colors show 2 m air temperature anomalies (units: K), and contours show 500 hPa geopotential height anomalies (contours every 5 m; negative values dashed). Anomalies are relative to the 10 year mean prior to the eruption and apply to the MLOST/CCSM4 reconstruction.
The global‐mean temperature evolution during the 30 year period surrounding the 1808/1809 eruption clearly shows the abrupt cooling in 1809, with reinforced cooling around 1815 and 1820 before recovery around 1823 (Figure 15). Presumably, the 1815 cooling enhancement is due to the eruption of Mount Tambora, which from this analysis appears to simply reinforce the cooling event established by the 1808/1809 eruption. Also shown in Figure 15 is the uncertainty from both the ensemble (shading) and a sample over experiments using different combinations of calibration data for the PSMs and prior data in the reconstruction (colors). It is interesting that little uncertainty exists prior to 1808, with all of the reconstructions closely clustered. Uncertainty increases rapidly following the 1808/1809 eruption, with the largest spread of about 0.3 K apparent about 5 years after the eruption. Figure 15 Sensitivity of global‐mean 2 m air temperature anomalies to the calibration and prior data (shown in the caption as calibration/prior). Anomalies are relative to the 10 year period prior to the 1808 eruption, which is denoted by year 0 on the abscissa. Uncertainty in the spatial fields from the range of calibration/prior reconstructions shown in Figure 15 indicates that the primary features for 1810 are robust (Figure 16). Specifically, all reconstructions have a PNA pattern in the 500 hPa geopotential height field, although they differ in the amplitude of the ridge over the North Pacific Ocean and the trough over western North America. Although the 2 m air temperature field corresponds to a roughly equivalent barotropic response with the 500 hPa geopotential height field, it is interesting to note that the 2 m air temperature anomalies associated with the North Pacific ridge are smaller than those for the North American trough, as might be expected from the land‐ocean contrast associated with these features.
Comparing the reconstructions that are warmest (HadCRUT4/MPI) and coldest (GISTEMP/CCSM4‐LM) in global‐mean 2 m air temperature suggests that, at least for the Northern Hemisphere, these differences are associated with relatively minor changes in the spatial pattern. Figure 16 Sensitivity of the 2 m air temperature (units: K) and 500 hPa geopotential height anomaly fields (contours every 5 m; negative values dashed) to the calibration and prior data (shown in the titles as calibration/prior). All panels pertain to the year 1810.
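The anomaly convention used throughout Figures 14–16 (each year shown relative to the 10 year mean preceding the event, with ensemble spread as the uncertainty) can be sketched as follows. The series below is synthetic, with an imposed cooling step standing in for the eruption response; it is not the reconstruction itself:

```python
import numpy as np

rng = np.random.default_rng(2)
nens, nyears = 100, 30
eruption = 15                                    # index of year 0 (late 1808)
gmt = 0.1 * rng.standard_normal((nens, nyears))  # global-mean temperature (K)
gmt[:, eruption + 1:] -= 0.5                     # imposed post-eruption cooling

# Anomalies relative to the 10 year mean preceding the event, per member.
baseline = gmt[:, eruption - 10:eruption].mean(axis=1, keepdims=True)
anomaly = gmt - baseline

spread = anomaly.std(axis=0, ddof=1)             # ensemble uncertainty per year
post_cooling = anomaly[:, eruption + 1:].mean()  # clearly negative
```

Computing the baseline per ensemble member, rather than from the ensemble mean, ensures that the spread in the anomalies reflects uncertainty in the response rather than in the pre-event state.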

6 Concluding Summary The purpose of this paper was to document an approach to paleoclimate field reconstruction using a data assimilation technique that weights data from proxies against climate model simulations, and to verify the assimilated reconstructions primarily during the instrumental era. The approach involves linear, univariate proxy system models and an “offline” (no cycling) approach to data assimilation. Both approximations are chosen to establish a baseline for reconstruction skill against which to measure future improvements from generalizing them. We performed reconstructions using proxy data drawn from the PAGES2k [PAGES2K Consortium, 2013] proxy data set and CMIP5 climate model simulations. Reconstructions are compared to previous reconstructions of Northern Hemisphere mean 2 m air temperature and to a sample of reanalysis products in both space and time. For the control reconstruction, the PSMs are fit against the MLOST temperature field, and a 100‐member prior ensemble is drawn randomly from the CCSM4 Last Millennium simulation. When compared to previous reconstructions of Northern Hemisphere mean 2 m air temperature for 0–2000 C.E., we find agreement in gross aspects, such as millennial‐scale cooling, but also that many of the previous reconstructions are more similar to each other than they are to the LMR, perhaps because those reconstructions share methods that are more similar to each other than to the LMR approach. Verification against six commonly used reanalysis products at annual resolution reveals greatest skill in the global mean, with less skill in the spatial field, particularly over North America and Eurasia. Verification against independent proxy records withheld from data assimilation indicates improvement over the baseline skill of the prior. Reconstructions for 16 combinations of calibration and prior data show little sensitivity in the results.
This is true for both the time series of global‐mean 2 m air temperature and the spatial patterns of geopotential height and temperature during the 1808/1809 volcanic eruption used as an example. The success of the LMR in reproducing persistent cooling, and robust geopotential height anomalies, following a volcanic eruption is an important demonstration of the ability of the PDA approach to capture specific climate fluctuations, and therefore offers confidence in the LMR reconstructions over the duration of the 2000 year proxy data set. The largest impact on the results is obtained when the linear PSMs are fit on detrended data, both for proxy and calibration data. In this case, the reconstructed global‐mean 2 m air temperature has a much reduced trend and decadal variability, a result of less weight on the proxies during data assimilation due to relatively larger errors in the PSM calibration. That the control, calibrated including the trend, shows substantial skill when the trend is removed (Figure 3) suggests that the trend simply provides a greater signal‐to‐noise ratio when calibrating the PSMs. Having established a baseline LMR reconstruction, we will focus future efforts on using a vastly expanded proxy network, adopting more comprehensive PSMs such as PRYSM [Dee et al., 2015], accounting for seasonal dependencies, using isotope‐enabled GCMs as priors, and cycling data assimilation.

Acknowledgments This research was supported by grants from the National Science Foundation (grant AGS‐1304263 to the University of Washington) and the National Oceanic and Atmospheric Administration (grant NA14OAR4310176). Climate simulations were generated as part of the Paleoclimate Model Intercomparison (PMIP3) project. We acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP. For CMIP the U.S. Department of Energy's Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals. CMIP data used in this paper may be obtained from the Earth System Grid Federation at http://esgf.llnl.gov/. Access to data and software related to research in this paper are published at the doi: 10.17911/S9WC7N. Comments and suggestions from two anonymous referees were helpful in revision and are gratefully acknowledged. We also thank the LMR advisory panel, Kim Cobb, Kevin Anchukaitis, Gil Compo, Michael N. Evans, and Thorsten Kiefer, for their guidance and suggestions.

Supporting Information Supporting Information S1: jgrd53021-sup-0001-supplementary.pdf (PDF document, 4.6 MB)