This analysis estimates uncertainty in the NOAA global surface temperature (GST) version 5 (NOAAGlobalTemp v5) product, which consists of sea surface temperature (SST) from the Extended Reconstructed SST version 5 (ERSSTv5) and land surface air temperature (LSAT) from the Global Historical Climatology Network monthly version 4 (GHCNm v4). Total uncertainty in SST and LSAT consists of parametric and reconstruction uncertainties. The parametric uncertainty represents the dependence of SST/LSAT reconstructions on selecting 28 (6) internal parameters of SST (LSAT), and is estimated by a 1000-member ensemble from 1854 to 2016. The reconstruction uncertainty represents the residual error of using a limited number of 140 (65) modes for SST (LSAT). Uncertainty is quantified at the global scale as well as the local grid scale. Uncertainties in SST and LSAT at the local grid scale are larger in the earlier period (1880s–1910s) and during the two world wars due to sparse observations, then decrease in the modern period (1950s–2010s) due to increased data coverage. Uncertainties in SST and LSAT at the global scale are much smaller than those at the local grid scale due to error cancellations by averaging. Uncertainties are smaller in SST than in LSAT due to smaller SST variabilities. Comparisons show that GST and its uncertainty in NOAAGlobalTemp v5 are comparable to those in other internationally recognized GST products. The differences between NOAAGlobalTemp v5 and other GST products are within their uncertainties at the 95% confidence level.

The rest of the paper is arranged as follows. The datasets used for uncertainty estimation and comparison are described in section 2 . The methodology and estimation of SST and LSAT uncertainties are described in sections 3 and 4 respectively. The GST uncertainty in NOAAGlobalTemp v5 is quantified based on SST and LSAT uncertainties and compared with the GST uncertainties in the other products in section 5 . A summary and conclusions are given in section 6 .

The focus of this paper is to estimate the uncertainty in NOAAGlobalTemp v5 and to compare its uncertainty with other products. Total uncertainty in NOAAGlobalTemp v5 consists of parametric and reconstruction uncertainties. The reconstruction uncertainty represents the residual errors due to using a limited number of empirical orthogonal teleconnection ( van den Dool et al. 2000 ) modes in SST and LSAT reconstructions, and how well the retained EOTs span the variations at any given time. The parametric uncertainty represents the dependence of SST and LSAT reconstructions on randomly selecting the optional values of the internal parameters of SST and LSAT.

The debates regarding recent GST warming trends are partly caused by the selection of starting and ending year of the hiatus ( Medhaug et al. 2017 ). However, the uncertainty in the GST products may also impact the significance of the warming trends. For example, the GST difference between NOAAGlobalTemp v5 and HadCRUT4 is mostly associated with differences in SSTs that are associated with the uncertainty of SST bias correction ( Huang et al. 2015 ). Therefore, it is important to quantify the GST uncertainty when a GST product is generated.

The evolution of globally averaged GSTs described in those studies is qualitatively similar for periods since the 1880s even though different methods are used in data interpolation and bias correction. For example, all analyses indicated that GST experienced a cooling over the 1880s–1910s, a warming over the 1910s to 1940s, a slight cooling over the 1940s to 1970s, and an enhanced warming over the 1970s to 2010s. Note that in the 1940s there was a switch in the source of available observations and a corresponding rapid shift in observational bias. Quantitatively, there are some differences in trend magnitude in the above periods among these products, particularly for the period 1998–2012 ( IPCC 2013 ; Karl et al. 2015 ; Fyfe et al. 2016 ; Lewandowsky et al. 2016 ; Medhaug et al. 2017 ; Rahmstorf et al. 2017 ), which led to questions regarding a change, slowdown, or cessation of the general warming trend observed since the 1970s.

LSATs from coupled model simulations are used as pseudo-observations to assess reconstruction uncertainty ( section 4d ). The model simulations are independent and provide spatially complete analyses. These model simulations with different resolutions are box-averaged to the monthly 5° × 5° grids of NOAAGlobalTemp and used to estimate the reconstruction uncertainty. To assure that the estimated reconstruction uncertainty is not sensitive to the selection of model simulations, three model simulations are tested ( Table 1 ):

The monthly 5° × 5° LSAT data, derived from the European Centre for Medium-Range Weather Forecasts (ECMWF) interim reanalysis (ERA-Interim; Table 1 , second row) from 1982 to 2011 ( Dee et al. 2011 ), are used to train LSAT EOTs. Original resolutions of ERA-Interim are daily, approximately 0.75° in longitude and latitude, and 60 levels. The air temperature at the lowest level (2 m) over the land is used as LSAT. ERA-Interim is an observation-based reanalysis forced by the SST from HadISST1 ( Rayner et al. 2003 ) and the Operational SST and Sea Ice Analysis (OSTIA) system ( Stark et al. 2007 ; Donlon et al. 2015 ) as a low boundary condition, which is methodologically independent of the LSAT from NOAAGlobalTemp.

The monthly 2° × 2° Optimum Interpolation SST (OISST) data ( Table 1 ) are used to train SST EOTs ( Table 2 , 23rd row) and used as pseudo-observations to derive SST reconstruction uncertainty ( section 3d ). The monthly OISST is derived from the weekly 1° × 1° OISST (wOISST) from 1982 to 2011 ( Reynolds et al. 2002 ), which is consistent with those used in the previous versions of ERSST (v4 and v3b). To better represent the interannual variations of SST, different SST data combinations are selected to train the EOTs as described in parameter 23 in Table 2 and appendix A . The weekly data are first interpolated to the daily data time scale, and then the daily data on 1° × 1° grids are averaged to monthly data on 2° × 2° grids. The OISST data are observation-based estimates that are methodologically independent of ERSSTv5. Both OISST and ERSSTv5 include in situ data. The OISST includes satellite-derived data while ERSSTv5 does not.

The GST anomaly in GISTEMP ( Hansen et al. 2010 ) is on monthly 2° × 2° grids from 1880 to 2017. The anomaly is relative to a climatological mean over 1951–80, which is rescaled to 1971–2000 then box-averaged to 5° × 5° grids. GISTEMP combines LSAT from GHCNm v3 ( Lawrimore et al. 2011 ) with SST from ERSSTv5 ( Huang et al. 2017 ). Regions without observations over the land are filled by the weighted average with weights decaying linearly to zero at 1200 km. Overall, the GST in GISTEMP is closest to NOAAGlobalTemp v5 because the SST component of GISTEMP is the same as that of NOAAGlobalTemp v5.

The GST anomaly in BEST is on monthly 1° × 1° grids from 1850 to 2017. The anomaly is relative to the 1951–80 climatological mean, which is rescaled to 1971–2000 and then box-averaged to 5° × 5° grids. BEST uses LSAT from the Global Historical Climatology Network–monthly (GHCNm) v3 ( Lawrimore et al. 2011 ) and SST from HadSST3 ( Kennedy et al. 2011a , b ). The LSAT observations are homogenized using the “scalpel” method and averaged and interpolated using “Berkeley average” and kriging (or Gaussian regression) algorithms. Overall, the GST in BEST is closest to HadCRUT4 because the SST component of BEST is the same as that in HadCRUT4, although their difference is notable in the recent decade and in the nineteenth century. The uncertainty of BEST includes observational biases and undersampling effects.

SSTs in ERSSTv5 are produced on monthly 2° × 2° grids from 1854 to 2017 ( Huang et al. 2017 ) and used as the oceanic component of NOAAGlobalTemp v5. ERSSTv5 includes available in situ SST observations from ships, buoys, and Argo floats. Satellite-based observations are not included. The ship and buoy SSTs are from the International Comprehensive Ocean–Atmosphere Dataset (ICOADS) Release 3.0 (R3.0) ( Freeman et al. 2017 ). The Argo temperature data of 0–5-m depth are defined as SSTs, and the data are from the Argo Global Data Assembly Centre (GDAC; https://www.seanoe.org/data/00311/42182/ ). A recent study ( Huang et al. 2019 ) suggested that Argo and buoy SSTs are playing an equally important role in SST analyses in the global oceans. Huang et al. (2017 , 2018 , 2019) demonstrated that spatial and temporal variabilities of SSTs are more realistic and more reliable in ERSSTv5 compared with previous versions. Using these observations from ships, buoys, and Argo floats, ERSSTv5 is further processed as follows.

Ship SSTs may contain biases for a variety reasons ( Smith and Reynolds 2003 , 2004 , 2005 ; Kennedy et al. 2011a , b ; Huang et al. 2015 ). Biases in ship SST measurements are corrected in ERSSTv5 using bias-corrected nighttime marine air temperatures (NMATs) from the Hadley Centre (HadNMAT2; Kent et al. 2013 ) as a large-scale measurement frame of reference before 1985. The quantification of the ship biases depends on the region of interest and variance of SST and NMAT. However, ship biases after 1985 are quantified using more accurate and precise buoy SSTs from ICOADS R3.0, which are used to adjust ship biases determined from NMAT. Similarly, biases in Argo SSTs resulting from slight differences in observing depth are corrected according to buoy SSTs based on statistics over 1990–2010. Corrections to ship and Argo SSTs are designed to maximize compatibility with buoy SSTs at a nominal depth of 0.2 m. However, the compatibility may vary when different options of parameters 11–14 ( Table 2 and appendix A ) are selected, particularly in the selection of different NMAT and different bias fitting domains.

The observations from ships, buoys, and Argo floats are merged into superobservations (superobs) on 2° × 2° grids using a weight determined by their signal-to-noise ratios and a maximum number (5–15) of observations ( Reynolds and Smith 1994 ; Reynolds et al. 2002 ). Annually averaged superobs are first calculated with a minimum of 1–3 months within a year. The reconstruction itself consists of low- and high-frequency components. The low-frequency (LF) component of annually averaged superobs is determined by a running window of 11–19 yr and a spatial window of 25° × 25°. The missing values in superobs fields are filtered out by filling the running average when the ratio of superobs coverage within 25° × 25° reaches a minimum criterion. The high-frequency (HF) component for each grid box is set as the difference between superobs and LF component, and is further filtered by a 1–3-month filter. Therefore, the separation of LF and HF components involves parameters 15–22 ( Table 2 and appendix A ).

The HF component is decomposed using a maximum of 140 EOTs calculated with different sets of training data that are sensitive to the EOTs ( Huang et al. 2017 ). These EOTs are fitted using different weighting methods. Not all 140 EOTs are used to reconstruct the HF component of SSTs. The final selection of EOTs is based on a minimum criterion of EOT variance supported by superobs. The HF decomposition is sensitive to the selections of parameters 23–25 ( Table 2 and appendix A ).

Options of these 28 parameters in ERSSTv5 are first determined by perturbing parameter values by 10%–100% ( Table 2 ). These options are then randomly selected in generating a 1000-member ensemble of SST anomalies (SSTAs; referenced to the 1970–2000 climatological mean). The ensemble is finally used to quantify the parametric uncertainty.

Most of these 28 parameters in processing ERSSTv5 in sections 3a(1) – 3a(5) are the same as those used in estimating the uncertainty of ERSSTv4 ( Huang et al. 2016 ). New parameters are added here for ERSSTv5 in association with Argo SSTs ( Table 2 , rows 9, 14, and 16) and the readjustment of ship SSTs according to buoy SSTs ( Table 2 , row 13). The NMAT selection parameter ( Table 2 , row 11) used in correcting ship SSTs is revised by adding a 25° × 25° running domain in ERSSTv5. In addition, more parameter options are considered here for the EOT modes ( Table 2 , row 23).

In the high latitudes, the reconstructed SST should be consistent with the freezing point of seawater in regions covered with sea ice. Sea ice concentration is derived from satellite observations and may differ around 10% among available ice concentration products. In the area of mixed open water and sea ice, the final SST is interpolated between reconstructed SST and seawater freezing point of −1.8°C. The range of ice concentration is given by a minimum and maximum concentration ( Reynolds et al. 2002 ; Smith et al. 2008 ). Therefore, options for parameters 26–28 ( Table 2 and appendix A ) may impact the final SST in the region covered with sea ice.

The uncertainties in the early versions of ERSSTv3b and NOAAGlobalTemp v3 ( Smith et al. 2008 ; Vose et al. 2012 ) consist of low-frequency uncertainty, high-frequency uncertainty, and bias uncertainty. In the ocean component, the low-frequency uncertainty was estimated using SST variances from a coupled model simulation. The high-frequency uncertainty was estimated using SST variance difference between OISST and ERSST. The bias uncertainty was estimated using the absolute difference of SST biases between ERSSTv3b and HadSST3. These uncertainty estimations in ERSSTv3b and NOAAGlobalTemp v3 was reasonable by applying ERSSTv3b bias correction, which generally decrease with time. However, when the updated SST bias correction of ERSST v4 and v5 was applied, the bias uncertainty and therefore the total uncertainty was large between the 1920s and 1960s and smaller before the 1920s and after the 1960s. In particular, there are no clear reasons to explain why the uncertainty increased from the 1880s to the 1920s. In addition, the estimation of the uncertainties in ERSSTv3b and NOAAGlobalTemp v3 was very much dependent on OISST and model simulations, but not much dependent on ERSST or NOAAGlobalTemp themselves. Therefore, the uncertainty algorithms are updated in ERSSTv4, ERSSTv5, and NOAAGlobalTemp v5 as in the following subsections.

Local SST uncertainties from Eq. (1) can be area-averaged into regional- and global-scale uncertainties, which allows typical grid box uncertainties in different regions to be compared easily. However, these area-averaged uncertainties are generally much larger than the uncertainty associated with an area-averaged SST such as the global oceans:

We want to clarify that the differences between ε r and ε p in Eqs. (1) and (5) are in two aspects. First, the estimation of ε r uses pseudo-observations that have a complete spatial and time coverage, while the estimation of ε p uses in situ observations that do not have a complete spatial and time coverage. Second, ε p is quantified as a root-mean-square difference (RMSD) between ensemble members and ensemble average, while ε r is quantified as a RMSD between ensemble members and pseudo-observations. In principle, ε p is associated with the uncertainty of temperature when temperature sampling changes with space and time, while ε r is associated with the residual error that cannot be resolved by a set of limited EOTs.

where superscript g represents a global average. In section 3d , ε r is quantified using OISST described in section 2a , which is very similar when different model simulated data are used as shown in Huang et al. (2016) .

The intention of applying EOT decomposition in SST reconstruction is to filter out potential noise or random errors in observations and to interpolate available observations to data-void areas using the spatial covariance spanned by the set of EOTs. However, the EOT decomposition with a limited number of modes can bring about residual errors even if observations were perfect (free of noise or random errors and 100% area covered over the entire global oceans). The residual between perfect observations and their EOT decomposition is defined here as reconstruction uncertainty (ε r ), which is the variance not spanned by the set of EOT modes used. As in Huang et al. (2016) , pseudo-observational datasets (e.g., gridded products from model simulations, reconstruction analysis, or reanalysis) are used to generate another set of 1000-member ensemble and ε r of local SST is assessed as follows:

It should be noted that the total uncertainty in Eq. (7) does not explicitly include the sampling uncertainty since it has implicitly been included in parameter uncertainty as discussed in Huang et al. (2016) . When the sampling of observations is complete in space and time and the quality of observations is perfect, the parametric uncertainty will approach zero and the total uncertainty will approach the reconstruction uncertainty.

where the time variable t in ε p and ε t represents time in month and year, and time variable t 0 in ε r represents monthly mean from January to December. The averaged (1982–2017) ε r is used in Eq. (7) since its variability over time is very small.

The parametric uncertainty (ε p ) in ERSSTv5 quantified in Eqs. (1) and (2) is generally larger in the earlier period (say 1854–1900; Fig. 1a) than the later periods of 1900–50 (Fig. 1b) and 1950–2010 (Fig. 1c). This is because with denser sampling in the more recent decades the analysis is less sensitive to the details in the parameter settings (Huang et al. 2017). In the earlier period (Fig. 1a), ε p is 0.6°–0.8°C in the northwestern North Pacific, the northwestern North Atlantic, the eastern equatorial Pacific, and the eastern equatorial Atlantic, which is largely associated with low observation coverage and/or strong SST variability in these areas. In contrast, ε p is less than 0.4°C in other regions, particularly in the Arctic and the Southern Ocean. The smaller uncertainty in the Arctic and the Southern Ocean does not necessarily mean that the analysis is accurate, but only implies that the analysis is less sensitive to the changes of the 28 internal parameters in those regions. Dominant factors for the small uncertainty are that the areas are often covered with sea ice and therefore SSTs are less variable. Further, the observations over these regions are extremely sparse, leaving the reconstructed SSTA persistently near zero.

As observation coverage increases, ε p decreases in the northwestern North Pacific, the northwestern North Atlantic, the eastern equatorial Pacific, and the eastern equatorial Atlantic (Figs. 1b,c). In the later period (Fig. 1c), ε p is relatively large (approximately 0.2°C) in the Southern Ocean due to sparse observational coverage over the area. With no observations ε p is low because the analyzed anomaly is always near zero, which implies that there may be an uncertainty component that is not fully accounted for and can be explained by the structural uncertainty (Kennedy 2014). With dense observations ε p is low because the dense sampling makes parameter details less important. With few observations ε p can be larger since details of the parameter settings have more impact on the analysis. Averaged over the global oceans (Fig. 2a; solid red), ε p is approximately 0.4°C before 1880 and decreases gradually to less than 0.2°C after 1960. There are two spikes of ε p (0.4°C) in the short periods of the two world wars in the later 1910s and early 1940s, respectively, due to low observation coverage.

Fig . 2. View largeDownload slide (a) Globally averaged parametric uncertainty (1σ) of local SST, and (b) parametric uncertainty (1σ) of globally averaged SST in ERSSTv5 (solid red), ERSSTv4 (solid black), and EOTv4 (same as ERSSTv5 except for using EOTs from ERSSTv4; dotted green). A 12-month running filter is applied in plotting. Fig . 2. View largeDownload slide (a) Globally averaged parametric uncertainty (1σ) of local SST, and (b) parametric uncertainty (1σ) of globally averaged SST in ERSSTv5 (solid red), ERSSTv4 (solid black), and EOTv4 (same as ERSSTv5 except for using EOTs from ERSSTv4; dotted green). A 12-month running filter is applied in plotting.

In contrast, ε p of the globally averaged SST in ERSSTv5 quantified in Eqs. (3) and (4) is much smaller than that of local SST, which is approximately 0.08°C before 1880 and decreases to approximately 0.04°C in the period of 1950–80 and less than 0.02°C after 1980 (Fig. 2b; red solid line). The ε p of the globally averaged SST is much smaller than that of the local SST over the global oceans, because the uncertainty of the local SST largely cancels when averaged over the global ocean.

The term ε p of the local SST in ERSSTv5 (Figs. 1a–c) is mostly consistent with that in ERSSTv4 (Figs. 1d–f) over the global oceans, because internal parameters and their selections are mostly the same in ERSSTv5 and ERSSTv4. The difference is that the magnitude of ε p is approximately 0.1°C smaller in ERSSTv5 (Fig. 2a, solid red) than in ERSSTv4 (solid black) before the 1900s and in the late 1910s and early 1940s. Similarly, ε p of the globally averaged SST is 0.02°–0.04°C smaller in ERSSTv5 (Fig. 2b, solid red) than in ERSSTv4 (solid black) before the 1940s and after the 2000s, although their temporal evolutions are very consistent.

Tests show that these differences of ε p between ERSSTv5 and ERSSTv4 are largely associated with the changes in EOTs. EOTs in ERSSTv4 are damped to zero north of 65°N and south of 60°S (Huang et al. 2015), while EOTs in ERSSTv5 are not (Huang et al. 2017). The purpose of damping the EOTs in the high latitudes was to avoid a potential overshooting of observed SSTs from lower latitudes to high latitudes. Tests showed that the damping completely removes the impact of observations in high latitudes, which should be avoided as observations in high latitudes increase rapidly in the modern time period (Huang et al. 2017). The use of nondamped EOTs enables a more reliable analysis that is not sensitive to the selections of the other parameters in ERSSTv5. This is why ε p is lower in ERSSTv5 than in ERSSTv4. When the EOTs in ERSSTv4 (EOTv4) are used in the ERSSTv5 system while other parameter selections in ERSSTv5 are held constant (Fig. 2), ε p values of both local and globally averaged SSTs in EOTv4 (dotted green) become close to that in ERSSTv4 (solid black).

However, the change of EOTs cannot explain why ε p of the globally averaged SST is lower in ERSSTv5 in the periods of 1920–40 and 1990–2017 (Fig. 2b, dotted green and solid black). The lower ε p over 1920–40 and 1990–2017 in ERSSTv5 is more consistent with overall decreasing uncertainty due to increasing observation coverage, while ε p in ERSSTv4 increases in these two periods. To detect the causes for lower ε p in these periods in ERSSTv5, the 1000 ensemble members were grouped according to the selections of a specific parameter. There are typically three potential options for each parameter value, and therefore the size of each group is approximately 333. Uncertainties within each group were calculated for every parameter in both ERSSTv5 and ERSSTv4 as shown in Eq. (3).