Defining marine heatwaves

We identified MHWs from daily SST time series and calculated metrics to characterise their frequency, intensity and duration. A MHW was defined following ref. 1 as a discrete prolonged anomalously warm water event. ‘Discrete’ was defined quantitatively as an identifiable event with recognisable start and end dates, ‘prolonged’ meant a duration of at least 5 days and ‘anomalously warm’ was defined by reference to a baseline, seasonally varying threshold. Heatwave events were found by identifying periods when daily temperatures were above the seasonally varying 90th percentile (threshold) for at least five consecutive days. Two events with a break of less than 3 days were considered as a single event. The 90th percentile was calculated for each calendar day using daily SSTs within an 11-day window centred on the date across all years within the climatology period, and smoothed by applying a 31-day moving average. The choices of an 11-day window and a 31-day moving average were motivated to ensure sufficient sample size for percentile estimation and a smooth climatology1. A seasonally varying threshold allows identification of anomalously warm events at any time of the year, rather than events only during the warmest months.

Each MHW event had a set of properties including duration and several measures of intensity (see ref. 1 for more discussion). We considered metrics for duration (time between start and end dates) and maximum intensity (maximum temperature anomaly, relative to the seasonally varying climatological mean, over the event duration). We then calculated time series of the annual average duration and annual maximum intensity across events in each year, and also frequency (defined as the number of events occurring in each year) and the total number of MHW days in a year. Note that when calculating the annual statistics of events which occur across several years, the duration and intensity are assigned to the start year of that event. The statistical significance of the long-term trends presented in the main text (described below) is not sensitive to the choice of percentile threshold (90th, 95th and 98th percentiles tested; not shown), and therefore, our conclusions about changes in MHW properties are robust to the choice of threshold.

This definition builds on recent developments in defining atmospheric heatwaves52 and was developed further and adapted for the marine environment by a cross-disciplinary team consisting of atmospheric scientists, oceanographers and marine ecologists1. The MHW definition as used in this manuscript is available as software modules in Python (http://github.com/ecjoliver/marineHeatWaves) and R (https://github.com/ajsmit/RmarineHeatWaves).

Daily, global, remotely sensed SSTs covering 1982–2016

We used the National Oceanic and Atmospheric Administration (NOAA) Optimum Interpolation (OI) Sea Surface Temperature (SST) V2.0 high-resolution gridded data set37,38 to derive and calculate MHW properties globally. This data set was derived from remotely sensed SST by the advanced very high resolution radiometer (AVHRR), and it should be noted that retrievals from high-cloud covered areas may need to be treated with caution53,54. The data have been interpolated daily onto a 0.25° × 0.25° spatial grid with global coverage from 1982 to 2016. We calculated MHW properties and annual time series for the period 1982–2016 from the raw daily SSTs at each grid point globally, and used a 30-year subset (1983–2012) to define the baseline climatology and 90th percentile threshold. For each of the annual MHW time series, we calculated (i) a mean value at each grid point, (ii) the difference between the mean values over the first and second halves of the data (1982–1998, 2000–2016) and (iii) a globally averaged (area-weighted) annual time series, including linear trend. Note that the point where we split the data (1999) was chosen to have two equal-sized samples (17 years); the results are not strongly sensitive to this choice. We excluded any grid cells from the analysis which had continuous ice cover duration for longer than 5 days—gaps of 5 days or less were interpolated using the MHW algorithm.

We did not calculate linear trends at each location individually since the time series of MHW metrics have certain properties for which standard trend-fitting techniques may have unusual biases, e.g., they are bounded at a lower value (zero for frequency, 5 for duration) and some are quantised. We instead present differences of the mean value between two time slices, making no assumptions about the distribution of the data. Statistical significance of the time slice differences was determined using a two-sample Kolmogorov–Smirnoff (K-S) test, a non-parametric test allowing for non-normally distributed data. Concerns around the use of linear trends are less relevant for the globally averaged time series; however, ordinary least squares estimates of linear trends may be biased due to the presence of outliers or non-normally distributed data. We therefore calculated linear trends of the globally averaged time series using Theil–Sen (TS) estimates55. A TS estimate of the linear trend is more robust for time series data that are heteroskedastic or have a skewed distribution. Statistical significance of trends was estimated using a 95% confidence interval.

Given the relatively low threshold used, the MHW definition will encompass events from those with low intensity that are likely to have limited impact, through to the most extreme events that would likely have major impacts. Under this criteria, there were typically 1–3 events annually at most locations (Fig. 1a). This definition is consistent with commonly used atmospheric heatwave definitions, and provides sufficient samples of MHW events to provide robust statistics using standard tests.

We removed the influence of ENSO from the SST time series, before the MHW detection, using a statistical approach. We first estimated the ENSO signal at each pixel by regressing daily SSTs onto the multivariate ENSO index (MEI56) and subtracted the linear prediction based on this model. We included monthly leads and lags of the MEI up to ±1 year, into a multiple linear regression model. The MEI is defined monthly as a 2-month average (Dec–Jan, Jan–Feb, etc.) and we assumed the monthly values to be centred on the middle of second month. While this will implicitly impose a 15-day lag on the MEI with respect to the true central date, which should be the end of the first month/beginning of the second month, we are not greatly concerned with how the regression is distributed across lags and we do not expect this 15-day difference to impact our ±1 year lead-lag analysis. We then identified the MHWs from the ENSO-less SST time series as above, except we used the climatological mean and threshold based on the original SST (i.e., that which includes ENSO). The original climatology was used because what we consider MHWs, and what ecosystems are adapted to, are based on the real-world threshold and we are interested in how ENSO changes the frequency, intensity and duration of exceedances over this threshold. Note that this method has two assumptions: (1) a linear relationship between the MEI and SST and (2) that the MEI captures all of the ENSO signal. The MEI was chosen as it takes into account the full-scale atmospheric and oceanic response to ENSO. Both assumptions lead to an imperfect model and therefore a small residual ENSO signal may remain in the data.

Daily in situ sea temperature from six monitoring stations

We obtained daily in situ measurements of ocean temperature at six century-scale monitoring stations (see Table 1). We were unaware of any other stations with daily measurements over a period of more than 80 years, with few missing values or large data gaps. If individual daily records were missing, we interpolated over gaps of up to 5 days (the minimum threshold for MHW duration1) and otherwise flagged the entire year of data as missing. The data for Scripps Pier were bias-corrected by −0.45 °C post-1988 as recommended in ref. 57. MHW properties and annual time series and linear trends were calculated as for the NOAA OI SST data with the same 1983–2013 period used to define the baseline climatology and threshold. We calculated differences of mean MHW properties between the earliest and latest 30-year periods shared across all stations: 1925–1954 and 1984–2013.

Proxies for MHW properties based on monthly SSTs

In the absence of centennial-scale global gridded daily SST data, monthly averaged gridded SSTs can be used to evaluate changes in certain MHW characteristics globally in a longer temporal context. Monthly SSTs cannot resolve many individual MHWs directly, since many events are shorter in duration than 1 month. However, we have used monthly SSTs to develop proxies for annually aggregated MHW properties. This was done using both the long station time series and global monthly SST data sets. We developed proxies by selecting the MHW metrics we wished to predict (annual frequency, duration, intensity) and the set of variables which may be possible predictors (annual mean SST, annual maximum SST, annual count of months above a threshold, etc). Generalised linear models were trained on the long station series over the post-1982 period, and validated over the pre-1982 period. This was used to select which variables should be used as predictors for the annual MHW metrics. Then, these variables were used to train proxy models on the post-1982 MHW satellite data using predictors derived from the monthly SST analyses. Monthly SST analyses have no daily data with which to validate the model selection—which is why the model selection was performed based on the long station time series, and why the use of the stations is important to the study.

Proxies based on daily station time series

Monthly averages of the daily temperatures at the six stations were calculated and from these data we calculated a monthly climatological mean and 90th percentile across all years. We then calculated eight annual quantities: TMM was the annual mean SST, TMX was the annual maximum monthly SST, TAM and TAX were the annual mean and maximum monthly SST anomaly (relative to the monthly climatological mean), TTM and TTX were the annual mean and maximum anomaly (relative to the 90th percentile threshold) over all months which exceed the 90th percentile, and NA and NT were the annual count of months above the seasonally varying climatological mean and 90th percentile, respectively. Note TAM and TAX (TTM and TTX) had a missing value if NA (NT) was zero in that year.

Annual MHW properties, derived from daily temperature time series (as described above), were modelled using a generalised linear model (GLM)58 with monthly based annual proxies as predictors:

$$g\left( y \right) = \beta _0 + \beta _1x + \varepsilon$$ (1)

where y is an annual MHW property (frequency, intensity, duration), x is one of the proxies listed above, β 0 and β 1 are the regression coefficients and g is the link function. GLMs are not restricted to normally distributed data and y is assumed to be distributed according to the exponential family (which includes the normal and Poisson distributions). Models were trained over the available data at each station since 1982 and fit using iteratively reweighted least squares. The fitted models were used to predict MHW properties over the entire time series and validated using the independent data prior to 1982. We tested each of the eight proxies as predictors for each of the three MHW properties (see Supplementary Note 1).

Distributions and link functions were chosen as follows. MHW frequency and duration are count data and were modelled by a Poisson distribution, intensity was approximately normally distributed and was modelled by that distribution accordingly. The proxies NA and NT could also be considered count data and an identity link function was used when predicting frequency or duration, a log link function otherwise. For the remaining proxies, which were approximately normally distributed, a log link function was chosen when predicting frequency or duration, an identity link function otherwise. Proxies were found for frequency and duration, but intensity was too poorly predicted to provide any proxy (see Supplementary Note 1, Supplementary Table 1 and Supplementary Fig. 6).

Proxies from the global data sets of monthly SSTs

To examine MHWs globally over centennial time scales, we used the Hadley Centre Sea Ice and SST v1.140 (HadISST), NOAA Extended Reconstructed SST v541 (ERSST), Centennial in situ Observation-Based Estimates 2 SST42 (COBE), Coupled European Centre for Medium-Range Weather Forecasts (ECMWF) ReAnalysis 20C SST43 (CERA-20C) and Simple Ocean Data Assimilation si.3 SST44 (SODA) data sets to derive MHW proxies from 1900 to 2016. Note that HadISST, ERSST and COBE end in 2016 while CERA-20C ends in 2010 and SODA ends in 2013. The HadISST, ERSST and COBE data sets consist of monthly gridded data in which in situ observations (and remotely sensed satellite observations, after 1982) have been interpolated onto a regular horizontal grid (1° × 1° for HadISST and ERSST; 2° × 2° for COBE). The CERA-20C data set is a global coupled reanalysis which relaxes the air-sea interface towards HadISST v2 SSTs and data have been obtained on a 1° × 1° horizontal grid (note that version 2 of HadISST was not directly available at the time of writing, which is why we used version 1 above). The SODA data set is a global ocean reanalysis that assimilates individual observations from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS) Release 2.5 and provides data on a 1/2° × 1/2° horizontal grid. We calculated the eight proxies defined in Section 4.1 at each location and fitted the models by Eq. 1 to the NOAA OI SST annual MHW properties, area-averaged over the above-listed data sets’ grid cells, from 1982 to 2016. Results demonstrated that monthly average temperatures could be used as proxies for MHW count (through NT), duration (through TAX) but not for intensity (see Supplementary Note 1, Supplementary Figs. 7–8).

We used the full record of global, gridded, monthly SST data to reconstruct MHW frequency and duration and to analyse centennial-scale trends in those properties. We calculated annual mean MHW frequency and duration from 1900 to 2016. We used the product of MHW frequency and duration to calculate the total number of MHW days per year. Due to the sparse coverage and large uncertainties in early years35, we restricted the analysis period to data since 1900; for the remaining data, any year in which at least one month of data was missing was marked as missing. We also omitted from the data sets any grid cell which repeated the seasonal cycle at least 10 times which is common in the Southern Ocean due to low data coverage. Correlations between the annual MHW metrics from the long station time series and the proxies indicate good agreement regionally over the overlapping time periods (Supplementary Fig. 9). The correlations shown in Supplementary Fig. 9 have been averaged across the five proxy data sets using the Fisher’s z-transformation technique of ref. 59.

We calculated spatial maps of the mean and the difference between the 1925–1954 period (consistent with the early period chosen for the long station data) and the 1987–2016 period (the last 30 years in the time series), as well as a globally averaged annual time series and its time slice difference. Prior to this the five global proxy data sets were regridded onto a shared grid (2° × 2°) and then an average across all data sets was performed (data set mean). Globally averaged time series from the proxies were adjusted to have the same mean value as the NOAA OI SST time series over the 1982–2016 period. Note that for years in which data were missing over more than half of the area to be averaged, global average time series values were not calculated and were marked as missing data. Globally averaged time series and trends were not strongly sensitive to the exclusion of ENSO (see Method below), unlike the 35-year satellite-derived daily SST data. This suggested that longer time series reduced the influence of interannual climate variations by exposing more of the longer-term trend.

SST in parts of the ocean, e.g., the Southern Ocean, the South Indian Ocean, and much of the Pacific Ocean, were sparsely observed prior to the mid-twentieth century. The global monthly data sets are statistically or dynamically interpolated and so provide data for all available space and time ocean grid cells. However, the Hadley Centre SST data set60,61 (HadSST3, v3.1.1.0) is not global in coverage: rather than interpolating over all space and time coordinates it consists of spatial means within 5° × 5° bins, leading to missing values in the absence of data. We tested the impact of spatial bias by examining global trends in MHW proxies derived from HadSST3 data. We rejected any years in which there were missing months from the HadSST3 data analysis and we also calculated the global averages only for years in which at least half of the ocean surface consisted of valid data. Despite these restrictions: the global data sets (presented in Fig. 5), which may be subject to artefacts from having values in regions with no observations, and HadSST3 provided consistent global trends in MHW properties (Supplementary Fig. 10). We have considered this, along with the inclusion of the long in situ station data and the NOAA OI SST data since 1982, as a multi-data set approach.

We removed the influence of ENSO, the Pacific Decadal Oscillation (PDO62) and the Atlantic Multidecadal Oscillation (AMO63) from the MHW proxies using a statistical approach. First, we fitted the proxy model to the original SST data. Then, we generated proxies with this fitted model, using as predictors the SST time series where the components linearly related to the MEI, PDO and AMO were removed (by linear regression, as above). We allowed for monthly leads and lags of the MEI up to ±1 year; the PDO and AMO indices were smoothed with a 5-year running average and only the zero-lag relationships with SST were considered. This method assumes a linear relationship between the climate modes and SST, and also that these indices capture all of the ENSO, PDO and AMO signals. Both assumptions lead to an imperfect model and therefore residual climate variability may remain in the data. We used the NOAA Earth System Research Laboratory monthly unsmoothed AMO index and the NOAA National Centers for Environmental Information monthly PDO index. Note that the MEI and the PDO index were not independent (r = 0.32) and so the component of the PDO index linearly related to the MEI was first removed by linear regression; neither the MEI or the PDO index are significantly correlated with the AMO index, and so there is no need to perform a similar correction to the AMO index.

Marine heatwave proxy uncertainty

We have quantified the uncertainty in the globally averaged proxies by considering two sources: the proxy model and the observations themselves. Model prediction uncertainty was quantified by assuming the model parameters were distributed normally, with a variance given by the uncertainty on the model fit, and performing a Monte Carlo resampling to produce many new predictions from which a 95% confidence interval was calculated. A Monte Carlo sample size of 200 was used for the long time series data, separately for each of the five proxy data sets. This provided a model error variance σ mod,i 2 at each grid cell i. The error on the global mean σ mod 2 was calculated by propagating the point location errors using the following formula, which takes into account the spatial covariance of MHW properties,

$$\sigma _{{\mathrm{mod}}}^2 = \left[ {\mathop {\sum }\limits_i \sigma _{{\mathrm{mod}},i}^2 + 2\mathop {\sum }\limits_i \mathop {\sum }\limits_{j > i} {\mathrm{cov}}(y_i,y_j)} \right]/n^2$$ (2)

where y i is the MHW proxy at grid cell i, cov indicates the covariance between two variables and n is the number of valid grid cells. The model error on the data set-mean time series (Fig. 5b, d, f, shaded areas) was calculated by propagating the σ mod 2 for the five data sets and including the covariance of the time series using a formula equivalent to Eq. 2.

Observational uncertainty was unavailable from any of the five global data sets considered. However, the HadSST3 data set does include a quantification of observational errors. At each location the measurement and sampling uncertainty provided with the HadSST3 data was combined with a Monte Carlo technique to simulate many alternate proxy time series (sample size of 200 at each grid cell). This resulting error on the proxies was defined as the observational error σ obs,i 2. This enabled us to estimate the influence of nonhomogeneity in the variance which might arise due to data errors and sampling density. The uncertainty in the global mean due to observational uncertainty σ obs 2 was calculated as above, and the total error due to both sources σ tot 2 was then calculated as follows:

$$\sigma _{{\mathrm{tot}}}^2 = \left[ {\mathop {\sum }\limits_i \sigma _{{\mathrm{mod}},i}^2 + \mathop {\sum }\limits_i \sigma _{{\mathrm{obs}},i}^2 + 2\mathop {\sum }\limits_i \mathop {\sum }\limits_{j > i} {\mathrm{cov}}(y_i,y_j)} \right]{\mathrm{/}}n^2$$ (3)

which assumes the two sources are independent. Model, observational and total errors for the HadSST3 proxies are shown in Supplementary Fig. 10.

Calculating excess trends in MHW properties

Trends in MHW properties arise due to a trend in the mean SST, a trend in higher-order SST statistics (i.e., variability), or both. The variability of SST at a given location, region or basin can be forced by local to large-scale climate variability, e.g., persistent atmospheric patterns or shifts in large-scale climate modes over century-long time scales under anthropogenic climate change. Changes to variability on centennial time scales have been demonstrated in the atmosphere and are distributed non-uniformly over the globe64,65,66. Changes to ocean variability have been observed, primarily through the mesoscale eddy field as observed through satellite altimetry67,68 and these changes will influence the extremes. For example, off southeastern Australia projections indicate an increase in eddy variability with anthropogenic climate change69 and an associated increase in SST extremes from increases in both the mean and variability of SST70.

We developed a statistical model to simulate trends in MHW properties due solely to a trend in the mean SST. To do so, we simulated a SST time series which assumes its statistical properties (mean, variance, autocorrelation) are stationary in time. This was accomplished using a stochastic climate model based on the concept that ocean temperature variability is a slow dynamical system, a red noise signal, generated by integrating stochastic atmospheric forcing, or white noise71. A model of SST variability was implemented in which the ocean is treated as a motionless mixed layer forced and damped by stochastic surface heat fluxes72. This model has been used successfully to capture SST variability in the North Pacific73,74 and the North Atlantic74,75. This is expressed as a first-order, autoregressive process:

$$T_{t + 1} = aT_t + \varepsilon _t$$ (4)

where T t is daily SST at time (day) t, a is the autoregressive parameter and ε t is a white noise process with zero mean and variance σ ε 2 (ref. 76). The autoregressive parameter can be expressed as a time scale τ = −1/ln(a), in days. The simulated temperature series has a mean of zero with no secular trend. Given a time series of observed SST, after removing the seasonal climatology and linear trend, the model parameters were determined by ordinary least squares regression on the lagged SST with itself. The seasonal climatology was removed to be consistent with the MHW definition; and the linear trend was removed so that we could later prescribe one. The model fit to the daily NOAA OI SST data (1982–2016) can be found in Supplementary Note 2 and Supplementary Fig. 11. Higher-order autoregressive models could be considered but the model selection procedure required was beyond the scope of the present study.

We then generated simulated SST time series at each location by using the spatially varying fitted parameters, random white noise data ε t covering 35 years (i.e., 1982–2016), and specifying a constant linear trend. We then applied the MHW definition to these simulated T t time series and calculated the trends in annual MHW properties. This was undertaken for N ε independent realisations of ε t and from this we calculated a 95% confidence interval on the trends in MHW properties. Note that the statistical properties of the SST time series pertaining to short-term variability remained stable over the entire record, only the mean SST was allowed to vary. Therefore, this confidence interval provided the range of trends that we expect solely from a change in the mean SST itself. We then compared the actual trends in annual MHW properties and if they lay outside this confidence interval we indentified these trends as excess trends (p < 0.05). In order to facilitate this excess trend calculation, we relaxed the restriction on using linear trends at individual grid cells, which was less of a concern over shorter time scales (approx. 30 years), and use the Theil–Sen estimator instead of ordinary least squares to avoid problems of non-normality in the data.

Code availability

The code used to analyse these data and generate the results presented in this study can be obtained from https://github.com/ecjoliver/Global_MHW_Trends (doi: 10.5281/zenodo.1188863). All figures were created using the software package Python, specifically the the matplotlib and basemap modules (https://matplotlib.org/, https://matplotlib.org/basemap/). The coastline data are the Global Self-consistent, Hierarchical, High-resolution Geography (GSHHG) Database (https://www.soest.hawaii.edu/pwessel/gshhg/), which has been released under the GNU Lesser General Public License, and is provided with the basemap Python module.

Data availability

We have used publicly available data only; new data were not generated as a result of this study. NOAA high resolution SST and COBE-SST2 data were provided by NOAA/OAR/ESRL PSD (Boulder, CO, USA) from their website (www.esrl.noaa.gov/psd/). Hadley Centre SST data were provided by the Met Office (UK) from their website (www.metoffice.gov.uk/hadobs). The in situ Norway data were obtained from Jon Albretsen of the Flødevigen Research Station, Havforskningsinstituttet Institute of Marine Research, the UK data from the Isle of Man Government Laboratory, the USA data from the Shore Stations Programme run by Scripps Institution of Oceanography and the Canada data from Fisheries and Oceans, Government of Canada.