Ice freeze and breakup dates

We obtained ice freeze dates for a 572-year period from 1443–2014 for Lake Suwa and a 321-year record of river ice breakup dates for Torne River from 1693–2013 from the National Snow and Ice Data Center54,55. Leap years have been taken into account for both Suwa and Torne.

Ice-freeze dates for Suwa were first recorded in 1443. Ice freeze date, defined as the first date of complete cover, was decided by observers from the shoreline. The name of the family observing ice freeze is provided, although the unique name of the observer is not given but would have included at least 15 generations of observers16. The Shinto Shrine also reported the Omiwatari date of ice ridge formation. Fujiwhara14 wrote that of the various ice phenomena recorded by the Shinto Shrine, first complete ice cover was the most robust14,16. Unfortunately, there are missing data in the midyears of the time series (1505–1515) and more importantly, data from 1682–83 to 1922–23 are considered unreliable for analysis ice cover freeze dates14,16,43,56. In those middle years, various changes in the calendar confused the record, ice cover dates often were indicated as approximate or were not provided even though the lake did freeze over, the group making the observations varied, and Omiwatari date or even the Omiwartari ceremony often were substituted for the ice cover date. We eliminated all data from 1682–1923 from the analyses to reduce the uncertainty in dates of ice freeze14,16,43,56. However, the ice-freeze date between 1443–1682 and 1924–2014, in addition to the presence or absence of lake freeze from 1443–2014 are considered to be very reliable14,16,43,56. For years when more than one data source was available (1897–1993), we numerically compared the values. In almost all years they were the same and if not, the standard deviation between the values between 1944 and 1996 was 2.6516. When they were not the same, we used Arakawa14 over the Suwa Meteorological Observatory and Yatsurugi Shrine from 1443 to 1953; from 1953 to 1993 we used the Suwa Meteorological Observatory over the Yatsurugi Shrine, and from 1994 to 2014 we used Yatsurugi Shrine. There were 3 exceptions to these choices (1950 we used the Suwa Meteorological Observatory; 1952 and 1976 where we used information from Tadashi Arai; Supplementary Table 1). Ice freeze dates occur before and after January 1st, therefore we converted dates to day of year, using a zero to represent the calendar day January 1st.

Ice breakup dates (defined as the general movement of ice) from Torne have been recorded since 1693 and were also converted to day of year. Data were collected from observation journals, newspapers, weather monitoring services in Tornio, Finland, and Haparanda, Sweden, the Finnish Environment Institute, and local guessing competitions17,18 (Supplementary Table 2). Ice breakup data appear to be homogeneous even in the presence of different observers as many years have multiple observers, although the margin of error may be approximately 2.1 days18,19,20. For complete details of the record, please see Kajander, which documents and compares every ice breakup date from a variety of sources, in addition to providing a translation of the original comments provided by the observer18.

Large-scale climate drivers and weather

We obtained data from paleo- and historical records that may be important to ice freeze date on Lake Suwa and ice breakup date on Torne River (Supplementary Data Table 1). We acquired average annual sunspot number from 1700–2012 from the Solar Influences Data Analysis Center57. The average annual sunspot number represents a relative index of solar activity for the visible solar surface based on visual observations from a group of people from different regions around the globe57. Atmospheric carbon dioxide concentrations from 1AD -2012 AD were acquired from proxies of Antarctic ice cores from 1AD - 1957 AD and direct observations from 1958–2012 averaged from Mauna Loa, Hawaii and the South Pole from the Scripps CO 2 program58. We acquired an index of El Niño Southern Oscillation (ENSO) from 1301–2005 derived from the North American Drought Atlas. Thousands of tree-ring records from North America were used to produce an annual database of drought reconstructions and subsequently analyzed using Empirical Orthogonal Functions (EOF) to develop an annual ENSO index59. In addition, we acquired a winter (December-March) North Atlantic Oscillation Index based on a combination of reconstruction and instrumental data from 1659–2013. Between 1659 and 1864, reconstructions of temperature, precipitation, and other proxy data were used based on observations of ice, snow, and tree-ring data60. From 1864–2013, instrumental data were used to develop the NAO index which has been defined as the pressure difference from stations in Ponta Delgada, Azores and Reykjavik, Iceland61.

We acquired air temperatures for both locations. For Suwa, we obtained reconstructed March air temperatures from Kyoto, Japan that were derived from cherry blossom phenology records obtained from diaries of emperors, aristocrats, and monks from 854–199562. The DTS model (DTS: the number of days transformed to standard temperature) was based on cherry blossoming data and used to develop a March mean temperature reconstruction model. An exponential model was developed for March air temperatures using rate of plant development62. For Torne, we acquired reconstructed January to April air temperature data from Stockholm, Sweden for 1500 to 200863. Ledgers documenting the fees and taxes, in addition to diaries and records related to harbour activities in the Stockholm harbour, were used to identify the start of the sailing season each year from which a winter/spring air temperatures were reconstructed63. Leijonhufvud et al. compare, calibrate, and evaluate over 15 sources of data summarizing sailing records in the Stockholm harbour in an effort to validate the reconstruction of January-April air temperatures63.

Trends in ice seasonality

Continuous Segmented Regression

We used segmented regression to test for abrupt changes in the trend of ice dates in Torne. Specifically, we wanted to test when a shift in the temporal trend of ice date may have occurred. To estimate the timing and magnitude of a change in the slope of ice dates, we used continuous segmented regression (CSR) models. In CSR, trend lines on either side of the estimated breakpoint intersect (hence making them “continuous”), but are allowed to have different slopes. In general, a CSR takes the form

is a latent variable representing ice dates, x i are the years of the time series, β 0 is the intercept of the regression (ice date on year 0), β 1 is the trend in ice date prior to any breakpoint (ice date per year), the a k are the breakpoints (k was either 1 or 2 for this study. Because the number of parameters increases with k, we limited k to avoid over-fitting the model), the β k+1 are the changes in the temporal trend at each of the k breakpoints compared to the trend prior to the breakpoint, and the ε i are the errors. Note that the β k+1 parameters indicate the effect on ice date of years elapsed since the previous breakpoint once the breakpoint has passed.

Fitting CSR in Torne (Ordinary Least Squares (OLS) Regression)

The Torne time series began in 1693 and ended in 2013, thus x = 1, 2, … 321. Ice breakup dates for Torne ranged from day 117 to day 160, and the ice melted each year of the time series. For Torne, we fit CSR parameters using ordinary least squares using the lm() function in the statistical programming language R.

Fitting Regressions in Suwa (Tobit)

The Suwa time series was split into an early period (1443–1683) and a late period (1923–2014) (Fig. 1). A breakpoint model was not fit for Suwa; rather, separate linear regressions were fit for the early and late periods of the time series, with the parameters being the intercept and the effect of years elapsed on ice date. The day that Suwa froze ranged from day -24 to day 62 (negative values indicate freezing before January 1st of the designated “year”); however, there were 37 years when the lake did not freeze. Treating no-freeze years as missing data or as a constant date would result in biased results if we employed the regression techniques used for Torne. Thus, calculating trends for Suwa ice dates required a statistical approach distinct from that used in Torne. We can consider the lake an instrument that measures a value which we call ice date. This instrument indicates the favorability of conditions for ice formation, and if we understand the lake instrument to censor measurements at 62, then the no-freeze years can be encoded as ice dates of 62. We consider Suwa as an instrument with output of ice date that is censored at an upper limit, L = 62. As such, the observed y i are related to L and the latent variable in the following manner:

To address this censoring of Suwa ice dates while fitting the parameters, we used a Tobit regression model. For a Tobit regression model with an upper limit (right censoring) of the response variable, the log likelihood of observing data given the parameters β (as in Eq. 1) and σ2 (the variance of ε in Eq. 1), can be calculated as:

where φ(.) and Φ(.) are the probability and cumulative density functions of the normal distribution, respectively. The first term is the standard normal likelihood, and applies to observations for which an ice date was observed. The second term reflects the probability of the observation being censored, and applies to no-freeze years. Given parameter values, Eq. 3 reflects the probabilities of observing the ice dates (y i ) during freeze years, as well as the probabilities that ice date was censored (unobserved) during no-freeze years. Thus, the β in the Tobit regression model indicate the effect of unit change in X on the latent variable, . We used Tobit regression models as implemented by the vglm() function in the R package VGAM to fit parameters to Suwa data.

Finding Breakpoint Locations

When fitting models with one breakpoint, breakpoints were searched exhaustively, and the breakpoint location whose model had the lowest Akaike Information Criterion (AIC) was selected. The model with the lowest AIC value was chosen as the most parsimonious model64. The same procedure applied to fitting the two-breakpoint model for Torne, which was fit with OLS (Supplementary Fig. 1).

Model Selection

We compared AIC values from CSR models containing one or two breakpoints to multiple regression models containing only year or only year and year2 as predictor variables. AIC is calculated from the negative log likelihood of the data given the model, minus a penalty of 2 AIC points per parameter; lower AIC values indicate better model fit. A model with only year as a predictor has two regression coefficients (β’s for intercept and linear slope), both the polynomial and the single breakpoint model have three, and the two breakpoint model has four.

Like the breakpoint models, the polynomial model indicates that trends in ice date are becoming steeper over time. The four models were of similar AIC for the Torne data, with the AIC decreasing as model complexity increased from 2155.8 to 2152.8 (Table S3). Comparing AIC values among models indicates that a single linear trend in ice date is a poorer description of the data than the more complex models that indicate accelerating ice dates, and that the single breakpoint model fits the data about as well as or better than the other models.

Inter-annual variability

For gross climate variability, we analyzed the standard deviation of non-overlapping 30-year windows of ice dates. Although all permutations of window sizes were evaluated, the general pattern was not influenced by window size. Therefore, we chose 30-year windows to assess long-term variability in climate by accounting for intra-annual, inter-annual and most multi-decadal variation attributed to weather, Quasi-biennial oscillation, El Nino Southern Oscillation, solar sunspots, and shorter multi-decadal variation in the range of 20–30 years. Before the standard deviation was calculated, any linear trend in the data within the 30-year window was removed to remove potential bias due to trend. For Suwa, the missing (unobserved) ice dates were removed, reducing the number of observations in the window and widened the standard deviation confidence interval. No-ice years were treated as censored values and included in the maximum likelihood fit of standard deviation (Matlab 2011a normfit function). All possible 30-year non-overlapping windows were examined to ensure the pattern in variability was not sensitive to the specific start-date used. The variability is presented in units of days and differs in baseline magnitude between the two lakes.

Quasi-periodic dynamics

To examine the quasi-periodic dynamics in the ice data through time, we applied a continuous wavelet transform to the ice dates for both lakes. This allowed for the decomposition of the signal into its individual frequency components while still examining how they change through time (as opposed to Fourier Transform). For the Suwa data wavelet analysis, the omitted middle period contained the majority of missing values. The missing values in the early and late period (∼10% of values) were replaced with the average of the time series, a conservative approach to prevent spurious oscillation detection that can occur with interpolated datasets. Tornio had no missing or no-ice observations. The Morlet basis function was chosen for its frequency identifying characteristics. For the continuous wavelet transform (Fig. 3C,D), significant periods were identified using a chi-squared test with 95% confidence intervals and assuming a red noise mean background spectrum. For the global wavelet transform (Fig. 3E,F), a 95% significance was also calculated using a red noise background spectrum, but time-averaged across all times outside of the cone of influence to give an overall significance level. For detail on the methods used in the continuous and global wavelet analysis, see Torrence and Compo65.

Changing drivers

We explored linear relationships between ice date and the following climate drivers (unless otherwise specified, drivers apply to both systems): air temperature (Air °C), atmospheric CO 2 , El Niño Southern Oscillation index (Suwa only), North Atlantic Oscillation index (Torne only), and Sunspots (Torne only). In each system, the relationships between ice date and each of the climate drivers were explored using data before and after the breakpoint. In Suwa, the duration of the period on both sides of the breakpoint was 75 years (1581–1655 and 1923–1997), and the time periods were chosen to maximize duration on either side of the breakpoint while minimizing the inclusion of years for which ice data were missing. In Torne, we used the full time series on either side, giving 174 years in the first portion (1693–1866) and 146 years in the second portion (1867–2013).

For each ice date–driver pair in each system, we performed separate linear regressions for an early and late period. In these analyses, we detrended the response and driver variables (i.e., took the residuals from a linear temporal trend), and for the driver variables, we scaled (x scaled = (x−μ)/σ, where μ is the mean of x, and σ is its standard deviation) the time series. These transformations were applied separately to early and late periods (Supplementary Fig. 2–8). This procedure resulted in twenty-two separate regressions, and each before-after pair of regressions is equivalent to fitting a single model of the form

where y is ice date, β 0 is the intercept, x 1 is the driver variable and β 1 its effect, x 2 is a dummy variable that is 0 before the breakpoint and 1 after, ergo β 2 is the post-breakpoint change in the intercept, and β 3 is the change in the relationship between the driver and ice date after the breakpoint (i.e., β 3 is the adjustment made to β 1 after the breakpoint). As described for the analysis of temporal trends, for Torne Eq. 4 was fit with ordinary least squares, and with the Tobit regression for Suwa. For each regression, we assumed there was a possibility that the residuals of the regression would be autocorrelated; to estimate coefficient standard errors in the presence of autocorrelation, we used a bootstrapping procedure where the randomized residuals retained the autocorrelation structure of the regression residuals.

To characterize the autocorrelation structure of the residuals in Eq. 4, we fit an autoregressive moving average (ARMA(p,q)) model with p AR parameters and q MA parameters. An ARMA(p,q) model has the general form

The y t are the residuals ε from Eq. 4, and μ is the mean of the residuals, which is 0. The β i are the AR parameters, the α j are the MA parameters, and the ε t-j are the residuals. We applied Eq. 5 to the ε of Eq. 4. We selected among ARMA models using AIC, and allowed model complexity to vary from ARMA(1, 0) to ARMA(5, 5) (including all orders in between). These models were fit and selected using the stepwise procedure implemented in the auto.arima function in the forecast R package. We then simulated an ARMA process using the fitted ARMA parameters; the variance of the innovations in the simulated ARMA process was the maximum likelihood estimate acquired in fitting Eq. 5 to the residuals.

The ARMA-simulated residuals formed the basis of our bootstrapping procedure. These simulated residuals were then added to the fitted regression values, and the regression was re-fit. This procedure was repeated 1,000 times. The standard deviation of these 1,000 parameter estimates was then used as the standard error of the parameters in Eq. 4. In summary, our bootstrapping procedure was as follows:

1 Fit Eq. 4 2 Separate residuals from Step 1 into before and after period 3 Fit and select an ARMA time series model (Eq. 5) to each set of residuals 4 Use the model from Step 3 to simulate new sets of residuals from fitted time series model 5 Add the residuals simulated in Step 4 to fitted values ( ) from Eq. 4 6 Re-fit Eq. 4 to the new set of observations from the sum in Step 5 7 Repeat Steps 2–6 1,000 times 8 The bootstrapped estimate of the standard error of parameters in Eq. 4 is the standard deviation of the 1,000 estimates from Step 6

We calculated p-values corresponding to the probability that regression coefficients between drivers and ice dates differed before (subscript 1) and after (subscript 2) the breakpoint in the following manner: