Datasets

We construct the synthetic dataset by considering 1000 independent ‘nodes’ of dynamics where, at each node, we have a deterministic sinusoid component and a normally distributed stochastic component,

$$x_0^u(t) = \sin (2\pi t/50) + 3\,\xi ^u(t).$$ (1)

Here u = 1, 2, …, 1000 is the node index, t = 1, 2, …, 1000 denotes time, and \(\xi ^u(t)\sim {\cal N}(0,1)\) is Gaussian white noise. This setup can be understood as analogous to spatially gridded data from a region of interest, where the value of the observable at each grid location is offset from a local mean state (the sinusoid in Eq. 1 in our case) by some dynamical noise (\(\xi^u(t)\) in Eq. 1). Next, at each location u, we impose three transitions. The first two, a sharp transition at T = 200 and a linear change between T = 400 and T = 450, change the baseline value of the sinusoid equally for all nodes. The third transition at T = 675, however, changes the baseline differently for different locations: half of the nodes have their baseline raised and the other half have their baseline lowered. Formally, we can write the transitions as

$$\tilde x_0^u(t) = \left\{ {\begin{array}{*{20}{l}} {x_0^u(t),} & {\forall u,\ 1 \le t \le 200} \\ {x_0^u(t) + 5,} & {\forall u,\ 201 \le t \le 400} \\ {x_0^u(t) + 45 - 0.1t,} & {\forall u,\ 401 \le t \le 450} \\ {x_0^u(t),} & {\forall u,\ 451 \le t \le 675} \\ {10\,x_0^u(t),} & {1 \le u \le 500,\ 676 \le t \le 1000} \\ { - 10\,x_0^u(t),} & {501 \le u \le 1000,\ 676 \le t \le 1000} \end{array}} \right.,$$ (2)

where \(\tilde x_0^u(t)\) is the transition-imposed time series at location u. Finally, to simulate noisy measurements, we take \(\tilde x_0^u(t)\) as the mean of a normal distribution with standard deviation 1.5 (the assumed measurement error), such that the final sampled time series is \(x_s^u(t)\sim {\cal N}\left( {\tilde x_0^u(t),1.5} \right)\). The 1000 time series of 1000 time points each, obtained in this way, are then used to construct the PDF sequence in the next step.
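
A minimal sketch of the construction in Eqs. 1 and 2 is given below (NumPy); the random seed, array layout, and variable names are illustrative choices and not taken from the study.

```python
# Minimal sketch of Eqs. 1 and 2; seed and array layout are illustrative.
import numpy as np

np.random.seed(0)
U, N = 1000, 1000                               # nodes u and time points t
t = np.arange(1, N + 1)

# Eq. 1: sinusoid plus dynamical noise, one row per node u
x0 = np.sin(2 * np.pi * t / 50) + 3 * np.random.standard_normal((U, N))

# Eq. 2: impose the three transitions (0-based slices for t = 1, ..., 1000)
x_tilde = x0.copy()
x_tilde[:, 200:400] += 5                        # 201 <= t <= 400
x_tilde[:, 400:450] += 45 - 0.1 * t[400:450]    # 401 <= t <= 450
x_tilde[:500, 675:] *= 10                       # u = 1..500,    676 <= t <= 1000
x_tilde[500:, 675:] *= -10                      # u = 501..1000, 676 <= t <= 1000

# Noisy measurements: x_s^u(t) ~ N(x_tilde^u(t), sd = 1.5)
x_s = np.random.normal(loc=x_tilde, scale=1.5)
```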

Daily data for the DAX (code: DAX, 30 companies from the Frankfurt Stock Exchange), NASDAQ-100 (code: NDX, 100 companies listed on the NASDAQ), and S&P BSE SENSEX (code: BSESN, 30 companies from the Bombay Stock Exchange) stock indices were obtained from http://finance.yahoo.com/ using the respective codes under the section ‘Historical Prices’. Google Trends data (Fig. 3d) were obtained from https://www.google.com/trends/ for the search queries ‘mortgage crisis’, ‘Eurozone crisis’, ‘Brexit’, and ‘Grexit’ on 2 June 2016. The data were then normalized using a min–max transform such that they fall in the interval [0, 1].

The monthly SST anomalies were obtained from the gridded SST data product Merged Hadley-NOAA/OI Sea Surface Temperature & Sea-Ice Concentration released by National Centers for Environmental Prediction26, and available for free download at: http://www.esrl.noaa.gov/psd/data/gridded/data.noaa.oisst.v2.html. The Niño 3.4 region was extracted from the global data as those grid points falling within 5°N–5°S and 170°W–120°W. Anomalies were calculated with reference to the average climatology of the period 1 January 1950 to 31 December 1979. The Niño 3.4 index data (Fig. 4b, d) were obtained from http://www.cpc.ncep.noaa.gov/data/indices/. The NCEI PDO index series (Supplementary Figs. 5 and 7) was obtained from https://www.ncdc.noaa.gov/teleconnections/pdo/.

The paleoclimate datasets for the Dongge24, Tianmen27, and Qunf28 caves were obtained from the NOAA National Centers for Environmental Information (formerly the National Climatic Data Center). They are available for free download at: https://www.ncdc.noaa.gov/data-access/paleoclimatology-data.

Constructing \(\left\{ {\varrho _t^X} \right\}_{t = 1}^N\) from measurements

In the case of the synthetic example, for a given time slice T = t, we consider the values \(\left\{ {x_s^u(t) \, | \, u = 1,2, \ldots ,1000} \right\}\) as the given set of measurements for that time instant and then obtain a kernel density estimate (using the Python scientific computing package SciPy) which yields an estimate of the PDFs \(\varrho _t^X\) for that time instant t.
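
A minimal sketch of this step with scipy.stats.gaussian_kde is shown below; the evaluation grid and the stand-in data are illustrative (in the actual analysis one would use the x_s array from the synthetic construction above).

```python
# Sketch: kernel density estimate of rho_t^X per time slice with SciPy.
# The stand-in data and grid limits are illustrative.
import numpy as np
from scipy.stats import gaussian_kde

x_s = np.random.normal(size=(1000, 1000))      # stand-in; use the x_s generated above
grid = np.linspace(x_s.min(), x_s.max(), 512)  # common evaluation grid

def pdf_at_time(values, grid):
    """KDE of the 1000 node values at one time slice, evaluated on the grid."""
    return gaussian_kde(values)(grid)

# One density per time point: rho[t] approximates rho_t^X on the grid
rho = np.array([pdf_at_time(x_s[:, ti], grid) for ti in range(x_s.shape[1])])
```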

For the financial datasets, we use the reported intra-day high \(x_{\mathrm{hi}}(t)\) and intra-day low \(x_{\mathrm{lo}}(t)\) values of the stock indices on a given day, and we postulate that, without any further information about the intra-day variations of the stock indices, the observed stock index values are noisy measurements of the underlying true value of the index, which we assume to lie with equal probability anywhere inside the interval between \(x_{\mathrm{lo}}(t)\) and \(x_{\mathrm{hi}}(t)\). This results in the PDF sequence \(\varrho _t^X(x) = \left( {x_{{\mathrm{hi}}}(t) - x_{{\mathrm{lo}}}(t)} \right)^{ - 1}\) for \(x \in \left[ {x_{{\mathrm{lo}}}(t){\mathrm{,}}\,x_{{\mathrm{hi}}}(t)} \right]\), and 0 otherwise.

For the SST dataset, we take the spatially distributed SST anomaly values in the Niño 3.4 region for a given month and apply kernel density estimation with an optimal bandwidth for Gaussian kernels, using the Python toolkit Scikit-learn29. This results in an empirically estimated sequence \(\left\{ {\varrho _t^X} \right\}_{t = 1}^N\) constructed from the spatial distribution of SST values in each month.
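
The sketch below illustrates this step with Scikit-learn's KernelDensity; cross-validated bandwidth selection is used here as one plausible reading of the ‘optimal bandwidth’ (the authors' exact criterion may differ), and the grid and stand-in data are illustrative.

```python
# Sketch: per-month PDF from the spatially distributed Nino 3.4 anomalies with
# scikit-learn; cross-validated bandwidth selection is an assumption here.
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.model_selection import GridSearchCV

def monthly_pdf(anomalies, grid):
    """anomalies: 1-D array of SST anomaly values at all grid cells for one month."""
    X = anomalies.reshape(-1, 1)
    search = GridSearchCV(KernelDensity(kernel="gaussian"),
                          {"bandwidth": np.logspace(-2, 1, 30)}, cv=5)
    search.fit(X)
    log_dens = search.best_estimator_.score_samples(grid.reshape(-1, 1))
    return np.exp(log_dens)

grid = np.linspace(-4, 4, 256)                       # anomaly support, illustrative
one_month = np.random.normal(0.5, 1.0, size=300)     # stand-in anomaly field
rho_t = monthly_pdf(one_month, grid)
```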

In the paleoclimate datasets, using the obtained proxy-depth and age-depth data, we estimate the posterior probability of the paleoclimatic proxy at a chosen time instant of the past using a Bayesian approach reported in an earlier study30. In this approach, we consider the proxy, radiometric age, calendar age, and depth as the random variables X, R, T, and Z, respectively. In these terms, our quantity of interest is the probability \(\varrho (x \, | \, t)\), which for a speleothem dated with U/Th radiometric dates can be shown to be approximated by the Riemann sum

$$\varrho (x \, | \, t) \approx \frac{{\mathop {\sum}\limits_{j = 1}^M b_j \, w_t( {z_j^x} ) \, \varrho ( {x \,|\, z_j^x} )}}{{\mathop {\sum}\limits_{j = 1}^M b_j \, w_t( {z_j^x})}}$$ (3)

where \(z_j^x,{\kern 1pt} j = 1,2, \ldots ,M\) denotes the M depth values at which the proxy measurements are made, and where \(b_j\) is the width of the depth interval corresponding to \(z_j^x\):

$$b_j = \frac{1}{2}\left\{ {\begin{array}{*{20}{c}} {z_2^x - z_1^x} & {j = 1} \\ {z_{j + 1}^x - z_{j - 1}^x} & {1 < j < M} \\ {z_M^x - z_{M - 1}^x} & {j = M} \end{array}} \right..$$ (4)

The weight term \(w_t ( {z_j^x})\) in Eq. 3 encodes our beliefs as to how likely a chosen depth \(z_j^x\) is given the calendar age T = t, i.e., \(w_t( {z_j^x} ) = {\int} \varrho ( {r\,|\,t} ) \, \varrho ({r\,|\,z_j^x}) \, {\mathrm{d}}r\). Thus, the probability that the proxy X = x at a chosen time T = t is expressed in terms of estimable or measured quantities. For the application to the paleoclimate datasets in this study, we take a regular time grid at 5-year intervals starting (ending) at the minimum (maximum) age measurement. We refer to the paper by Goswami et al. 30 for a more detailed discussion. For the current analysis, we use \(\varrho _t^X(x): = \varrho (x\,|\,t)\) from Eq. 3.
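
The following is a heavily simplified sketch of Eqs. 3 and 4. It assumes a Gaussian age model \(\varrho(r\,|\,z_j^x)\) available at every proxy depth, a Gaussian proxy-measurement density \(\varrho(x\,|\,z_j^x)\), and a point mass for \(\varrho(r\,|\,t)\) at the chosen calendar age; the full procedure of ref. 30 is more general, and all names and values below are illustrative.

```python
# Simplified sketch of Eqs. 3-4; the Gaussian age and proxy models and the
# point-mass calendar-age density are assumptions of this illustration.
import numpy as np
from scipy.stats import norm

def proxy_pdf_at_t(t, x_grid, z, proxy_mean, proxy_sd, age_mean, age_sd):
    """Return rho(x | t) on x_grid for M depths z with proxy and age models."""
    M = len(z)
    # Eq. 4: half-widths of the depth intervals
    b = np.empty(M)
    b[0] = z[1] - z[0]
    b[1:-1] = z[2:] - z[:-2]
    b[-1] = z[-1] - z[-2]
    b *= 0.5
    # Weights w_t(z_j): with rho(r | t) a point mass at r = t, the integral in
    # the text collapses to rho(r = t | z_j)
    w = norm.pdf(t, loc=age_mean, scale=age_sd)
    # rho(x | z_j): Gaussian around the measured proxy value at depth z_j
    rho_x_given_z = norm.pdf(x_grid[:, None], loc=proxy_mean, scale=proxy_sd)
    # Eq. 3: weighted Riemann sum over depths
    num = (b * w * rho_x_given_z).sum(axis=1)
    den = (b * w).sum()
    return num / den

# Illustrative usage with stand-in depth, proxy, and age-model values
z = np.linspace(0, 100, 50)                          # depths, e.g. in mm
d18O = np.sin(z / 10.0)                              # stand-in proxy values
rho_x_t = proxy_pdf_at_t(t=5000.0, x_grid=np.linspace(-2, 2, 200),
                         z=z, proxy_mean=d18O, proxy_sd=0.1,
                         age_mean=np.linspace(0, 10000, 50), age_sd=50.0)
```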

Network of recurrence probabilities

We use the framework of recurrence analysis to analyze the chosen datasets. Typically, this is based on the construction of a recurrence matrix R whose elements \(R_{ij}\) are 1 if the state of the system recurred (within an ε-neighborhood) at times i and j, and 0 otherwise11. The recurrence matrix R can be used to classify and investigate various classes of complex dynamics. More recently, R has been shown to be interpretable as the adjacency matrix A = R − 1 of a complex network, called the recurrence network (RN), whose nodes are the time points of the observations and whose edges are placed between those pairs of time points which recur within an ε-neighborhood12. Here, 1 is the identity matrix of the same size as R; subtracting it from R yields an adjacency matrix A without self-loops.

In our case, since we consider time series with uncertainties, it is not possible to answer precisely whether time points i and j recurred, in the sense that we cannot answer this question with a 1 or a 0 as in traditional recurrence analysis. Instead, we estimate the probability that i and j recurred within a chosen ε-neighborhood. A further difference from traditional recurrence analysis is that, to date, there is no meaningful way to ‘embed’ a time series of probability densities; we thus estimate the recurrence probabilities in the following without embedding our data. Therefore, from here on, we use the simple scalar difference between the observable at times i and j as our distance metric to define a ‘recurrence’. For clarity, we define \(X_i\) and \(X_j\) as the random variables describing the observable at time points i and j respectively, and \(Z_{ij}: = X_i - X_j\) as the random variable describing their difference. The probability of recurrence \(Q_{ij}(\varepsilon ): = Prob{\kern 1pt} \left( {\left| {Z_{ij}} \right| \le \varepsilon } \right)\) (where | ⋅ | denotes the absolute value) is

$$Q_{ij}(\varepsilon ) = {\int}_{ - \infty }^{ + \infty } \varrho _i^X(x_i){\int}_{x_i - \varepsilon }^{x_i + \varepsilon } \varrho _{j|i}^X(x_j \, | \, x_i) \, {\mathrm{d}}x_j\,{\mathrm{d}}x_i = {\int}_{ - \infty }^{ + \infty } {\int}_{x_i - \varepsilon }^{x_i + \varepsilon } \varrho _{ij}^X(x_i,x_j) \, {\mathrm{d}}x_j\,{\mathrm{d}}x_i,$$ (5)

where the joint probability density \(\varrho _{ij}^X(x_i,x_j)\) is not available and is also not uniquely determined by the marginal densities \(\varrho _i^X(x_i)\) and \(\varrho _j^X(x_j)\) alone. To overcome this limitation we use the results of an earlier study by Williamson and Downs31, who show how upper and lower bounds on the cumulative distribution function (CDF) of \(X_i - X_j\) can be derived using only the marginal CDFs of \(X_i\) and \(X_j\), denoted here as \(P_i^X\) and \(P_j^X\) respectively. In the following, we thus first express the probability of recurrence \(Q_{ij}(\varepsilon)\) in terms of the CDF of \(Z_{ij} = X_i - X_j\) and then use the bounds given by Williamson and Downs31 to derive precise bounds for \(Q_{ij}(\varepsilon)\) itself.

The probability of recurrence \(Q_{ij}(\varepsilon)\) can be seen as the total probability that \(Z_{ij} = X_i - X_j\) falls in the interval [−ε, ε],

$$Q_{ij}(\varepsilon ) = Prob\left( {\left| {Z_{ij}} \right| \le \varepsilon } \right) = Prob\left( {Z_{ij} \le \varepsilon } \right) - Prob\left( {Z_{ij} \le - \varepsilon } \right),$$ (6)

which implies

$$Q_{ij}(\varepsilon ) = P_{ij}^Z(\varepsilon ) - P_{ij}^Z( - \varepsilon ),$$ (7)

where \(P_{ij}^Z\left( {z_{ij}} \right): = Prob\left( {Z_{ij} \le z_{ij}} \right)\) is the CDF of \(Z_{ij}\), and \(z_{ij} \in (-\infty, \infty)\).

The upper bound \(M_{ij}\) and the lower bound \(m_{ij}\) for the CDF \(P_{ij}^Z\left( {z_{ij}} \right)\), valid for all \(z_{ij}\), are obtained using the results of Williamson and Downs31 as

$$M_{ij}\left( {z_{ij}} \right) = {\mathrm{min}}\left\{ {\mathop {{{\mathrm{inf}}}}\limits_v f_{ij}\left( {v,z_{ij}} \right),0} \right\} + 1,\,{\mathrm{and}}$$ (8)

$$m_{ij}\left( {z_{ij}} \right) = {\mathrm{max}}\left\{ {\mathop {{{\mathrm{sup}}}}\limits_v f_{ij}\left( {v,z_{ij}} \right),0} \right\},$$ (9)

where \(f_{ij}(v, z_{ij}) = P_i^X(v) - P_j^X(v - z_{ij})\). These bounds ensure that, for all values of \(Z_{ij} = z_{ij}\), \(P_{ij}^Z ( {z_{ij}} ) \in [ {m_{ij}( {z_{ij}} ),M_{ij}( {z_{ij}})} ] \subseteq [0,1]\). Using these bounds and combining them with Eq. 7, we find that the upper bound \(q_{ij}^{\mathrm{u}}\left( \varepsilon \right)\) and the lower bound \(q_{ij}^{\mathrm{l}}\left( \varepsilon \right)\) for the probability of recurrence \(Q_{ij}(\varepsilon)\) can be written as

$$q_{ij}^{\mathrm{u}}(\varepsilon ) = {\mathrm{min}}\left\{ {M_{ij}(\varepsilon ) - m_{ij}( - \varepsilon ),1} \right\},\,{\mathrm{and}}$$ (10)

$$q_{ij}^{\mathrm{l}}(\varepsilon ) = {\mathrm{max}}\left\{ {m_{ij}(\varepsilon ) - M_{ij}( - \varepsilon ),0} \right\}.$$ (11)

Given a recurrence threshold ε, the bounds \(q_{ij}^{\mathrm{u}}(\varepsilon)\) and \(q_{ij}^{\mathrm{l}}(\varepsilon)\) constrain the probability of recurrence \(Q_{ij}(\varepsilon)\) such that \(Q_{ij}(\varepsilon) \in [q_{ij}^{\mathrm{l}}(\varepsilon), q_{ij}^{\mathrm{u}}(\varepsilon)] \subseteq [0,1]\). In the following, we drop ε for notational clarity, with the understanding that ε is fixed.
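
A numerical sketch of Eqs. 8–11 is given below; the marginal CDFs are assumed to be sampled on a common, increasing grid (for example, by cumulatively integrating the kernel density estimates), and the grid and demo distributions are illustrative.

```python
# Sketch of Eqs. 8-11: bounds on the recurrence probability Q_ij(eps) from the
# two marginal CDFs alone, following Williamson and Downs (ref. 31).
import numpy as np
from scipy.stats import norm

def cdf_bounds(P_i, P_j, x, z):
    """Upper/lower bounds M_ij(z), m_ij(z) on the CDF of Z = X_i - X_j (Eqs. 8-9)."""
    # f_ij(v, z) = P_i(v) - P_j(v - z), evaluated for all grid values v
    f = P_i - np.interp(x - z, x, P_j, left=0.0, right=1.0)
    M = min(f.min(), 0.0) + 1.0
    m = max(f.max(), 0.0)
    return M, m

def recurrence_prob_bounds(P_i, P_j, x, eps):
    """Bounds q_l, q_u on the recurrence probability Q_ij(eps) (Eqs. 10-11)."""
    M_pos, m_pos = cdf_bounds(P_i, P_j, x, +eps)
    M_neg, m_neg = cdf_bounds(P_i, P_j, x, -eps)
    q_u = min(M_pos - m_neg, 1.0)
    q_l = max(m_pos - M_neg, 0.0)
    return q_l, q_u

# Demo with two Gaussian marginals on a common grid
x = np.linspace(-15, 15, 3001)
q_l, q_u = recurrence_prob_bounds(norm.cdf(x, 0, 1), norm.cdf(x, 1, 2), x, eps=0.5)
```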

First, we assume the probability \(Q_{ij}\) itself to be a random variable distributed in the obtained interval \([q_{ij}^{\mathrm{l}}, q_{ij}^{\mathrm{u}}]\), but in a way unknown to us, and define the PDF within these bounds as \(\varrho _{ij}^Q(q_{ij})\). Next, assuming A to be the true but not estimable adjacency matrix of the system’s recurrence network, we write the probability of having a link between i and j as

$$Prob\left( {{\bf{A}}_{ij} = 1} \right) ={\int}_{q_{ij}^{\rm l}}^{q_{ij}^{\rm u}} \varrho _{ij}^{{\bf{A}}|Q}\left( {{\bf{A}}_{ij} = 1 \big\rvert q_{ij}} \right) \;\varrho _{ij}^Q(q_{ij})\;{\mathrm{d}}q_{ij} ={\int}_{q_{ij}^{\rm l}}^{q_{ij}^{\rm u}} q_{ij}{\kern 1pt} \; \varrho _{ij}^Q(q_{ij}) \; {\mathrm{d}}q_{ij} = {\bf{E}}_{\rho _{ij}^Q}[Q_{ij}],$$ (12)

where we use the notation \(\varrho _{ij}^{{\bf{A}}|Q} ( {{\bf{A}}_{ij} = 1 \, | \, q_{ij}} )\) to denote the probability that \({\bf{A}}_{ij} = 1\) given \(Q_{ij} = q_{ij}\), and use the result that \(\varrho _{ij}^{{\bf{A}}|Q} ( {{\bf{A}}_{ij} = 1 \, | \, q_{ij}} ) = q_{ij}\). The final result of Eq. 12 shows that the total probability that \({\bf{A}}_{ij}\) equals 1 is simply the expectation of \(Q_{ij}\) evaluated with respect to \(\varrho _{ij}^Q(q_{ij})\).

Assuming that \(Q_{ij}\) is distributed symmetrically around the mean of the interval \([q_{ij}^{\mathrm{l}}, q_{ij}^{\mathrm{u}}]\), the total probability that the observable recurred at i and j (from Eq. 12) is

$$Prob{\kern 1pt} ({\bf{A}}_{ij} = 1) = {\bf{E}}_{\varrho _{ij}^Q}[Q_{ij}] = \frac{{q_{ij}^{{\rm {l}}} + q_{ij}^{{\rm u}}}}{2},$$ (13)

which allows us to define an estimator \(\hat{\mathbf{A}}\) of the probabilities of recurrence of the observable X and interpret it as the adjacency matrix of a network whose nodes are the time points of observation and whose edge weights are defined as

$${{\hat {\bf A}}}_{ij}(\varepsilon ): = \left\{ {\begin{array}{*{20}{c}} {\frac{1}{2}\left( {q_{ij}^{\rm l}(\varepsilon ) + q_{ij}^{\rm u}(\varepsilon )} \right),} & {i \ne j,} \\ {0,} & {i = j} \end{array}} \right..$$ (14)

Here, we put \(\hat{\mathbf{A}}_{ii} = 0\) to avoid self-loops in the network. The elements of \(\hat{\mathbf{A}}_{ij}\) encode the total probability that time points i and j have recurred within an ε-neighborhood, taking into account the uncertainties in the dataset.
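
Given precomputed marginal CDFs for all N time points on a shared grid, the estimator in Eq. 14 follows directly from the bounds above; the sketch below builds on the recurrence_prob_bounds function from the previous sketch and is an illustration rather than the authors' implementation.

```python
# Sketch of Eq. 14: adjacency matrix of recurrence probabilities from the
# marginal CDFs of all N time points (cdfs has shape (N, len(x))). Uses
# recurrence_prob_bounds() from the sketch above.
import numpy as np

def recurrence_probability_network(cdfs, x, eps):
    N = cdfs.shape[0]
    A_hat = np.zeros((N, N))                    # zero diagonal: no self-loops
    for i in range(N):
        for j in range(i + 1, N):
            q_l, q_u = recurrence_prob_bounds(cdfs[i], cdfs[j], x, eps)
            A_hat[i, j] = A_hat[j, i] = 0.5 * (q_l + q_u)   # Eqs. 13-14
    return A_hat
```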

In order to estimate the networks of recurrence probabilities for the applications, we use a bisection routine to arrive at a suitable ε threshold which results in a pre-specified link density of the RN. The link densities chosen are: (1) synthetic example, 30%, (2) financial datasets, 24%, (3) SST dataset, 25%, and (4) paleoclimatic datasets, 30%.
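
The threshold selection can be sketched as a bisection on ε over the link density of \(\hat{\mathbf{A}}(\varepsilon)\); defining the link density of the weighted network as the mean off-diagonal edge weight is an assumption of this illustration, as are the bracket and tolerance.

```python
# Sketch: bisection on eps so that the link density of A_hat(eps) matches a
# pre-specified target; link density is taken here as the mean off-diagonal
# weight, which is an assumption of this illustration.
import numpy as np

def link_density(A_hat):
    N = A_hat.shape[0]
    return (A_hat.sum() - np.trace(A_hat)) / (N * (N - 1))

def find_epsilon(build_network, target, eps_lo, eps_hi, tol=1e-3, max_iter=50):
    """build_network(eps) -> A_hat; returns eps giving roughly the target density."""
    for _ in range(max_iter):
        eps_mid = 0.5 * (eps_lo + eps_hi)
        rho = link_density(build_network(eps_mid))
        if abs(rho - target) < tol:
            break
        if rho < target:          # density grows monotonically with eps
            eps_lo = eps_mid
        else:
            eps_hi = eps_mid
    return eps_mid
```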

Detecting abrupt transitions using recurrence network community structure

Block-like structures in recurrence plots have been suggested to encode the occurrence of an abrupt transition in the dynamics of the observable under consideration32. In the RN framework, such structures correspond to communities, defined in the sense of Newman17 as those parts of a network which have a higher link density within themselves than to the rest of the network. An RN community represents a time period in which the states of the system are closer to each other than to the rest; communities thus correspond to stable dynamical regimes, and their borders are the time points at which the system transitioned between regimes. However, as our interest is confined solely to the question of whether an abrupt transition exists at the midpoint of the time period under consideration, we do not need to perform a typical ‘community detection’ over all possible partitions of the network. Rather, it suffices to move a sliding window over the dataset (Fig. 2b), extract the portion of \(\hat{\mathbf{A}}\) which falls within that window, and test whether the two halves of the windowed network (before and after the midpoint) form two communities that are unlikely to have occurred by chance. We use the within-community link fraction S as an indicator of community structure and expect windows with a high value of S to have very low p-values. A low p-value, obtained from the statistical hypothesis test for the window under consideration, implies that we reject the null hypothesis that the two-community structure observed in that window occurred by chance.

We consider the sliding window at a position k, where it covers the time period \(T \in [t_1, t_2]\), and define the midpoint of the window as

$$t_{\mathrm{m}} = \left\lfloor {\frac{{t_1 + t_2}}{2}} \right\rfloor ,$$ (15)

where \(\lfloor \cdot \rfloor\) denotes the floor (greatest integer) function, which gives the largest integer less than or equal to its argument. Defining \(\hat{\mathbf{A}}^k\) as the portion of the full adjacency matrix \(\hat{\mathbf{A}}\) which covers the interval \([t_1, t_2]\), we can estimate the within-community link fraction \(s^k\) for the window at k as

$$s^k = \frac{{\mathop {\sum}\limits_{i,\,j = t_1}^{t_{\mathrm{m}}} \hat{\mathbf{A}}_{ij}^k + \mathop {\sum}\limits_{i,j = t_{\mathrm{m}}}^{t_2} \hat{\mathbf{A}}_{ij}^k}}{{\mathop {\sum}\limits_{i,\,j = t_1}^{t_2} \hat{\mathbf{A}}_{ij}^k}}.$$ (16)
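
A sketch of Eq. 16 for a single window is given below; it assumes the windowed block of \(\hat{\mathbf{A}}\) is indexed locally from 0, so that the local midpoint index (n − 1)//2 corresponds to \(t_{\mathrm{m}} - t_1\) from Eq. 15.

```python
# Sketch of Eq. 16: within-community link fraction of one window of A_hat,
# split into two halves at the midpoint (local 0-based indexing).
import numpy as np

def within_community_fraction(A_win):
    """A_win: the windowed block A_hat^k covering the interval [t1, t2]."""
    n = A_win.shape[0]
    t_m = (n - 1) // 2                          # local midpoint, cf. Eq. 15
    first = A_win[:t_m + 1, :t_m + 1].sum()     # links within the first half
    second = A_win[t_m:, t_m:].sum()            # links within the second half
    return (first + second) / A_win.sum()
```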

The extent to which the value of S obtained from the data in a particular window \(( {: = s_{{\mathrm{obs}}}^k} )\) is governed by a dynamical transition and not by randomness is quantified using the p-value \(( {: = p_{s_{{\mathrm{obs}}}^k}} )\) of a statistical test with the null hypothesis \(H_0\): S is determined by the degree sequence of the given network. Here, the term degree sequence denotes the sequence of the total link weight of each node. To simulate this null hypothesis, we use the Python igraph module, which allows us to generate random networks with a prescribed degree sequence. In each window k, we generate 1000 such random surrogate networks and obtain an ensemble \(\{s_{{\mathrm{sur}}}^k\}\) of values of S, from which we estimate \(p_{s_{{\mathrm{obs}}}^k}\) using the percentile of \(s_{{\mathrm{obs}}}^k\) in the surrogate distribution,

$$p_{s_{{\mathrm{obs}}}^k} = 1 - \frac{{F_{{\mathrm{sur}}}^{S^k} ( {s_{{\mathrm{obs}}}^k} )}}{{100}},$$ (17)

where \(F_{{\mathrm{sur}}}^{S^k}( \cdot )\) is a percentile function, which returns the percentile of the given argument within the distribution of the 1000 \(s_{{\mathrm{sur}}}^k\) values generated in the previous step. Here, we assume that 1000 is a sufficient number of random realizations for the following approximation to hold:

$$P_{{\mathrm{null}}}^{S^k} ( {s^k} ) \approx \frac{{F_{{\mathrm{sur}}}^{S^k} ( {s^k} )}}{{100}},$$ (18)

where \(P_{{\mathrm{null}}}^{S^k}\) is the CDF of the within-community link fraction \(S^k\) under the null hypothesis. Finally, to identify the statistically significant windows at a chosen significance level α, we apply Holm’s method to correct for multiple comparisons33 along with the Dunn–Šidák correction factor34 (Figs. 2, 3 and 4).
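
Assuming the ensemble of surrogate S values for each window has already been generated (from degree-preserving random networks, as described above), Eqs. 17 and 18 and the multiple-comparison step can be sketched as follows; using statsmodels' 'holm-sidak' method is one reading of Holm's method combined with the Dunn–Šidák correction factor and may differ in detail from the authors' procedure.

```python
# Sketch of Eqs. 17-18 and the multiple-comparison step. s_obs[k] is the
# observed within-community fraction of window k, and s_sur[k] the ensemble of
# 1000 surrogate values for that window (assumed already generated).
import numpy as np
from scipy.stats import percentileofscore
from statsmodels.stats.multitest import multipletests

def window_p_values(s_obs, s_sur):
    # Eq. 17: one minus the percentile of the observed value in the surrogates
    return np.array([1.0 - percentileofscore(s_sur[k], s_obs[k]) / 100.0
                     for k in range(len(s_obs))])

def significant_windows(p_values, alpha=0.05):
    # 'holm-sidak' is used as one reading of Holm's method with the
    # Dunn-Sidak correction factor (refs. 33, 34); this is an assumption.
    reject, _, _, _ = multipletests(p_values, alpha=alpha, method="holm-sidak")
    return reject
```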

The sizes of the sliding window differed between applications: synthetic example, 100 time points; financial datasets, 60 time points (approximately 2 months); SST data, 30 time points (2.5 years); and paleoclimatic datasets, 100 time points (500 years). The significance level was set at α = 0.05 for all tests.

Coincidence analysis of the ENSO transitions

Here, we define phases for the Niño 3.4 index and the PDO index (Methods: Datasets) in the sense of Maraun and Kurths35. In this approach, a low-pass forward–backward Butterworth filter is applied to the index time series \(\{ y_t\} _{t = 1}^N\) such that all frequencies higher than \(\frac{1}{{12}}\) month\(^{-1}\) are removed. The filtered time series \(\tilde y_t\) is then used to obtain the instantaneous phase \(\phi_t\) (Supplementary Fig. 7),

$$\phi _t = {\mathrm{arctan}}\left( {\frac{{H(\tilde y_t)}}{{\tilde y_t}}} \right)$$ (19)

where H(⋅) denotes the Hilbert transform. The time derivative \(\frac{{{\mathrm{d}}({\mathrm{\Delta }}\phi _t)}}{{{\mathrm{d}}t}}\) of the instantaneous phase difference \({\mathrm{\Delta }}\phi _t = \phi _t^{{\mathrm{PDO}}} - \phi _t^{{\mathrm{ENSO}}}\) helps to identify periods of ‘phase locking’, where \(\frac{{{\mathrm{d}}({\mathrm{\Delta }}\phi _t)}}{{{\mathrm{d}}t}} \approx 0\) (Supplementary Fig. 8a). We consider the two index time series, representing ENSO and PDO respectively, to be phase locked when \(\frac{{{\mathrm{d}}({\mathrm{\Delta }}\phi _t)}}{{{\mathrm{d}}t}}\) falls between the 25th and 75th percentiles of all \(\frac{{{\mathrm{d}}({\mathrm{\Delta }}\phi _t)}}{{{\mathrm{d}}t}}\) values (Supplementary Fig. 8b). Next, we test whether the phase-locked time periods are significantly coincident with the abrupt transitions detected by our method in the PDF series of SST anomalies from the Niño 3.4 region.
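
A sketch of the phase construction (Eq. 19) and the phase-locking criterion is given below; the filter order and the stand-in index series are illustrative.

```python
# Sketch of Eq. 19: low-pass forward-backward Butterworth filter (cutoff 1/12
# per month for monthly data), Hilbert transform, instantaneous phases, and the
# 25th-75th percentile phase-locking criterion. Filter order and stand-in
# series are illustrative.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def instantaneous_phase(y, cutoff=1.0 / 12, order=4):
    b, a = butter(order, cutoff * 2)     # Wn in units of the Nyquist frequency (0.5 per sample)
    y_filt = filtfilt(b, a, y)           # forward-backward: zero phase distortion
    return np.angle(hilbert(y_filt))     # arctan of H(y)/y with quadrant handling, Eq. 19

# Stand-in monthly index series
n = 800
nino34 = np.sin(2 * np.pi * np.arange(n) / 48) + 0.3 * np.random.standard_normal(n)
pdo = np.sin(2 * np.pi * np.arange(n) / 48 + 0.5) + 0.3 * np.random.standard_normal(n)

dphi = instantaneous_phase(pdo) - instantaneous_phase(nino34)
ddphi_dt = np.gradient(np.unwrap(dphi))  # time derivative of the phase difference
lo, hi = np.percentile(ddphi_dt, [25, 75])
phase_locked = (ddphi_dt >= lo) & (ddphi_dt <= hi)
```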

We consider a detected transition to be coincident if it occurs within 31 days of a time point of phase locking. In total, 216 such coincidences are identified. To test whether this could occur simply by chance, we randomize the timings of the detected transitions 50,000 times and compute the number of coincidences each time. This results in a null distribution of coincidences occurring purely by chance (Supplementary Fig. 8c). At a significance level of 5%, we find that the observed number of coincidences is significantly higher than expected by random chance, supporting the hypothesis that the detected abrupt transitions in the Niño 3.4 region are significantly coincident with periods of phase locking between the PDO and ENSO.
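
The coincidence test can be sketched as the following Monte Carlo procedure; randomizing the transition timings uniformly over the observation period is an assumption of this illustration, as are all variable names.

```python
# Sketch of the coincidence test: count transitions within 31 days of a
# phase-locked time point, then compare against a null distribution built by
# randomizing the transition timings (uniform randomization is an assumption).
import numpy as np

def count_coincidences(transition_times, locked_times, window=31):
    locked = np.sort(np.asarray(locked_times))
    hits = 0
    for t in transition_times:
        idx = np.searchsorted(locked, t)
        nearest = min(
            abs(t - locked[idx - 1]) if idx > 0 else np.inf,
            abs(locked[idx] - t) if idx < len(locked) else np.inf,
        )
        hits += nearest <= window
    return hits

def coincidence_p_value(transition_times, locked_times, t_min, t_max,
                        n_surrogates=50000, window=31):
    observed = count_coincidences(transition_times, locked_times, window)
    null = np.array([
        count_coincidences(np.random.uniform(t_min, t_max, len(transition_times)),
                           locked_times, window)
        for _ in range(n_surrogates)
    ])
    return observed, np.mean(null >= observed)   # one-sided p-value
```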

Code availability

All numerical computations required to implement the ideas presented in this study were coded in Python 2. All necessary codes are available from the corresponding author on request.

Data availability

We have used previously released, freely available datasets in our study. The various data sources have been listed in Methods, Datasets. In case of any difficulty in obtaining the datasets mentioned above, the corresponding author can provide the data used upon request.