Cointegration vs. Correlation

In quantitative trading, we usually work with non-stationary time series. People often call two assets correlated when they co-move, but in this context the term is mathematically incorrect. Pearson’s correlation is only meaningful for stationary variables: as the formula below shows, it uses expected values and standard deviations, and in a non-stationary process these values change over time.

Correlation formula
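In its standard form the coefficient is $\rho_{X,Y} = \operatorname{cov}(X,Y) / (\sigma_X \sigma_Y)$. For a random walk $x_t = x_{t-1} + \varepsilon_t$ started at zero, $\operatorname{Var}(x_t) = t\,\sigma_\varepsilon^2$ grows with time, so there is no single $\sigma_X$ for the sample statistic to estimate. The minimal NumPy sketch below (helper names, sample sizes and seeds are illustrative assumptions) correlates two independent random walks: the increments give a coefficient close to zero, while the levels can give a large positive or negative coefficient depending on the seed, even though the series are unrelated.

```python
import numpy as np


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Sample Pearson correlation: cov(a, b) / (std(a) * std(b))."""
    return float(np.corrcoef(a, b)[0, 1])


def spurious_correlation_demo(n: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Correlate two INDEPENDENT random walks, in levels and in increments."""
    rng = np.random.default_rng(seed)
    dx = rng.normal(size=n)  # stationary i.i.d. increments of X
    dy = rng.normal(size=n)  # stationary i.i.d. increments of Y, independent of dx
    x = np.cumsum(dx)        # non-stationary level: Var(x_t) grows with t
    y = np.cumsum(dy)
    return pearson(x, y), pearson(dx, dy)


if __name__ == "__main__":
    for seed in range(5):
        levels, increments = spurious_correlation_demo(seed=seed)
        print(f"seed={seed}: levels {levels:+.2f}, increments {increments:+.3f}")
```

Running it over a few seeds makes the point: the increment correlation stays near zero, while the level correlation jumps around from seed to seed.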

For such processes we can use cointegration instead. Several non-stationary time series are cointegrated if some linear combination of them is stationary. You can find a simple explanation in this video.
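Stated formally (the standard textbook definition, written in the notation of the Engle–Granger method described later in this section), two series that are integrated of order one are cointegrated when some coefficient $\beta$ makes their spread $u_t$ stationary:

$$
x_t \sim I(1),\quad y_t \sim I(1),\qquad \exists\,\beta \neq 0:\quad u_t = y_t - \beta x_t \sim I(0)
$$

Here $I(1)$ means the series becomes stationary after differencing once, and $I(0)$ means it is already stationary.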

The picture below shows two processes, X and Y, and their spread. This is an example of correlation without cointegration.

Correlation with no cointegration

The next example is the opposite case: cointegration without correlation.

Cointegration with no correlation

You can find how to build these processes in Python here.

Before moving on to the next chapter, we need to know how to detect cointegration.

There are three main methods for testing for cointegration:

1. Engle–Granger two-step method. If $x_t$ and $y_t$ are non-stationary and cointegrated, then a linear combination of them must be stationary. In other words, $y_t - \beta x_t = u_t$, where $u_t$ is stationary. If we knew $u_t$, we could simply test it for stationarity with something like the Dickey–Fuller or Phillips–Perron test and be done. But because we don’t know $u_t$, we must first estimate it, generally by ordinary least squares, and then run the stationarity test on the estimated $u_t$ series.
2. Johansen test. Unlike the Engle–Granger method, the Johansen test allows for more than one cointegrating relationship, but it relies on asymptotic properties, i.e. large samples. If the sample size is too small, the results will not be reliable, and one should use Autoregressive Distributed Lag (ARDL) models instead.
3. Phillips–Ouliaris cointegration test. Peter C. B. Phillips and Sam Ouliaris (1990) show that residual-based unit root tests applied to the estimated cointegrating residuals do not have the usual Dickey–Fuller distributions under the null hypothesis of no cointegration. Because of the spurious regression phenomenon under the null hypothesis, these tests have asymptotic distributions that depend on (1) the number of deterministic trend terms and (2) the number of variables whose cointegration is being tested. These distributions are known as Phillips–Ouliaris distributions, and their critical values have been tabulated. In finite samples, a superior alternative to these asymptotic critical values is to generate critical values by simulation.

Source: Wikipedia

Let’s code some analysis of this problem. First, download data from Bitfinex for several cryptocurrencies (from 2018-01-01 to 2018-05-31). Next, plot the performance of the cryptocurrencies. Finally, run the cointegration test for all pairs of assets.

The performance of the cryptocurrencies over this period is shown below.

Performance of cryptocurrencies (from 2018-01-01 to 2018-05-31)
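A performance chart like this one can be built from a table of closing prices. This sketch assumes the daily closes have already been downloaded (the download step is not shown) into a pandas DataFrame with one column per asset; the demo uses synthetic prices and hypothetical column names, not the Bitfinex data behind the figure.

```python
import numpy as np
import pandas as pd


def performance(closes: pd.DataFrame) -> pd.DataFrame:
    """Value of 1 unit invested in each asset on the first date (price / first price)."""
    closes = closes.dropna()  # keep only dates on which every asset has a price
    return closes / closes.iloc[0]


if __name__ == "__main__":
    # Synthetic placeholder prices, NOT Bitfinex data; column names are hypothetical.
    dates = pd.date_range("2018-01-01", "2018-05-31", freq="D")
    rng = np.random.default_rng(4)
    log_returns = rng.normal(0.0, 0.05, size=(len(dates), 3))
    closes = pd.DataFrame(100 * np.exp(np.cumsum(log_returns, axis=0)),
                          index=dates, columns=["coin_a", "coin_b", "coin_c"])
    perf = performance(closes)
    print(perf.tail())
    # With matplotlib installed, perf.plot() draws a chart like the one above.
```

Normalizing by the first price puts assets with very different price levels on one axis, which is what makes a side-by-side comparison readable.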

The null hypothesis is that there is no cointegration; the alternative hypothesis is that there is a cointegrating relationship. If the p-value is small, below the chosen significance level (for example 0.05), we can reject the null hypothesis that there is no cointegrating relationship.

Cointegration test result

We can conclude that some of these pairs are cointegrated and could be selected for further research.