bc1q8x83c9zxzf7u44xtvshwr4rnxhjdda4ad6s6un

Bitcoin Signed Message: Reviewing “Modelling Bitcoin’s Value with Scarcity” — Part II: The hunt for cointegration

Signature: H+evhpmr/HrUTo5tHKwdtnkt09OIEKb3omE+mWxdI0j2N+6q5qXr9+XURAbSfaLQ/BeJjK1alPZzs6NrATb4jus=

Reviewing “Modelling Bitcoin’s Value with Scarcity” — Part II: The hunt for cointegration

Does OLS regression of natural logarithms of bitcoin price and stock-to-flow ratios result in spurious regression, or are we dealing with the exceptional case of cointegration?

In my first review of the work of PlanB, I concluded that the relation between stock-to-flow and bitcoin price as pointed out by the author was invalid because the general assumptions of ordinary least squares regression were not met. When two variables are non-stationary and we estimate a regression model, there is a good chance we find highly autocorrelated residuals and a significant value for the coefficient, even if the variables are unrelated. This phenomenon is well known as spurious regression. But spurious regression isn’t always the case. Sometimes the variables are cointegrated, which implies that the estimated relation is superconsistent. Another review by Nick pointed out that in this specific case we could be dealing with the exceptional case of cointegration. For a better understanding of cointegration, I recommend having a look at a very good visual introduction to the concept here.

In this article, I will investigate whether the log of bitcoin price and the log of its stock-to-flow ratio are indeed cointegrated. If cointegration applies, the OLS estimates of the coefficients are consistent. In that case, I would have to withdraw my earlier conclusion that the relation between the two variables as indicated by PlanB is nonsense because the OLS assumptions are not met.

As the concept of cointegration wasn’t really top of mind, I had to take a deep dive into some of my college books and the academic literature during my holiday to refresh my memory of the concept and how to test for it.

TL;DR in layman’s terms

In an earlier analysis I showed that assumptions that should be met were not met, and that the resulting model was therefore flawed. In this article I looked into an exceptional case. If I could confirm that we are dealing with that specific exception, the resulting model would be validated and could be used to quantify the relation between stock-to-flow and bitcoin price. It turned out that the exception indeed applies and that we CAN use the model.

Difference between correlation and cointegration

Before we continue, it’s good to understand the difference between cointegration and correlation. Correlation describes how closely two (or more) variables move in tandem. Cointegration is about the gap between the variables: each series may wander without bound, but some linear combination of them is stationary, so the distance between them keeps returning to a constant mean. Or a bit shorter: cointegration means that two time series share a common stochastic trend.
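To make the distinction concrete, here is a minimal sketch on synthetic data (not the bitcoin dataset). The seed, the loading of 0.8 and the noise levels are arbitrary assumptions chosen for illustration. Both series follow the same random walk, so each one drifts far from its starting point, while the right linear combination of the two stays close to a constant.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Shared stochastic trend: a pure random walk.
trend = np.cumsum(rng.normal(0.0, 1.0, n))

# Two series that load on the same trend, each with its own stationary noise.
x = trend + rng.normal(0.0, 0.5, n)
y = 2.0 + 0.8 * trend + rng.normal(0.0, 0.5, n)

# The combination y - 0.8 * x cancels the trend and leaves noise around 2.0.
spread = y - 0.8 * x

print("std of x:     ", x.std())
print("std of y:     ", y.std())
print("std of spread:", spread.std())
print("mean of spread:", spread.mean())
```

The levels x and y have a large spread because they wander, while the combination stays in a narrow band around its mean. That bounded gap is what the tests in this article look for.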

Method

All analysis is performed in Python where I used the following packages:

numpy

pandas

statsmodels

matplotlib

The dataset originates from my earlier analysis and a download can be found here. I figured out how to use Jupyter Notebook to visualise the analysis, because learning how to work with Jupyter was still on my wish list. The best way to learn these things is by just having a go at it.
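As a rough sketch of the data preparation, the log series and their first differences can be built with pandas as below. The column names "price" and "s2f" and the four rows of numbers are made-up placeholders, not values from the actual dataset.

```python
import numpy as np
import pandas as pd

# Placeholder frame standing in for the downloaded dataset.
# Column names and values are assumptions for illustration only.
df = pd.DataFrame(
    {
        "price": [1.0, 2.0, 4.0, 8.0],
        "s2f": [1.0, 1.5, 2.0, 3.0],
    }
)

# Natural logarithms of both series.
df["lnBTCprice"] = np.log(df["price"])
df["lnS2F"] = np.log(df["s2f"])

# First differences; the first row is dropped because it has no predecessor.
diffs = df[["lnBTCprice", "lnS2F"]].diff().dropna()
print(diffs)
```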

Testing

I use three different approaches to test for cointegration of the natural logarithms of bitcoin’s price and its stock-to-flow ratio. For brevity, I refer to these series as lnBTCprice and lnS2F. I use the following tests:

Cointegrating Regression Durbin-Watson test (CRDW test);

the two-step Engle-Granger test;

the Johansen test.

All approaches are briefly summarised below.

CRDW Test

Test whether the Durbin-Watson statistic of the residuals from the regression of lnBTCprice on lnS2F is significantly larger than 0. If the residuals contain a unit root, the statistic should be close to zero. If we can’t reject the presence of a unit root in the residuals, we can’t reject the null hypothesis that the variables are not cointegrated.
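The Durbin-Watson statistic is the sum of squared changes in the residuals divided by the sum of squared residuals, which is roughly 2(1 − ρ) for a first-order autocorrelation ρ. The sketch below computes it with numpy on two synthetic series (the seed and length are arbitrary assumptions) to show why a value near zero points to a unit root.

```python
import numpy as np


def durbin_watson_stat(resid):
    """Sum of squared first differences divided by the sum of squares."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


rng = np.random.default_rng(0)

# Stationary residuals: white noise, so the statistic should be near 2.
white_noise = rng.normal(size=3000)

# Unit-root residuals: a random walk, so the statistic should be near 0.
random_walk = np.cumsum(rng.normal(size=3000))

dw_stationary = durbin_watson_stat(white_noise)
dw_unit_root = durbin_watson_stat(random_walk)

print("DW for white noise:", dw_stationary)
print("DW for random walk:", dw_unit_root)
```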

Engle Granger Test

Determine the integration order of the two time series, lnS2F and lnBTCprice (i.e. how often we need to difference each series to obtain a stationary time series). If both lnS2F_𝑡 and lnBTCprice_𝑡 are integrated of order one (abbreviated to I(1)), we know that if these two series cointegrate, there exist coefficients 𝜇 and 𝛽 such that lnBTCprice_𝑡 = 𝜇 + 𝛽 lnS2F_𝑡 + u_𝑡, with u_𝑡 stationary. The residuals from this regression are then tested for a unit root, as they should be stationary if the variables are cointegrated. If we can reject the null hypothesis of a unit root in the residuals at the 1% significance level, we conclude that the residuals are stationary rather than integrated of order one, which is evidence of cointegration.

Johansen Test

We know the natural logarithms of bitcoin price and S2F are both non-stationary, which means they are integrated of an order larger than 0. We can model both series by means of an autoregressive model. As we model both series at once, we can use a vector autoregressive (VAR) model in which y is the n×1 vector of variables integrated of order one (here lnBTCprice and lnS2F, so n = 2).

This can be rewritten as:

where:

In the second equation above we have multidimensional variables and multiplication would happen via matrix multiplication.
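For reference, a sketch of the usual textbook form of these two equations follows. The notation is assumed here rather than taken from this article: p is the lag order, A_i are the n×n coefficient matrices of the VAR, I is the identity matrix and ε_t is the error vector.

```latex
% VAR(p) in levels
y_t = \mu + A_1 y_{t-1} + \dots + A_p y_{t-p} + \varepsilon_t

% The same model rewritten in error-correction form
\Delta y_t = \mu + \Pi y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \varepsilon_t

% where
\Pi = \sum_{i=1}^{p} A_i - I, \qquad \Gamma_i = -\sum_{j=i+1}^{p} A_j
```

In this form the rank of Π equals the number of cointegrating relations, which is what the Johansen statistics test.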

The Johansen test consists of two tests: the maximum eigenvalue test and the trace test. For both test statistics we test the null hypothesis of no cointegration against the alternative of cointegration by comparing the test statistic to the critical values for the test.

Running the tests

In this section we carry out the mentioned tests and have a closer look at the results of these tests.

Engle Granger and CRDW

Both series (natural logarithms of S2F and bitcoin’s price) are clearly not stationary, but trending over time. After differencing the series once, we might find stationarity for both though.

To the naked eye there is a very good chance that the differenced series are both stationary, but we need to check that as well.

To verify whether the differenced series are stationary I ran the augmented Dickey-Fuller (ADF) test for both differenced series. Code for the test is in the appendix.

ADF test result for first order difference of ln(price)

ADF Statistic: -12.843153

p-value: 0.000000

Critical Values:

1%: -3.432

5%: -2.862

10%: -2.567

ADF test result for first order difference of ln(S2F)

ADF Statistic: -15.426991

p-value: 0.000000

Critical Values:

1%: -3.432

5%: -2.862

10%: -2.567

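For both differenced series we can reject the null hypothesis of a unit root at the 1% significance level, so the first differences are stationary. Combined with the clearly non-stationary levels, this tells us that both lnBTCprice and lnS2F are integrated of order one, I(1), which is the precondition for the Engle-Granger procedure. Time to run an OLS regression to estimate the coefficients in: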

lnBTCprice_𝑡 = 𝜇 + 𝛽 lnS2F_𝑡 + u_𝑡

Here’s the regression summary which we use for both CRDW and the Engle and Granger procedure.

OLS Regression Results from statsmodels package in Python

We’ll have a closer look at the residuals from that regression. The residuals as shown below don’t look like a stationary series. However, the Durbin-Watson statistic (0.161) is just significantly larger than zero for ~3200 observations. So even though the residual plot suggests no cointegration, the CRDW test rejects the null hypothesis of no cointegration!

Residuals vs fitted values and time

I ran the ADF test to check for a unit root in the residuals. According to the ADF test we have to reject the null hypothesis of a unit root and conclude that the residuals are stationary. Cointegration is again supported!

ADF test result for regression residuals

ADF Statistic: -3.714701

p-value: 0.003911

Critical Values:

1%: -3.432

5%: -2.862

10%: -2.567

Johansen test

As mentioned, the Johansen test consists of two separate tests: the maximum eigenvalue test and the trace test. The statsmodels package in Python was used to conduct the tests. Code can be found in the Appendix.

Trace Statistic:

[77.61330689 8.83704667]

Critical Values Trace Statistic [90% 95% 99%]:

[[13.4294 15.4943 19.9349]

[ 2.7055 3.8415 6.6349]]

Maximum Eigenvalue Statistic

[68.77626022 8.83704667]

Critical Values Maximum Eigenvalue Statistic [90% 95% 99%]

[[12.2971 14.2639 18.52 ]

[ 2.7055 3.8415 6.6349]]

For both Johansen tests we reject the null hypothesis of no cointegration, as the test statistics are higher than the critical values at all three confidence levels (90%, 95% and 99%).
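The decision rule is simple: a null hypothesis is rejected when the test statistic exceeds the critical value. A short sketch that applies this rule to the numbers reported above, with the array names chosen here for illustration:

```python
import numpy as np

# Statistics and critical values copied from the output above.
trace_stat = np.array([77.61330689, 8.83704667])
trace_crit = np.array([[13.4294, 15.4943, 19.9349], [2.7055, 3.8415, 6.6349]])

max_eig_stat = np.array([68.77626022, 8.83704667])
max_eig_crit = np.array([[12.2971, 14.2639, 18.52], [2.7055, 3.8415, 6.6349]])

# Compare each statistic with its row of critical values (90%, 95%, 99%).
trace_rejects = trace_stat[:, None] > trace_crit
max_eig_rejects = max_eig_stat[:, None] > max_eig_crit

print("Trace test rejections:")
print(trace_rejects)
print("Maximum eigenvalue test rejections:")
print(max_eig_rejects)
```

Every comparison comes out True, so the statistics lie above the critical values at each reported level.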

Conclusion

The estimated relation between lnBTCprice and lnS2F is consistent (even though the OLS assumptions are not met), as we have shown that the time series are cointegrated. My former conclusion is thereby falsified. As cointegration applies, we are able to use the coefficients from the OLS regression to quantify a model that describes the relation between the two series.

We could set up a Vector Error Correction Model to model both the short-term and the long-term dynamics of the relation, which I leave for a follow-up article.

References

[1]: https://medium.com/@100trillionUSD/modeling-bitcoins-value-with-scarcity-91fa0fc03e25

[2]: https://medium.com/burgercrypto-com/challenging-plan-b-a-review-of-modelling-bitcoins-value-with-scarcity-3d4e7e141286

[3]: https://medium.com/altcoin-magazine/falsifying-stock-to-flow-as-a-model-of-bitcoin-value-b2d9e61f68af

[4]: Co-Integration and Error Correction: Representation, Estimation, and Testing; Robert F. Engle and C. W. J. Granger, 1987

[5]: A guide to Modern Econometrics, second edition, 2005; M. Verbeek

[6]: https://medium.com/@vera.chernova.burfield/how-to-embed-jupyter-notebook-into-medium-posts-in-three-steps-1-2-3-54262edea88a

Appendix

Python code

https://gist.github.com/MarcelBurger/ed216b12e436bb4f07497cecff2b6742