This article is the result of a year-long collaboration between myself and Dr. Justin Gash, Associate Professor of Mathematics at Franklin College. For the past year, we’ve tracked and analyzed the movements of the Bitcoin market, with the goal of developing a model that could explain its behavior and predict future changes. Our findings are discussed below.

You can’t throw a rock these days without hitting a story about cryptocurrencies in general and Bitcoin in particular. From discussions of mining hardware, to predictions about the future of such movements, to the real identity of Bitcoin’s enigmatic creator Satoshi Nakamoto, plenty of people have an opinion — but objective attempts to analyze what drives the Bitcoin market and what factors influence the price of the currency have been few and far between. A year ago, as the price of Bitcoins began to lift out of the doldrums, I gave a long-time friend and tenured professor of mathematics a call. Would he be interested in taking a look at Bitcoin from a mathematical angle and seeing what we could make of the currency’s activity? He was.

Regression and statistics

Before we dive into the data, we need to discuss a bit of statistics. One of the techniques for comparing the relationship between two variables is called regression analysis. In theory, the relationship between any two trends can be measured, and there are entire websites dedicated to pointing out odd correlations, like this one illustrating the meteoric growth in Facebook use against Greece’s skyrocketing debt problem.

When it’s not being used to amuse nerds and draw improbable relationships, regression analysis can actually be used to describe useful things. In this case, we used it to describe the relationships between various facets of the Bitcoin ecosystem.

First, two aspects of a system are plotted on a chart: the total hash rate of the Bitcoin network against the price of one Bitcoin, for example. The next step is to find a mathematical model that most accurately characterizes the relationship between the two variables. Different statistical models can be applied to the same set of data. Some of you may be familiar with the concept of a correlation coefficient, a number between -1 and 1 whose value indicates the strength of a relationship and whether that relationship is positive, nonexistent, or negative. In this article, we’re going to be talking about a distinct but related concept: the r-squared value, which typically runs from 0 to 1 and describes how much of the variation in one variable the model accounts for. A model that fits a given set of data well will have a high r-squared value (close to 1), while a model that doesn’t fit very well will have a low r-squared value (close to 0). A model with a high r-squared value more accurately matches past data points and should (barring unforeseen shocks) be a better predictor of future ones.
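To make the r-squared idea concrete, here is a minimal sketch in Python of fitting a straight line to two variables and scoring the fit. The numbers are invented purely for illustration and are not our Bitcoin data; the function names (`linear_fit`, `r_squared`, `pearson_r`) are our own, and a straight line is only one of the models that could be tried.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys that the fitted line accounts for."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def pearson_r(xs, ys):
    """Correlation coefficient, a number between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented sample values, NOT real market data.
example_x = [1.0, 2.0, 3.0, 4.0, 5.0]
example_y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = linear_fit(example_x, example_y)
print("r-squared:", r_squared(example_x, example_y, slope, intercept))
print("correlation coefficient:", pearson_r(example_x, example_y))
```

For a straight-line fit like this one, the r-squared value is simply the square of the correlation coefficient, which is why the two concepts are related but not interchangeable once other kinds of models are involved.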

The first question was whether we could find a strong relationship between any two aspects of Bitcoin itself. Our early investigations into Bitcoin price versus trade volume produced no useful results. The amount of BTC flowing across the network and the price of Bitcoins had virtually nothing to do with each other. The relationship between the total hash rate and the value of one Bitcoin was stronger, but not particularly strong.

The relationship between Bitcoin difficulty and Bitcoin price, however, was much stronger. Those of you who are familiar with the cryptocurrency may be protesting at this point, since Bitcoin’s difficulty is directly tied to its hash rate: the more hashing power miners bring to the network, the faster difficulty rises. You might think, therefore, that the statistical relationship between the price and the difficulty would be the same as the relationship between the price and the hash rate. It wasn’t, possibly because difficulty is adjusted only once every 2016 blocks (roughly every two weeks), while hash rate can vary a great deal more over a short period of time.
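The difference between the two series is easier to see in code. The following Python sketch is a simplified model of the retargeting rule, not the actual Bitcoin client code: it assumes a 10-minute target block time, a 2016-block window, and the protocol’s factor-of-four limit on any single adjustment, and it ignores details such as the exact timestamps the real client uses. The names `retarget` and `difficulty_series` are hypothetical helpers introduced for this illustration.

```python
BLOCKS_PER_RETARGET = 2016
TARGET_BLOCK_SECONDS = 600  # ten minutes per block
TARGET_TIMESPAN = BLOCKS_PER_RETARGET * TARGET_BLOCK_SECONDS  # two weeks


def retarget(old_difficulty, actual_timespan_seconds):
    """Scale difficulty by how fast the last window of blocks was mined."""
    # A single adjustment is limited to a factor of four in either direction.
    clamped = min(max(actual_timespan_seconds, TARGET_TIMESPAN / 4),
                  TARGET_TIMESPAN * 4)
    return old_difficulty * TARGET_TIMESPAN / clamped


def difficulty_series(block_times, start_difficulty):
    """Return the difficulty in force for each block.

    block_times holds the seconds taken to find each block. Difficulty
    stays flat inside a window and changes only at window boundaries.
    """
    difficulty = start_difficulty
    series = []
    window_seconds = 0.0
    for height, seconds in enumerate(block_times, start=1):
        series.append(difficulty)
        window_seconds += seconds
        if height % BLOCKS_PER_RETARGET == 0:
            difficulty = retarget(difficulty, window_seconds)
            window_seconds = 0.0
    return series


# Hypothetical scenario: blocks arrive twice as fast as intended for one
# full window, so difficulty doubles, but only at the window boundary.
series = difficulty_series([300] * 2016 + [600] * 10, start_difficulty=1.0)
print(series[0], series[2015], series[2016])
```

In this model the hash rate can swing from block to block, yet difficulty moves in steps, smoothing two weeks of mining activity into a single number. That smoothing is one plausible reason the two series relate to price differently.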
