UPDATE:

Because of the general interest in this topic, I created a dataset that includes all OHLC data from the Bitfinex exchange API and uploaded it as a public dataset on Kaggle.

Introduction

Algorithmic trading is a popular way to tackle the fast-paced and volatile environment of cryptocurrency markets. However, implementing an automated trading strategy is challenging and requires a lot of backtesting, which in turn requires a lot of historical data. While several sources provide historical cryptocurrency data, most of them have drawbacks: they are expensive, provide only low temporal resolution (daily) data, or cover limited time periods for a limited number of currency pairs. Here we will see that obtaining historical open, high, low, close (OHLC) data at a 1-minute resolution is not magic and can be done for free in a few lines of Python code.

Connecting to the exchange

In this tutorial, we will use the Bitfinex exchange API to retrieve historical data. However, the approach should also work for any other exchange that provides a similar API. You do not need a Bitfinex account for this code to work, since we will only use public API endpoints. In case you are not familiar with what an API is or how to use it, I suggest you read through the Bitfinex API documentation. After all, this is also the interface through which your algorithm will later interact with the exchange. But do not worry, you won't need to write the Python interface for the Bitfinex API yourself. Several implementations are already available, one of them being this client here. The easiest installation is via pip:

>>> pip install bitfinex-tencars

Alternatively, if you have Git installed you can simply run the commands below to install the client. Just remember to replace <folder> with your target folder.



>>> git clone https://github.com/akcarsten/bitfinex_api.git <folder>
>>> python <folder>/setup.py install

If you do not have Git installed, you can download the repository from the GitHub page, then go to the folder you saved it to and run:

>>> python setup.py install

In both cases, the Bitfinex client will be installed to your Python distribution.
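A quick way to confirm that the installation worked is to ask Python whether it can locate the module. The sketch below assumes the client is importable as `bitfinex`, as in the import statement used later in this tutorial. `is_installed` is a hypothetical helper name, not part of the client.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# 'bitfinex' is the import name used later in this tutorial.
print("bitfinex client found:", is_installed("bitfinex"))
```

If this prints `False`, the client was most likely installed into a different Python distribution than the one you are running.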

Using the API client

If you look at the Bitfinex API documentation, you will see that there are two API versions, v1 and v2. Both are implemented in the client you just installed, but here we will only use the v2 API. So after importing the Bitfinex API client, we need to create an instance of the v2 API by running the code below. Notice that we are not providing any keys here, so we will only have access to the public endpoints. A corresponding message will be shown after running the code.

>>> import bitfinex
>>>
>>> # Create api instance of the v2 API
>>> api_v2 = bitfinex.bitfinex_v2.api_v2()

And that is our gate to the data. From the documentation, we know that one of the public endpoints is called candles, which returns the data behind the candlestick charts that you see on all the exchanges. This kind of data contains the following information: a timestamp, the open, close, high and low price, and the trade volume. It is also referred to as OHLC data. The simplest way to interact with this endpoint through the client is to just call it with its default settings.

>>> result = api_v2.candles()

The line above will give you the last 1000 minutes of OHLC data for the Bitcoin price in USD. Well, that’s nice but we might be interested in a time period long ago or a different currency pair. In this case, we can specify additional parameters to get exactly what we want. And these parameters are:

symbol: currency pair, default: BTCUSD

interval: temporal resolution, e.g. 1m for 1 minute of OHLC data

limit: number of returned data points, default: 1000

start: start time of interval in milliseconds since 1970

end: end time of interval in milliseconds since 1970

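So with this information at hand, we can run the first query. The code below requests the 1-minute resolution OHLC data of the Bitcoin price in USD for April 1, 2018. Note that a full day contains 1440 minutes, so with a limit of 1000 data points the response cannot cover the whole day. We will deal with that in the next section.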

>>> import datetime
>>> import time
>>>
>>> # Define query parameters
>>> pair = 'btcusd' # Currency pair of interest
>>> bin_size = '1m' # This will return minute data
>>> limit = 1000 # We want the maximum of 1000 data points
>>>
>>> # Define the start date
>>> t_start = datetime.datetime(2018, 4, 1, 0, 0)
>>> t_start = time.mktime(t_start.timetuple()) * 1000
>>>
>>> # Define the end date
>>> t_stop = datetime.datetime(2018, 4, 2, 0, 0)
>>> t_stop = time.mktime(t_stop.timetuple()) * 1000
>>>
>>> result = api_v2.candles(symbol=pair, interval=bin_size,
>>>                         limit=limit, start=t_start, end=t_stop)
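One caveat with `time.mktime`: it interprets the datetime in the local time zone of the machine running the code, so the same script can query slightly different windows on different computers. If you would rather pin the boundaries to UTC, a minimal sketch could look like this. `to_ms_utc` is a hypothetical helper, not part of the Bitfinex client.

```python
import datetime


def to_ms_utc(dt):
    """Convert a naive datetime, read as UTC, to milliseconds since 1970."""
    aware = dt.replace(tzinfo=datetime.timezone.utc)
    return int(aware.timestamp() * 1000)


# Same dates as in the query above, but fixed to UTC
t_start_utc = to_ms_utc(datetime.datetime(2018, 4, 1, 0, 0))
t_stop_utc = to_ms_utc(datetime.datetime(2018, 4, 2, 0, 0))
print(t_start_utc, t_stop_utc)
```

These values can be passed as `start` and `end` in the same way as the `time.mktime` results.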

Collecting historical data for longer time intervals

Now that’s great, but there is still one problem: the API will only return a maximum of 1000 data points. So if we were to increase the time interval of interest to the entire month of April 2018, we would not be able to get it at a 1-minute resolution in a single query. To get past this limitation, we need to write a function that splits our big query into multiple smaller ones. One additional thing to keep in mind is that there is a limit on how many requests we can make to the Bitfinex API. Currently, this limit is 60 calls per minute, which means that after each request we should wait for a minimum of 1 second before we start the next one. To be safe, the function below waits 2 seconds, but you can change that if you want.

>>> def fetch_data(start, stop, symbol, interval, tick_limit, step):
>>>     # Create api instance
>>>     api_v2 = bitfinex.bitfinex_v2.api_v2()
>>>     data = []
>>>     while start < stop:
>>>         # Query one window of at most `step` milliseconds,
>>>         # never reaching past the requested stop time
>>>         end = min(start + step, stop)
>>>         res = api_v2.candles(symbol=symbol, interval=interval,
>>>                              limit=tick_limit, start=start,
>>>                              end=end)
>>>         data.extend(res)
>>>         # Move on to the next window and respect the rate limit
>>>         start = end
>>>         time.sleep(2)
>>>     return data
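The splitting logic can be checked without touching the network. The sketch below uses a hypothetical generator `iter_windows`, which is not part of the Bitfinex client, to produce the start and end of each smaller query and to clip the last window to the requested stop time.

```python
def iter_windows(start, stop, step):
    """Yield (window_start, window_end) pairs covering start up to stop."""
    window_start = start
    while window_start < stop:
        window_end = min(window_start + step, stop)
        yield window_start, window_end
        window_start = window_end


# April 2018 has 30 days; all values are in milliseconds.
month_ms = 30 * 24 * 60 * 60 * 1000
windows = list(iter_windows(0, month_ms, 60000000))
print(len(windows))  # number of requests needed for the month
```

Multiplying the number of windows by the pause between requests gives a rough lower bound on how long the download will take.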

With the function above, we can now run queries for longer time intervals. The only extra thing we need to provide is the step size in milliseconds, that is, the time span each of the smaller queries should cover. This is basically the same as the limit we defined earlier, but expressed in milliseconds. To reduce the number of calls to the API we should go for the maximum, which for the 1-minute case means a step size of 60000 * 1000 = 60000000.
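The same arithmetic carries over to other resolutions. As a small sketch, the hypothetical helper `step_size` below multiplies the length of one candle by the limit. Only `1m` appears in this tutorial; the other interval labels are assumed examples, so check the API documentation for the exact set that is supported.

```python
# Length of one candle in milliseconds for a few interval labels.
# Only '1m' is used in this tutorial; the others are assumed examples.
INTERVAL_MS = {
    "1m": 60_000,
    "5m": 5 * 60_000,
    "1h": 60 * 60_000,
}


def step_size(interval, limit=1000):
    """Largest time span in ms that one query of `limit` candles can cover."""
    return INTERVAL_MS[interval] * limit


print(step_size("1m"))  # 60000000
```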

>>> # Set step size
>>> time_step = 60000000
>>>
>>> # Define the start date
>>> t_start = datetime.datetime(2018, 4, 1, 0, 0)
>>> t_start = time.mktime(t_start.timetuple()) * 1000
>>>
>>> # Define the end date
>>> t_stop = datetime.datetime(2018, 5, 1, 0, 0)
>>> t_stop = time.mktime(t_stop.timetuple()) * 1000
>>>
>>> pair_data = fetch_data(start=t_start, stop=t_stop, symbol=pair,
>>>                        interval=bin_size, tick_limit=limit,
>>>                        step=time_step)

Finally, let’s convert the results into a pandas data frame so we can remove potential duplicates, make sure everything is in the correct order, and convert the timestamp into a readable format.

>>> import pandas as pd
>>>
>>> # Create pandas data frame and clean/format data
>>> names = ['time', 'open', 'close', 'high', 'low', 'volume']
>>> df = pd.DataFrame(pair_data, columns=names)
>>> df.drop_duplicates(inplace=True)
>>> df['time'] = pd.to_datetime(df['time'], unit='ms')
>>> df.set_index('time', inplace=True)
>>> df.sort_index(inplace=True)
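To make it clearer what this clean-up does, here is a rough equivalent that uses only the standard library. The sample rows are made up for illustration and follow the column order time, open, close, high, low, volume. `clean_candles` and `to_datetime` are hypothetical helper names.

```python
import datetime

# Made-up sample rows in the order time, open, close, high, low, volume.
rows = [
    [1522540860000, 6925.0, 6930.0, 6931.0, 6924.0, 12.5],
    [1522540800000, 6920.0, 6925.0, 6926.0, 6919.0, 10.0],
    [1522540860000, 6925.0, 6930.0, 6931.0, 6924.0, 12.5],  # duplicate
]


def clean_candles(rows):
    """Drop fully duplicated rows and sort the rest by timestamp."""
    unique = {tuple(row) for row in rows}
    return sorted(unique, key=lambda row: row[0])


def to_datetime(ms):
    """Convert milliseconds since 1970 to a timezone-aware UTC datetime."""
    return datetime.datetime.fromtimestamp(ms / 1000, tz=datetime.timezone.utc)


cleaned = clean_candles(rows)
for row in cleaned:
    print(to_datetime(row[0]).isoformat(), row[1:])
```

For a month of minute data the pandas version is the more convenient choice. This sketch is only meant to show the three steps: drop duplicates, sort, convert.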

Conclusion

So retrieving high-resolution OHLC data is actually not that complicated. And if you wonder for how many currency pairs we can do that through the Bitfinex API, just run the two lines of code below.

>>> api_v1 = bitfinex.bitfinex_v1.api_v1()
>>> pairs = api_v1.symbols()

Now if we were to push it, we could write a script like this, which collects all the data for each currency pair and saves it to a CSV file. That gives you all the historical OHLC trading data from the Bitfinex exchange at a 1-minute resolution, which should help you develop an automated trading strategy. However, it will take a while until all the data is on your computer, so you should either limit your query to a shorter time frame or be more selective with your currency pairs.
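As a rough sketch of the idea (not the linked script itself), the core of such a collector could look like the code below. `save_candles` and `collect_all` are hypothetical names. The `fetch` argument is assumed to be any callable that takes a pair name and returns candle rows, for example a small wrapper around `fetch_data` from above.

```python
import csv
import os

COLUMNS = ["time", "open", "close", "high", "low", "volume"]


def save_candles(rows, path):
    """Write candle rows to a CSV file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(rows)


def collect_all(pairs, fetch, folder="."):
    """Fetch the candles for every pair and save one CSV file per pair.

    `fetch` is any callable that takes a pair name and returns candle rows.
    """
    for pair in pairs:
        save_candles(fetch(pair), os.path.join(folder, f"{pair}.csv"))
```

Passing the fetch function in as an argument keeps the file handling separate from the API calls, which makes it easy to try the script on a short list of pairs first.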

I hope that helps and you can check out the code here, follow me on Twitter or connect via LinkedIn.