Enigma Developer Update, September 21st, 2017:

Minute-resolution pricing data for Catalyst

At Enigma, our development of Catalyst, a platform that enables anyone to start their own crypto hedge fund, continues at a good pace. Following our release earlier this month, which added live trading functionality, we are back to improving backtesting capabilities: this week we are releasing a new version with 1-minute-resolution pricing data.

There is a saying in the crypto-community that one day in crypto feels like a year in any other industry. Things happen very fast, whether it’s raising millions of dollars in Initial Coin Offerings (ICOs) that last as little as 24 seconds or cryptoasset ‘pump and dumps’ where prices more than double or halve in the time it takes you to spell ‘cryptoasset’.

Thus, it is imperative for Catalyst to provide accurate and reliable pricing data, which includes the standard OHLCV (Open, High, Low, Close prices and Volume) for all crypto assets that are trading on a given exchange. We are starting with pricing data from the Poloniex exchange, which provides 101 currency pairs at the time of this writing, with more assets added weekly.

Even though Poloniex doesn’t provide pricing data with 1-minute resolution by default (the finest resolution they offer is 5 minutes), one can build that dataset by taking the history of all transactions and resampling it on a minute-by-minute basis. Let me provide some numbers to give you a sense of the size of the dataset, and the scope of the associated data manipulations:

The dataset encompasses about two and a half years of data, starting with Bitcoin being traded for the first time on this particular exchange on Feb 21st, 2015.

There are about 221 million trades recorded in this timespan of 2.5 years and growing daily.

All these transactions take up 16GB of disk space in uncompressed plain tabular form (CSV files), which pandas then reduces to 10GB of minute-bar OHLCV data (again, uncompressed CSV files).
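To make the resampling step concrete, here is a minimal sketch of how individual trades can be turned into 1-minute OHLCV bars with pandas. It is an illustration under stated assumptions, not the actual Catalyst pipeline: the function name trades_to_minute_bars, the column names price and amount, and the choice to fill empty minutes with the previous close are all hypothetical.

```python
import pandas as pd


def trades_to_minute_bars(trades: pd.DataFrame) -> pd.DataFrame:
    """Resample individual trades into 1-minute OHLCV bars.

    Assumes `trades` has a DatetimeIndex and two columns (hypothetical names):
    'price' (the trade price) and 'amount' (the traded quantity).
    """
    per_minute = trades.sort_index().resample("1min")

    # Open/high/low/close of the trade price within each minute.
    bars = per_minute["price"].ohlc()
    # Volume is the total quantity traded within each minute.
    bars["volume"] = per_minute["amount"].sum()

    # Minutes without trades come back as NaN prices and zero volume.
    # One possible convention: carry the last close forward as a flat bar.
    bars["close"] = bars["close"].ffill()
    for col in ("open", "high", "low"):
        bars[col] = bars[col].fillna(bars["close"])
    return bars


# Tiny made-up example: three trades in the first minute, none in the
# second, one in the third.
trades = pd.DataFrame(
    {"price": [100.0, 101.0, 99.5, 102.0], "amount": [0.5, 1.0, 0.25, 2.0]},
    index=pd.to_datetime(
        [
            "2015-02-21 00:00:05",
            "2015-02-21 00:00:40",
            "2015-02-21 00:00:59",
            "2015-02-21 00:02:10",
        ]
    ),
)
bars = trades_to_minute_bars(trades)
print(bars)
```

How to treat minutes with no trades is a design choice. Dropping them keeps the file smaller, while forward-filling the close, as sketched here, gives every minute a bar.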

Catalyst stores all this data using bcolz, an external library optimized for columnar data (OHLCV being a prime use case) that further compresses the data and optimizes it for querying. The compression ratio depends heavily on the precision chosen for the data, that is, the number of decimal places kept. With only 3 decimal places, it got down to 99MB (that’s a 100x compression factor!), but we chose to provide 8 decimals of precision given the small fractional nature of most coins. That brought the size on disk to ~450MB. Gzip further compresses it to ~340MB for download, and that’s what Catalyst makes available!
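To get a feel for why the number of decimal places matters so much, here is a small, self-contained experiment. It does not use bcolz and does not reflect how Catalyst actually encodes prices: it assumes prices are stored as scaled 64-bit integers and uses zlib from the Python standard library as a stand-in compressor, on a synthetic random-walk price series. The function name compressed_size is hypothetical.

```python
import random
import struct
import zlib


def compressed_size(prices, decimals):
    """Round prices to `decimals` places, store them as scaled 64-bit
    integers, and return the zlib-compressed size in bytes."""
    scale = 10 ** decimals
    ints = [round(p * scale) for p in prices]
    raw = struct.pack(f"<{len(ints)}q", *ints)
    return len(zlib.compress(raw, 9))


# Synthetic random walk around 0.0025, mimicking a coin priced at a small
# fraction of the quote currency. The data is made up for illustration.
rng = random.Random(0)
price = 0.0025
prices = []
for _ in range(10_000):
    price = max(1e-8, price + rng.gauss(0, 1e-6))
    prices.append(price)

size_3 = compressed_size(prices, 3)
size_8 = compressed_size(prices, 8)
print(f"3 decimals: {size_3} bytes, 8 decimals: {size_8} bytes")
```

With 3 decimals, a price around 0.0025 collapses to 0.002 or 0.003, which compresses extremely well but throws away almost all of the price movement. That is the trade-off behind keeping 8 decimals despite the larger file.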

Eager to get started with algorithmic crypto-trading? Head over to the Catalyst Documentation Wiki for instructions on how to install and run your first algorithm. Our support community of enthusiast algo traders lives on the Enigma Slack (particularly in the #dev channel). Stop by, say hi, introduce yourself and ask questions to get you started.

Welcome aboard, and happy trading!