We have made available a free open source BitBank dataset hosted on Google BigQuery

Ill go over how to query the dataset to gain insights into the cryptocurrency markets.

The data is described more in our API docs and Bulk Data docs

Volatility!

Cryptocurrencies ordered by having the most volatile 120 minutes

select currency_pair, MAX(float(future_wstd_120) / ((best_bid_price + best_Ask_price) / 2.0) * 100.0) as wstd_percent from [bitbank-nz:BitBanknz_cryptocurrency_forecasts.featuresets] where has_future_data='1' group by currency_pair order by wstd_percent desc

most volatile markets

SELECT STDDEV( (best_bid_price + best_Ask_price) / 2.0) as std_of_midpoint_price, currency_pair, avg( (best_bid_price + best_Ask_price) / 2.0) as avg_of_midpoint_price, (STDDEV( (best_bid_price + best_Ask_price) / 2.0) / avg( (best_bid_price + best_Ask_price) / 2.0)) as volatility FROM [bitbank-nz:BitBanknz_cryptocurrency_forecasts.featuresets] group by currency_pair order by volatility desc LIMIT 1000

It appears BTC_OMNI is a volatile pair again over this time period with stddev: 1% over this day

Correlation!

Checking how correlated price move is with future price movements

SELECT CORR(wavg_distance_to_midpoint_percent5min, (float(future_wavg_5) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation5, CORR(wavg_distance_to_midpoint_percent5min, (float(future_wavg_30) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation30, CORR(wavg_distance_to_midpoint_percent5min, (float(future_wavg_60) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation60, CORR(wavg_distance_to_midpoint_percent5min, (float(future_wavg_120) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation120 FROM [bitbank-nz:BitBanknz_cryptocurrency_forecasts.featuresets] where has_future_data='1' LIMIT 1000

results:

price_correlation5 price_correlation30 price_correlation60 price_correlation120 0.31553591396391 0.03814593626787945 0.04183696129029081 0.03856468233799804

This means the change in 5 minute price is related to the next 5 minutes in price but not so much the next 5-30, 30-60 or 60-120 minutes.

Price change in 30 minutes correlation to future price:

SELECT CORR(wavg_distance_to_midpoint_percent30min, (float(future_wavg_5) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation5, CORR(wavg_distance_to_midpoint_percent30min, (float(future_wavg_30) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation30, CORR(wavg_distance_to_midpoint_percent30min, (float(future_wavg_60) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation60, CORR(wavg_distance_to_midpoint_percent30min, (float(future_wavg_120) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation120 FROM [bitbank-nz:BitBanknz_cryptocurrency_forecasts.featuresets] where has_future_data='1' LIMIT 1000

results:

price_correlation5 price_correlation30 price_correlation60 price_correlation120 0.7973442128 0.03708915493 0.0720097425 0.07091767196

This means price changes in the last 30 minutes are strongly correlated with the next 5 minutes and still not much but more so with the 30-60 and 60-120 minute prices.

How correlated are BitBank.nz forecasts to what actually happened

SELECT CORR( estimated_future_wavg_5 , (float(future_wavg_5) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation5, CORR( estimated_future_wavg_30 , (float(future_wavg_30) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation30, CORR( estimated_future_wavg_60 , (float(future_wavg_60) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation60, CORR( estimated_future_wavg_120 , (float(future_wavg_120) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation120 FROM [bitbank-nz:BitBanknz_cryptocurrency_forecasts.featuresets] where has_future_data='1' LIMIT 1000

price_correlation5 price_correlation30 price_correlation60 price_correlation120 0.3771947898 0.1059980775 0.08641589135 0.0694471175

Although correlated this highlights there's much more work to be done optimising training and denoising our forecasts to more reliably correlate with the actual market.

BEGINRANT;

Given the amount of training and work that's gone into our algorithm it also shows how its hard to reliably predict what the market will do, especially further into the future. A common trap is to trust companies that occasionally forecast 7+ days into the future, unfortunately the compounding uncertainty can mean they are extremely inaccurate and with a lack of data on how often their forecasts are correct they lack the tools and transparency to build trust and self improve.

I do believe it is possible for high frequency trading bots with up to date forecasts to get things right on average but they don't need very long term forecasts.

ENDRANT;

Orderbook imbalance

SELECT CORR( power_imbalance , (float(future_wavg_5) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation5, CORR( power_imbalance , (float(future_wavg_30) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation30, CORR( power_imbalance , (float(future_wavg_60) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation60, CORR( power_imbalance , (float(future_wavg_120) / ((best_bid_price + best_Ask_price) / 2.0))) as price_correlation120 FROM [bitbank-nz:BitBanknz_cryptocurrency_forecasts.featuresets] where has_future_data='1' LIMIT 1000

price_correlation5 price_correlation30 price_correlation60 price_correlation120 0.1668656568 0.1486553967 0.1342865412 0.1274909794

Orderbook buy pressure is correlated with upward price movement, a correlation that holds well with time

Backtesting trading strategy

To summarise our forecasts, price movement and power_imbalance are all correlated with upward price movement, so can we simply buy when they are all up and sell when they are all down?

SELECT (best_bid_price + best_Ask_price) / 2.0 as midpoint, date, best_bid_price, best_Ask_price, (wavg_distance_to_midpoint_percent60min < 0 and wavg_distance_to_midpoint_percent5min < 0 and wavg_distance_to_midpoint_percent30min < 0 and power_imbalance < 1 and estimated_future_wavg_5 < 1 and estimated_future_wavg_30 < 1 and estimated_future_wavg_60 < 1 and estimated_future_wavg_120 < 1 ) as should_sell FROM [bitbank-nz:BitBanknz_cryptocurrency_forecasts.featuresets] where has_future_data='1' and currency_pair='BTC_ETH' and ((wavg_distance_to_midpoint_percent60min > 0 and wavg_distance_to_midpoint_percent5min > 0 and wavg_distance_to_midpoint_percent30min > 0 and power_imbalance > 1 and estimated_future_wavg_5 > 1 and estimated_future_wavg_30 > 1 and estimated_future_wavg_60 > 1 and estimated_future_wavg_120 > 1 ) or (wavg_distance_to_midpoint_percent60min < 0 and wavg_distance_to_midpoint_percent5min < 0 and wavg_distance_to_midpoint_percent30min < 0 and power_imbalance < 1 and estimated_future_wavg_5 < 1 and estimated_future_wavg_30 < 1 and estimated_future_wavg_60 < 1 and estimated_future_wavg_120 < 1 )) order by date LIMIT 1000

Putting the cryptourrency trade data in a spreadsheet to analyse the selected points to buy/sell at shows the strategy does quite well.

Trading in the BTC_ETH pair the algorithm performs quite well, assuming it trades at the midpoint it turns 1 btc into 1.17 (assuming no fees) despite the overall price in ethereum going down overall in the time period, the fees would be around .17 unfortunately if there was a .15% maker fee on the 106 trades made

As this result/algorithm was developed on the assumptions of the correlations computed on this data, we will need to do another test on another dataset to verify its accuracy (we may have over-fit to this dataset when we developed this algorithm)

Putting the algorithm into production shows that its very promising :) Its been running on BitMex for nearly 1 day now having earned ~6% profit.

The actual algorithm running in production is available on the BitBank.nz Github page Youll need to signup for a free 1 day trial at BitBank.nz.

Let me know if you'd like access to run queries on larger datasets on BigQuery

Also checkout our referral program to earn .003 BTC per paying user!

For more on BigQuery also see a previous post i writ on my personal blog how.nz about Analysing App Engine logs with Google BigQuery and the bitcoin dataset on BigQuery

bonus points (.003 btc and a free BitBank.nz year) for someone who can express the spreadsheet part of the equation as a BigQuery Aggregate Stateful User Defined Function, because SQL is hard, also much more to anyone who develops something like a backtesting dashboard using the BigQuery API because that would be amazing.

Eventually we will be able to integrate tools like this into the BitBank.nz core product which will make this strategy experimentation process super smooth and easy, keep an eye out!