Hi everyone again! It's been a long time since my last post about machine learning for algorithmic trading, and there were reasons for that. After I showed some rather successful results in forecasting asset prices with neural networks, I received interesting offers from different institutions: from banks to family funds and individual traders, all expecting fruitful earnings from AI. I already had some strategies properly backtested (or so I thought), so I was happy to help them. Meanwhile, I also talked to several experts in the area and studied some of the literature, and things turned out to be not as easy as one might think. Today I want to share the lessons I've learnt from professionals, from books and, most importantly, from my own experience, lessons that can potentially save you a lot of money if you're trying to incorporate AI into your trading process.

First of all, please check out my other articles:

to get impressed by the possibilities… and to prepare to be discouraged :)

Current framework

As you can see, my whole blog concentrates on solving the following machine learning problem: given a window of the past N observations, predict observation N+t, where N can be 30 days of bars (or other variables such as news headlines) and t is the forecasting horizon (it can vary depending on your strategy). What have we done correctly here?
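As a concrete illustration of this setup (the variable names are mine, not from any library), a single training pair can be formed from a price series like this:

```python
import numpy as np

N = 30  # length of the historical window
t = 5   # forecasting horizon

prices = np.arange(100, dtype=float)  # placeholder price series

# One training pair: the window of N past observations as features,
# and the observation t steps after the window's end as the target
x = prices[:N]
y = prices[N + t - 1]
```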

Feature preparation and normalization

Indeed, we have prepared our windows correctly: no look-ahead bias, and normalization done for each window separately (so as not to mix different regimes, etc.).

Different task selection

We have also played with different forecasting objectives: at a high level they were both classification (binary movement "up" or "down") and regression, e.g. returns forecasting, volatility forecasting, etc.

Backtesting

Training neural nets is cool, but it's not the end of the process. We performed simple but informative backtesting to show that if we used these predictions to go long/short an asset, we could make some profit (and, importantly, do better than some benchmark).

Avoided classical overfitting

The main challenge in "normal" machine learning is to make the model generalize well to unseen data. To do this we use train/dev/test splits of our initial data and some cross-validation to estimate how the model behaves as the data distribution changes. We have shown that neural networks can perform well on out-of-sample data (but wait for the tricky details). The image below shows what I call "classical" overfitting for financial time series: it looks like a perfect forecast, but basically it just predicts the last observation :)
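To make this "perfect-looking but useless" effect concrete, here is a minimal sketch (synthetic data, my own variable names) of a persistence baseline that simply repeats the last observation; on a random-walk-like series it achieves a deceptively small error while carrying no information about the future:

```python
import numpy as np

rng = np.random.default_rng(42)
prices = 100 + np.cumsum(rng.normal(0, 0.5, 500))  # random-walk-like series

# "Forecast" each price as simply the previous observation
persistence_forecast = prices[:-1]
actual = prices[1:]

# Small error, yet zero predictive value about the direction of the next move
mae = np.mean(np.abs(persistence_forecast - actual))
```

Plotted against the actual series, such a "forecast" hugs the true curve almost perfectly, which is exactly the trap shown in the image.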

What’s missing?

Even though the framework described above is generally correct, a lot of important parts are still missing, and they are actually crucial for profitable trading. I won't talk about commissions, shorting issues, liquidity, bet sizing or anything else that relates to the strategy itself (I'll touch on some backtesting issues, but only as far as the machine learning part needs them). I will concentrate on problems in machine learning: dataset formation, model evaluation, etc. By an unbelievable coincidence, I found most of the solutions to these problems (at least the theoretical ones) in the following book:

and I highly recommend it, as well as a second really useful source:

Bars data == weak data

Most people use the well-known OHLC (open, high, low, close) prices for some time period plus the traded volume. This reflects far too little about the market and what its participants are doing. To trade well we really need the bids and asks from the order book: they give us the "raw" information and allow us to build better features like dollar volume, bid/ask spread and others.
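As a toy illustration (the quotes below are made-up numbers, not real market data), a couple of these order-book-derived features can be computed like this:

```python
# Hypothetical level-1 order book snapshot: (price, size) for best bid and ask
best_bid = (99.95, 300)   # 300 shares demanded at 99.95
best_ask = (100.05, 120)  # 120 shares offered at 100.05

# Bid/ask spread and mid price: invisible in OHLC bars, yet basic order book features
spread = best_ask[0] - best_bid[0]
mid_price = (best_ask[0] + best_bid[0]) / 2

# Dollar volume of a trade: price times size, more comparable across
# assets (and across time) than raw share volume
trade_price, trade_size = 100.00, 50
dollar_volume = trade_price * trade_size
```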

You can still build candle bars from this data, but you will also get access to the queues and their lengths, detailed information on how many people want to buy or sell a particular asset (not just plain volume), tick imbalances and a lot of other interesting features that carry much more signal than the averaged noise in the bars.

Concentration on a single asset

This mistake is really bad. Most of my articles concentrate on taking a single asset, learning to forecast it over some fixed horizon and backtesting a long-short equity strategy. Maybe some individual traders with $10k in their pocket really do this: they build an indicator-based strategy for some currency pair and trade it. But if we think about it for a while, it looks a lot like overfitting to this particular asset! What's the point of a strategy that is overfitted to a single time series (when we aren't even sure it would have performed equally well in other periods of the past)? Hedge funds never do this. They trade a so-called universe of assets (possibly with the same strategy). The portfolio is balanced to short or long these assets, and if a single strategy is used to trade them, it's expected to perform well on all of them.
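To show the idea rather than a real strategy, here is a minimal sketch (toy numbers, toy signal) of applying one and the same rule across a whole universe of assets with equal weights, instead of a single cherry-picked series:

```python
import numpy as np

# Hypothetical daily returns for a universe of 4 assets (rows = days)
returns = np.array([
    [ 0.01, -0.02,  0.005, 0.00],
    [-0.01,  0.01, -0.02,  0.03],
    [ 0.02,  0.00,  0.01, -0.01],
])

# A toy cross-sectional signal: the sign of yesterday's return,
# applied to every asset in the universe with equal weight
signals = np.sign(returns[:-1])
weights = signals / signals.shape[1]

# Portfolio return each day is the weighted sum across all assets
portfolio_returns = (weights * returns[1:]).sum(axis=1)
```

The point is not that this signal is any good (it isn't), but that its performance is measured across the whole universe at once, so a lucky fit to one series cannot carry the result.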

Moreover, when you compare your strategy's performance to some benchmark (for example, in crypto trading this benchmark can be the HODL strategy), you're interested in calculating alpha (outperformance of the benchmark) and beta (the strategy's risk exposure).
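Alpha and beta can be estimated with a simple linear regression of strategy returns on benchmark returns; a minimal sketch with synthetic numbers (the helper name is my own):

```python
import numpy as np

def alpha_beta(strategy_returns, benchmark_returns):
    """Estimate alpha and beta via the regression
    r_strategy = alpha + beta * r_benchmark + noise."""
    beta, alpha = np.polyfit(benchmark_returns, strategy_returns, 1)
    return alpha, beta

bench = np.array([0.01, -0.02, 0.015, 0.005, -0.01])
strat = 0.002 + 0.5 * bench  # synthetic strategy with alpha=0.002, beta=0.5
a, b = alpha_beta(strat, bench)
```

A positive alpha means you are actually beating the benchmark; a beta near 1 means you are mostly just riding it.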

Fixed sized horizon forecasting

When I was preparing a dataset to train a model, each pair {x_i, y_i} was a window of the N past days and the price change (or direction of price movement) some time after the last date in the historical window. Let's think about that again. Some time after. A fixed time. Well, the word "fixed" in the financial world is ridiculous. We can't even be sure that at that moment there will be bids or asks to execute the trade! This is a very serious issue that actually ruins our whole forecasting framework. To be honest, I haven't found any easy fix for this problem, just two drastic ones. The first solution is to stop forecasting and start executing trades, which leads us straight to control theory and reinforcement learning. It would help us deal with any fixed time horizon (at least to some extent), but it's a bit off-topic for now. The second option I found in the book, and it's pretty interesting.

Basically, in the picture above you can see the "non-fixed-time" creation of labels for the dataset. The method is called the "triple barrier" and works as follows: we build three barriers, one on top, which means taking the profit; one at the bottom, acting as a stop loss; and a last, vertical one, which represents a kind of expiration period. This labelling method allows us to build much more flexible and realistic strategies based on the predictions.
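A minimal sketch of this labelling scheme (my own simplified implementation with symmetric barriers, not the book's exact code):

```python
import numpy as np

def triple_barrier_label(prices, entry, upper=0.02, lower=0.02, max_holding=10):
    """Label one event with the triple-barrier method:
      +1 if price touches the profit-taking barrier first,
      -1 if it touches the stop-loss barrier first,
       0 if neither is touched before the vertical (time) barrier."""
    entry_price = prices[entry]
    top = entry_price * (1 + upper)     # horizontal barrier: take profit
    bottom = entry_price * (1 - lower)  # horizontal barrier: stop loss
    for t in range(entry + 1, min(entry + 1 + max_holding, len(prices))):
        if prices[t] >= top:
            return 1
        if prices[t] <= bottom:
            return -1
    return 0  # vertical barrier hit: position expired untouched

prices = np.array([100.0, 100.5, 101.0, 102.5, 101.0])
label = triple_barrier_label(prices, entry=0)
```

Note how the label now depends on which barrier is hit first, not on the price at some arbitrary fixed timestamp.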

I.I.D.

If you have read any statistics or ML theory texts before, you may have seen the three letters i.i.d., which stand for "independent and identically distributed" and describe a set of random variables. The assumption holds in most ML applications: CV, NLP, recommender systems, even some time series analysis and signal processing… but not in the case of financial time series! Look at how we prepare the data:

```python
for i in range(N):
    x_i = features[i:i + WINDOW]
    y_i = (close[i + WINDOW] - open[i + WINDOW]) / open[i + WINDOW]
```

While we iterate over i, we roll along the time series with some step, and it turns out that the different target ys aren't actually independent! The corresponding xs contain the same features, the same returns, just at different positions. And this actually violates our whole ML framework. The solutions are rather difficult, and what I tried myself was somewhat crude: while working with minute bars I took only non-overlapping windows, and that data was kind of enough, but it definitely won't be enough if you plan to work on larger timeframes. Some interesting but rather advanced solutions I found, again, in the same amazing book.
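The non-overlapping trick is just a change of step size when slicing the series; a small sketch of the difference (placeholder data):

```python
import numpy as np

WINDOW = 30
prices = np.arange(300, dtype=float)  # placeholder series of 300 bars

# Overlapping windows: step of 1, so consecutive samples share WINDOW - 1
# points and their labels are strongly dependent
overlapping = [prices[i:i + WINDOW] for i in range(len(prices) - WINDOW)]

# Non-overlapping windows: step of WINDOW, each sample uses fresh data,
# which is closer to i.i.d. but leaves far fewer training examples
non_overlapping = [prices[i:i + WINDOW]
                   for i in range(0, len(prices) - WINDOW + 1, WINDOW)]
```

The trade-off is visible immediately: 270 overlapping samples collapse to just 10 independent ones, which is exactly why this fix stops working on larger timeframes.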

Validation set usefulness

When we talk about training neural networks with, let's say, Keras, we are used to passing data samples like X_train, Y_train, X_val, Y_val, X_test, Y_test to the fit() function. Visually they can look like the following (the blue part of the time series is the train set, orange is validation and green is the test set):
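Such a split for a time series must be chronological, never shuffled; a minimal sketch (the helper name and fractions are mine) of slicing the data in time order, as in the picture:

```python
import numpy as np

def time_split(X, y, train_frac=0.7, val_frac=0.15):
    """Split a time series dataset chronologically: train on the earliest
    part, validate on the middle, test on the most recent. Never shuffle."""
    n = len(X)
    i_train = int(n * train_frac)
    i_val = int(n * (train_frac + val_frac))
    return ((X[:i_train], y[:i_train]),
            (X[i_train:i_val], y[i_train:i_val]),
            (X[i_val:], y[i_val:]))

X = np.arange(100).reshape(-1, 1)
y = np.arange(100)
(train_X, _), (val_X, _), (test_X, _) = time_split(X, y)
```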