So this is really a use case where Machine Learning can show its strength. How do you leverage it?

Here is a typical workflow for a trading system using supervised learning:

1. Data

Get the data in place. Good sources for financial time series are the API of the exchange you want to trade on, or the APIs of AlphaVantage or Quandl. The data should be at least as fine-grained as the time scale you want to model and ultimately predict. What is your forecast horizon? Longer-term horizons will require additional input factors such as market publications, policy outlooks and sentiment analysis of Twitter posts.

If you are in for the game of short-term or even high-frequency trading based on pure market signals from tick data, you might want to include rolling averages of various lengths to provide your model with historical context and trends, especially if your learning algorithm has no explicit memory of the kind that Recurrent Neural Networks such as LSTMs provide. All common indicators used in technical analysis (eg RSI, ADX, Bollinger Bands, MACD) are based on some sort of moving average of some quantity (price, trading volume). Even if you don’t believe in simplistic trading rules, including them will help the model reflect the trading behaviour of a majority of market participants.

Your computational capacity might be a limiting factor, especially in a context where your ML model will be up against the hard-coded, fast, single-purpose algorithms of market makers or arbitrage seekers. Deploying dedicated cloud servers or ML platforms like H2O and TensorFlow allows you to spread computation over several servers.

Clean the data (how do you interpolate gaps?), chart it, play with it: do you already spot trading opportunities, trends, anomalies?
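To make the rolling-average idea concrete, here is a minimal sketch in plain Python. The function names (`rolling_mean`, `build_features`), the window lengths and the sample prices are illustrative assumptions, not part of the original project. Each average is trailing, so a feature at time t only uses prices up to and including t.

```python
from collections import deque


def rolling_mean(values, window):
    """Trailing moving average; None until a full window is available."""
    out = []
    buf = deque(maxlen=window)
    total = 0.0
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        out.append(total / window if len(buf) == window else None)
    return out


def build_features(prices, windows=(3, 5)):
    """One feature row per time step: the price plus one average per window."""
    columns = {w: rolling_mean(prices, w) for w in windows}
    rows = []
    for i, price in enumerate(prices):
        row = {"price": price}
        for w in windows:
            row[f"sma_{w}"] = columns[w][i]
        rows.append(row)
    return rows


# Hypothetical price series, for illustration only.
prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
features = build_features(prices)
```

Rows whose averages are still `None` (the warm-up period) would typically be dropped before training.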

2. Supervised Model Training

Split your data into complementary sets for training, validation (for parameter tuning, feature selection etc) and testing. This is more complex than it sounds: ideally, the test set should be as ‘similar’ as possible to the present ‘state of the market’, and the validation and test sets should follow the same distribution. Otherwise you might waste effort tuning the model parameters on the validation set only to find that the model generalizes poorly to the test set. Following the concept of ‘market regimes’, ie extended periods in which a specific combination of commodities dominates the price dynamics of your target instrument, it might be worthwhile to first let an unsupervised clustering algorithm discover the defining correlations in the data, and then evaluate model performance on validation and test data belonging to the same clusters (see Figure 3; in this project, clustering increased predictive performance by 8%).

Figure 3 Coherent market periods as identified by a clustering algorithm (colored segments of EUA settle price)
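As a starting point for the split described above, the following sketch cuts a time-ordered data set into training, validation and test segments without shuffling, so the test segment is the most recent one. The function name and the 60/20/20 proportions are assumptions for illustration. The regime clustering step is not shown here.

```python
def chronological_split(rows, train_frac=0.6, val_frac=0.2):
    """Split time-ordered rows into (train, validation, test), oldest first."""
    if not (0 < train_frac < 1 and 0 < val_frac < 1 and train_frac + val_frac < 1):
        raise ValueError("fractions must be positive and leave room for a test set")
    n = len(rows)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


# Hypothetical data: ten observations indexed by time.
rows = list(range(10))
train, val, test = chronological_split(rows)
```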

Early on, decide on and establish a single-number evaluation metric. Chasing too many different metrics will only lead to confusion. In the context of algorithmic trading, a suitable measure is ‘Profit and Loss’ (PnL), as it weights classification precision (price up/down) by the actual size of the swing (‘relevance’). It also fits with the metrics you may consider for your trading policy.

Observe the model performance on the training and validation sets. If the error on the training set (‘model bias’) is high, you may need to allow for more model parameters (eg by adding more layers or neurons in a deep learning model). If the model generalizes poorly, that is, if the performance gap between training and validation set (‘model variance’) is high and the model is overfitting to the training set, you may need to add more data to the training set, reduce the number of features to the most relevant ones, add regularization (eg L2, L1 or dropout) or apply early stopping (in the gradient descent optimization). Closely examining the cases where the model went wrong will help to identify any potential and avoidable model bias, see Figure 4.

Figure 4 Error analysis — price move versus forecast confidence (>0.5: up, <0.5: down)
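The difference between plain classification precision and a PnL-style metric can be illustrated with a small sketch. It assumes a unit position is taken in the predicted direction (long above 0.5 confidence, short below) and ignores transaction costs. The function names and the sample numbers are made up for illustration.

```python
def position_from_confidence(confidence):
    """+1 (long) above 0.5, -1 (short) below 0.5, flat at exactly 0.5."""
    if confidence > 0.5:
        return 1
    if confidence < 0.5:
        return -1
    return 0


def directional_pnl(confidences, price_moves):
    """Sum of price moves, signed by the predicted direction."""
    return sum(position_from_confidence(c) * m
               for c, m in zip(confidences, price_moves))


def hit_rate(confidences, price_moves):
    """Share of predictions whose direction matched the actual move."""
    hits = sum(1 for c, m in zip(confidences, price_moves)
               if position_from_confidence(c) * m > 0)
    return hits / len(price_moves)


# Hypothetical forecasts and subsequent price moves.
confidences = [0.9, 0.4, 0.6, 0.3]
price_moves = [0.5, -0.25, -2.0, -0.25]

pnl = directional_pnl(confidences, price_moves)
precision = hit_rate(confidences, price_moves)
```

In this toy example three of four calls are right, yet the PnL is negative because the single miss coincides with the largest swing. This is the sense in which PnL weights precision by relevance.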

Establish your target performance: for market forecasts, a classification precision of 75% is actually quite good. In relative terms it is 50% better (25 percentage points) than random guessing at 50% precision. This baseline is very different from other ML applications like object or speech recognition, which operate in a closed environment where the factors affecting the modelling target can be clearly identified (the RGB channels of image pixels, the wave frequencies of sound samples).

3. Trading Policy

Define your trading policy: a set of rules that translate the model outputs into concrete trading actions. For example, depending on a threshold for the model confidence of a given prediction, what position do you place on the market, at what size, and for how long do you hold it in the given state of the market? A policy usually comes with further free parameters which need to be optimized (next step). In the context of supervised learning discussed here, this is a fairly manual process based on backtesting and grid search (some shortcomings outlined below).
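A very simple policy of this kind could look like the sketch below. The class name `ThresholdPolicy` and its two parameters are hypothetical. A real policy would likely add holding periods, stop levels and state-dependent sizing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdPolicy:
    """Trade only when the model is confident enough (illustrative sketch)."""
    threshold: float = 0.6  # minimum confidence for a long position
    size: float = 1.0       # position size in units of the instrument

    def target_position(self, confidence: float) -> float:
        if confidence >= self.threshold:
            return self.size       # confident 'up' forecast: go long
        if confidence <= 1.0 - self.threshold:
            return -self.size      # confident 'down' forecast: go short
        return 0.0                 # not confident enough: stay flat


policy = ThresholdPolicy(threshold=0.6, size=2.0)
positions = [policy.target_position(c) for c in (0.9, 0.55, 0.1)]
```

Here `threshold` and `size` are exactly the kind of free parameters that the next step tunes.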

4. Backtesting & Optimization

Now it gets down to the numbers: how well does your trading system, ie the interplay of prediction models and a given trading policy, perform on a hold-out set of historical market data? Here the test set used in step 2 (model training) can become the validation set for tuning the parameters of the policy. Genetic algorithms allow you to explore the policy space: start from a first generation of, say, 100 randomly chosen sets of policy parameters, then iteratively eliminate the 80 worst performers and let the 20 survivors produce 4 offspring each. Or you can employ a grid search in the multidimensional parameter space: starting from some plausible values for the parameters of the policy, what is the best-performing setting you can achieve by varying the parameter values one by one? (Varying one parameter at a time is cheaper than evaluating every combination on the grid, but it can miss interactions between parameters.) Your performance metric here is the one you finally aim to optimize in your trading strategy, eg the PnL or some derived quantity like Return on Investment, the Sharpe Ratio (the return per unit of volatility risk), Value at Risk, the beta etc, see Figure 5.

Figure 5 PnL and Sharpe Ratio for various trading policies
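The genetic search outlined above (100 candidates, keep the best 20, 4 offspring each) might be sketched as follows. This is an assumption-laden illustration: `toy_fitness` is a stand-in for a real backtest score such as PnL or Sharpe Ratio, the parameter pair (confidence threshold, holding period) is hypothetical, and offspring are produced by mutation only, without crossover.

```python
import random


def evolve(fitness, sample, mutate, generations=30,
           pop_size=100, n_survivors=20, n_offspring=4, seed=0):
    """Keep the best n_survivors each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [sample(rng) for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[:n_survivors]           # drop the worst performers
        children = [mutate(parent, rng)
                    for parent in survivors
                    for _ in range(n_offspring)]
        population = survivors + children          # 20 + 20 * 4 = 100 again
        best_per_generation.append(fitness(survivors[0]))
    return max(population, key=fitness), best_per_generation


def toy_fitness(params):
    """Placeholder for a backtest score; peaks at threshold 0.65, holding 5."""
    threshold, holding = params
    return -((threshold - 0.65) ** 2) - ((holding - 5.0) / 10.0) ** 2


def sample_params(rng):
    return (rng.uniform(0.5, 1.0), rng.uniform(1.0, 20.0))


def mutate_params(params, rng):
    threshold, holding = params
    return (min(1.0, max(0.5, threshold + rng.gauss(0.0, 0.02))),
            min(20.0, max(1.0, holding + rng.gauss(0.0, 0.5))))


best, history = evolve(toy_fitness, sample_params, mutate_params)
```

Because the survivors are carried over unchanged, the best score never decreases from one generation to the next. In practice the fitness function would run the full backtest on the validation data.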

A good measure to prevent overfitting the parameters to the validation set is cross-validation with a ‘walk-forward test’ (WFT) that verifies the robustness of your approach: optimize the policy parameters on a validation segment, test them forward in time on the data following the validation segment, shift the validation segment forward to include that test data, repeat. The basic assumption here is that the recent past is a better gauge for the future than the more distant past.
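The walk-forward loop can be sketched as below. It assumes a fixed-size validation window that slides forward by the length of the test segment (an expanding window would be an equally valid reading). The `optimize` and `evaluate` callables are placeholders for the policy optimization and backtest of your own system.

```python
def walk_forward_windows(n_samples, val_size, test_size):
    """Yield (validation indices, test indices) pairs, moving forward in time."""
    start = 0
    while start + val_size + test_size <= n_samples:
        val_idx = range(start, start + val_size)
        test_idx = range(start + val_size, start + val_size + test_size)
        yield val_idx, test_idx
        start += test_size  # validation window now covers the old test data


def walk_forward_test(data, val_size, test_size, optimize, evaluate):
    """Optimize on each validation segment, score on the segment that follows."""
    scores = []
    for val_idx, test_idx in walk_forward_windows(len(data), val_size, test_size):
        params = optimize([data[i] for i in val_idx])
        scores.append(evaluate(params, [data[i] for i in test_idx]))
    return scores


windows = [(list(v), list(t)) for v, t in walk_forward_windows(10, 4, 2)]
```

With 10 samples, a validation size of 4 and a test size of 2, this yields three windows. A strategy is more credible if its scores are stable across them, not just good on average.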

5. Simulation & Live Trading

Before your strategy goes live, freeze all system parameters and test in real time as if actually placing your orders according to the outputs of your trading algorithm. This important step is called paper trading and is the crucial litmus test for the validity of your approach. You might notice here that in your historical data you have actually used values which are not really available at a given time, eg when calculating moving averages (a look-ahead bias). If your strategy still looks promising, congratulations: it’s time to go live! While you might start by placing your orders manually, do not underestimate the administrative and technical effort it takes to integrate your strategy with the API of your exchange.
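One way to catch values that would not have been available at decision time is to tamper with the future and check whether a feature at time t changes. The sketch below contrasts a trailing moving average with a centered one. All names are illustrative, and it assumes t is far enough from both ends of the series for a full window.

```python
def trailing_mean(prices, t, window):
    """Average of the last `window` prices up to and including time t."""
    segment = prices[t - window + 1 : t + 1]
    return sum(segment) / len(segment)


def centered_mean(prices, t, window):
    """Average centered on t; includes prices after t, so it leaks the future."""
    half = window // 2
    segment = prices[t - half : t + half + 1]
    return sum(segment) / len(segment)


def uses_future_data(feature, prices, t, window):
    """True if changing prices after time t changes the feature value at t."""
    tampered = prices[: t + 1] + [p + 100.0 for p in prices[t + 1 :]]
    return feature(prices, t, window) != feature(tampered, t, window)


# Hypothetical price series, for illustration only.
prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
trailing_leaks = uses_future_data(trailing_mean, prices, t=3, window=3)
centered_leaks = uses_future_data(centered_mean, prices, t=3, window=3)
```

Running such a check over every input feature before paper trading is cheap compared with discovering the leak only once live results diverge from the backtest.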