Introduction to Algorithmic Trading

Algorithmic trading has been around for decades and has, for the most part, enjoyed a fair amount of success in its varied forms. Traditionally, algorithmic trading involves selecting trading rules that are carefully designed, optimized, and tested by humans. While these strategies have the advantage of being systematic and able to operate at speeds and frequencies beyond human traders, they are susceptible to all kinds of selection biases and are unable to adapt to changing market conditions.

Reinforcement learning (RL), on the other hand, is much more "hands off." In RL, an "agent" aims to maximize its reward in a given environment and improves its decision making through trial and error as it experiences more examples. It can also learn to make decisions based not only on its beliefs about the environment one step ahead but on how the market plays out further down the road. Most traditional trading algorithms use separate processes for prediction, for turning that prediction into an action, and for determining the frequency of that action based on transaction costs; RL supports an approach that integrates these processes. For all these reasons, RL may discover actions that humans would not normally find.

As a proof of concept, we designed and implemented a trading system for bitcoin, as trade data is readily available. To evaluate the efficacy of our reinforcement learning agent, we compare its out-of-sample investment performance against a buy-and-hold strategy and a momentum strategy. We believe this framework could be easily expanded and could also be applied to other investment assets.

Reinforcement Learning Basics

Reinforcement learning is appropriate when the state space (the quantitative description of the environment) is large or even continuous. It may be especially useful when it is impractical to obtain labels for supervised learning. Trading is a good example: the correct actions are not known, and even if they were, they would be nearly impossible to specify for every situation in which the agent has to act. RL is also appropriate when, as in trading, actions have long-term consequences and rewards may be delayed.

The essential ingredients of reinforcement learning are states, actions, rewards, and an action selection policy. In a given problem, an agent is supposed to select the best action given its current state. This action produces an observation of the new state as well as a reward, and this is repeated in what is known as a Markov decision process. In order for the agent to learn its behavior, or policy, the reward feedback from this sequence of actions is used to tune the parameters of the model.
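The interaction loop described above can be sketched as follows. The environment and policy here are toy stand-ins for illustration only, not the actual trading environment used in this work:

```python
import random

# Sketch of the agent-environment loop in a Markov decision process.
# ToyEnv is a hypothetical one-asset environment: the state is the price.
class ToyEnv:
    def reset(self):
        self.price = 100.0
        return self.price

    def step(self, action):
        # action in {-1, 0, +1} = sell, hold, buy; reward = action * return
        old = self.price
        self.price *= 1 + random.uniform(-0.01, 0.01)
        reward = action * (self.price / old - 1)
        return self.price, reward

def run_episode(env, policy, n_steps=96):
    """Play one episode and collect (state, action, reward) transitions."""
    state = env.reset()
    trajectory = []
    for _ in range(n_steps):
        action = policy(state)                 # policy: state -> action
        next_state, reward = env.step(action)  # environment transition
        trajectory.append((state, action, reward))
        state = next_state
    # The reward feedback over the whole trajectory is what the learning
    # algorithm uses to tune the policy's parameters.
    return trajectory

traj = run_episode(ToyEnv(), policy=lambda s: 1)  # always-buy policy
```

The 96-step episode length mirrors one trading day of 15-minute candles, as described in the Setup section.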

There are two main ways of formulating the problem: value-based and policy-based. In a value-based approach, the value of each state or state-action pair is estimated; the policy is generated by accurately estimating these values and then selecting the action with the highest value. In a policy-based approach, which is our chosen method, we directly parametrize the policy and then find the parameters that maximize expected rewards.
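One standard way to directly parametrize a discrete policy is a softmax over linear action scores, updated by gradient ascent on episode reward. The sketch below uses a REINFORCE-style update as a generic illustration of the policy-based formulation; it is not the exact model or update rule used in this work:

```python
import numpy as np

# Illustrative softmax policy over three actions (buy, hold, sell) with a
# REINFORCE-style gradient update. Dimensions mirror the inputs described
# later: 18 indicators + 5 holding states = 23 features.
rng = np.random.default_rng(0)
n_features, n_actions = 23, 3
theta = np.zeros((n_features, n_actions))

def action_probs(state, theta):
    """Softmax probabilities of each action given the state."""
    scores = state @ theta
    scores -= scores.max()          # numerical stability
    e = np.exp(scores)
    return e / e.sum()

def reinforce_update(trajectory, theta, lr=0.01):
    """One gradient-ascent step on expected episode reward."""
    total_reward = sum(r for _, _, r in trajectory)
    for state, action, _ in trajectory:
        probs = action_probs(state, theta)
        # d log pi(a|s) / d theta = outer(state, onehot(a) - probs)
        grad = -np.outer(state, probs)
        grad[:, action] += state
        theta = theta + lr * total_reward * grad
    return theta

probs = action_probs(np.ones(n_features), theta)   # uniform at initialization
trajectory = [(rng.standard_normal(n_features), 1, 0.5)]
theta = reinforce_update(trajectory, theta)
```

At initialization (all-zero weights) the policy assigns equal probability to each action; training shifts the weights toward actions that produced higher episode rewards.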

Setup

We downloaded the price and volume of each transaction from the GDAX exchange (formerly Coinbase Exchange) from December 1, 2014 to June 14, 2017, and aggregated the transactions into 15-minute candles (or intervals). We then split this into a 70%/30% train/test set.
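Aggregating per-trade ticks into 15-minute candles is straightforward with pandas resampling. The data below is synthetic and the column names are assumptions for illustration, not the exchange's actual schema:

```python
import numpy as np
import pandas as pd

# Synthetic tick data: one (price, volume) row per minute.
idx = pd.date_range("2014-12-01", periods=2000, freq="min")
trades = pd.DataFrame({
    "price": 300 + np.random.randn(2000).cumsum(),
    "volume": np.random.rand(2000),
}, index=idx)

# Aggregate into 15-minute OHLCV candles.
candles = trades["price"].resample("15min").ohlc()
candles["volume"] = trades["volume"].resample("15min").sum()

# Chronological 70%/30% train/test split.
split = int(len(candles) * 0.7)
train, test = candles.iloc[:split], candles.iloc[split:]
```

Note that the split is chronological rather than random, so the test set contains only data that comes after the training period.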

Each 15-minute candle is one step, and an episode is defined as 96 steps, or roughly one day of trading. During training, a random block of 96 contiguous candles is selected to be played as an episode, and a random number of bitcoins between 0 and 4 is selected to start the sequence. The agent decides to buy, sell, or hold at each step, subject to lower and upper limits of 0 and 4 bitcoins, respectively. The bitcoin holdings at each step are calculated, as well as the returns based on those holdings. The return at each step is calculated as (number of bitcoins) × [p(t)/p(t−1) − 1]. At the end of each episode, we collect all the inputs, actions taken, and returns.
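The per-step return formula above can be computed vectorially; the prices and holdings below are made-up numbers for illustration:

```python
import numpy as np

# r(t) = holdings(t) * [p(t)/p(t-1) - 1], where holdings(t) is the number
# of bitcoins (0..4) held over the interval ending at time t.
prices = np.array([300.0, 303.0, 301.5, 304.5])  # candle closes
holdings = np.array([2, 3, 3])                   # bitcoins held per interval

step_returns = holdings * (prices[1:] / prices[:-1] - 1)
episode_return = step_returns.sum()
```

For example, holding 2 bitcoins over a +1% move yields a step return of 0.02; over a full 96-step episode these per-step returns are summed and collected along with the inputs and actions.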

In order for our RL agent to learn a proper policy, it needs inputs that are representative of the state of the market and are somewhat predictive in aggregate. We use 18 different technical indicators that express where the current price and volume are in relation to their past history, along with 5 state variables that represent the 5 possible bitcoin holdings between 0 and 4 bitcoins.
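Assembling the full state vector then amounts to concatenating the indicator values with a one-hot encoding of the current holdings. The indicator values below are random placeholders:

```python
import numpy as np

def make_state(indicators, holdings):
    """Concatenate 18 indicators with a one-hot vector over the 5
    possible holdings (0..4 bitcoins), giving a length-23 state."""
    one_hot = np.zeros(5)
    one_hot[holdings] = 1.0        # holdings is an integer in 0..4
    return np.concatenate([indicators, one_hot])

state = make_state(np.random.randn(18), holdings=2)
```

Encoding holdings as 5 binary state variables (rather than a single count) lets the policy learn a separate behavior for each holding level.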



The indicators used are fairly generic momentum/reversion-type signals, and their details are provided in Table 1. As an illustrative example of how these indicators might work, the agent may learn that rising prices accompanied by steady volume are a bullish sign and adjust its weights so that it has a higher tendency to buy more bitcoins.