Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

Winning high stakes poker tournaments, out-playing world-class StarCraft players, and autonomously driving Tesla’s futuristic sports cars. What do they all have in common? Each of these extremely complex tasks were long thought to be impossible by machines, until recent advancements in deep reinforcement learning showed they were possible, today.

Reinforcement learning is beginning to take over the world.

A little over two months ago, I decided I wanted to take part in the revolution, so I set out on a journey to create a profitable Bitcoin trading strategy using state-of-the-art deep reinforcement learning algorithms. While I made quite a bit of progress on that front, I realized that the tooling for this sort of project can be quite daunting to wrap your head around, and as such, it is very easy to get lost in the details.

In between optimizing my previous project for distributed high-performance computing (HPC) systems; getting lost in endless pipelines of data and feature optimizations; and running my head in circles around efficient model set-up, tuning, training, and evaluation; I realized that there had to be a better way of doing things. After countless hours of researching existing projects, spending endless nights watching PyData conference talks, and having many back-and-forth conversations with the hundreds of members of the RL trading Discord community, I realized there weren’t any existing solutions that were all that good.

There were many bits and pieces of great reinforcement learning trading systems spread across the inter-webs, but nothing solid and complete. For this reason, I’ve decided to create an open source Python framework for getting any trading strategy from idea to production, efficiently, using deep reinforcement learning.

Enter TensorTrade. The idea was to create a highly modular framework for building efficient reinforcement learning trading strategies in a composable, maintainable way. Sounds like a mouthful of buzz-words if you ask me, so let’s get into the meat.

Table of Contents

Overview

TensorTrade Components

Putting it All Together

The Future

Overview

TensorTrade is an open source Python framework for training, evaluating, and deploying robust trading strategies using deep reinforcement learning. The framework focuses on being highly composable and extensible, to allow the system to scale from simple trading strategies on a single CPU, to complex investment strategies run on a distribution of HPC machines.

Under the hood, the framework uses many of the APIs from existing machine learning libraries to maintain high quality data pipelines and learning models. One of the main goals of TensorTrade is to enable fast experimentation with algorithmic trading strategies, by leveraging the existing tools and pipelines provided by numpy , pandas , gym , keras , and tensorflow .

Every piece of the framework is split up into re-usable components, allowing you to take advantage of the general use components built by the community, while keeping your proprietary features private. The aim is to simplify the process of testing and deploying robust trading agents using deep reinforcement learning, to allow you and I to focus on creating profitable strategies.

RL Primer

In case your reinforcement learning chops are a bit rusty, let’s quickly go over the basic concepts.

Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.

Every reinforcement learning problem starts out with an environment and one or more agents that can interact with the environment.

This technique is based off Markov Decision Processes (MDP) dating back to the 1950s.

The agent will first observe the environment, then build a model of the current state and the expected value of actions within that environment. Based on that model, the agent will then take the action it has deemed as having the highest expected value.

Based on the effects of the chosen action within the environment, the agent will be rewarded by an amount corresponding to the actual value of that action. The reinforcement learning agent can then, through the process of trial and error (i.e. learning through reinforcement), improve its underlying model and learn to take more rewarding actions over time.

If you still need a bit of refreshment on the subject, there is a link to an article titled Introduction to Deep Reinforcement Learning in the references for this article, which goes much more in-depth into the details. Let’s move on.

Getting Started

The following tutorial should provide enough examples to get you started with creating simple trading strategies using TensorTrade, although you will quickly see the framework is capable of handling much more complex configurations.

You can follow along with Google Colab or the tutorial on Github.

Installation

TensorTrade requires Python 3.6 or later, so make sure you’re using a valid version before pip installing the framework.

pip install git+https://github.com/notadamking/tensortrade.git

To follow this entire tutorial, you will need to install some extra dependencies, such as tensorflow , tensorforce , stable-baselines , ccxt , ta , and stochastic .

That’s all the installation necessary! Let’s get into the code.

TensorTrade Components

TensorTrade is built around modular components that together make up a trading strategy. Trading strategies combine reinforcement learning agents with composable trading logic in the form of a gym environment. A trading environment is made up of a set of modular components that can be mixed and matched to create highly diverse trading and investment strategies. I will explain this in further detail later, but for now it is enough to know the basics.

Just like electrical components, the purpose of TensorTrade components is to be able to mix and match them as necessary.

The code snippets in this section should serve as guidelines for creating new strategies and components. There will likely be missing implementation details that will become more clear in a later section, as more components are defined.

Trading Environment

A trading environment is a reinforcement learning environment that follows OpenAI’s gym.Env specification. This allows us to leverage many of the existing reinforcement learning models in our trading agent, if we’d like.

Trading environments are fully configurable gym environments with highly composable Exchange , FeaturePipeline , ActionScheme , and RewardScheme components.

The Exchange provides observations to the environment and executes the agent’s trades.

provides observations to the environment and executes the agent’s trades. The FeaturePipeline optionally transforms the exchange output into a more meaningful set of features before it is passed to the agent.

optionally transforms the exchange output into a more meaningful set of features before it is passed to the agent. The ActionScheme converts the agent’s actions into executable trades.

converts the agent’s actions into executable trades. The RewardScheme calculates the reward for each time step based on the agent’s performance.

If it seems a bit complicated now, it’s really not. That’s all there is to it, now it’s just a matter of composing each of these components into a complete environment.

When the reset method of a TradingEnvironment is called, all of the child components will also be reset. The internal state of each exchange, feature pipeline, transformer, action scheme, and reward scheme will be set back to their default values, ready for the next episode.

Let’s begin with an example environment. As mentioned before, initializing a TradingEnvironment requires an exchange, an action scheme, and a reward scheme, the feature pipeline is optional.

from tensortrade.environments import TradingEnvironment environment = TradingEnvironment(exchange=exchange,

action_scheme=action_scheme,

reward_scheme=reward_scheme,

feature_pipeline=feature_pipeline)

While the recommended use case is to plug a trading environment into a trading strategy, you can obviously use the trading environment separately, any way you’d otherwise use a gym environment.

Exchanges

Exchanges determine the universe of tradable instruments within a trading environment, return observations to the environment on each time step, and execute trades made within the environment. There are two types of exchanges: live and simulated.

Live exchanges are implementations of Exchange backed by live pricing data and a live trade execution engine. For example, CCXTExchange is a live exchange, which is capable of returning pricing data and executing trades on hundreds of live cryptocurrency exchanges, such as Binance and Coinbase.

import ccxt from tensortrade.exchanges.live import CCXTExchange coinbase = ccxt.coinbasepro() exchange = CCXTExchange(exchange=coinbase, base_instrument='USD')

There are also exchanges for stock and ETF trading, such as RobinhoodExchange and InteractiveBrokersExchange , but these are still works in progress.

Simulated exchanges, on the other hand, are implementations of Exchange backed by simulated pricing data and trade execution.

For example, FBMExchange is a simulated exchange, which generates pricing and volume data using fractional brownian motion (FBM). Since its price is simulated, the trades it executes must be simulated as well. The exchange uses a simple slippage model to simulate price and volume slippage on trades, though like almost everything in TensorTrade, this slippage model can easily be swapped out for something more complex.

from tensortrade.exchanges.simulated import FBMExchange exchange = FBMExchange(base_instrument='BTC', timeframe='1h')

Though the FBMExchange generates fake price and volume data using a stochastic model, it is simply an implementation of SimulatedExchange . Under the hood, SimulatedExchange only requires a data_frame of price history to generate its simulations. This data_frame can either be provided by a coded implementation such as FBMExchange , or at runtime such as in the following example.

import pandas as pd from tensortrade.exchanges.simulated import SimulatedExchange df = pd.read_csv('./data/btc_ohclv_1h.csv') exchange = SimulatedExchange(data_frame=df, base_instrument='USD')

Feature Pipelines

Feature pipelines are meant for transforming observations from the environment into meaningful features for an agent to learn from. If a pipeline has been added to a particular exchange, then observations will be passed through the FeaturePipeline before being output to the environment. For example, a feature pipeline could normalize all price values, make a time series stationary, add a moving average column, and remove an unnecessary column, all before the observation is returned to the agent.

Feature pipelines can be initialized with an arbitrary number of comma-separated transformers. Each FeatureTransformer needs to be initialized with the set of columns to transform, or if nothing is passed, all input columns will be transformed.

Each feature transformer has a transform method, which will transform a single observation (a pandas.DataFrame ) from a larger data set, keeping any necessary state in memory to transform the next frame. For this reason, it is often necessary to reset the FeatureTransformer periodically. This is done automatically each time the parent FeaturePipeline or Exchange is reset.

Let’s create an example pipeline and add it to our existing exchange.

from tensortrade.features import FeaturePipeline

from tensortrade.features.scalers import MinMaxNormalizer

from tensortrade.features.stationarity import FractionalDifference

from tensortrade.features.indicators import SimpleMovingAverage price_columns = ["open", "high", "low", "close"] normalize_price = MinMaxNormalizer(price_columns)

moving_averages = SimpleMovingAverage(price_columns)

difference_all = FractionalDifference(difference_order=0.6) feature_pipeline = FeaturePipeline(steps=[normalize_price,

moving_averages,

difference_all]) exchange.feature_pipeline = feature_pipeline

This feature pipeline normalizes the price values between 0 and 1, before adding some moving average columns and making the entire time series stationary by fractionally differencing consecutive values.

Action Schemes

Action schemes define the action space of the environment and convert an agent’s actions into executable trades. For example, if we were using a discrete action space of 3 actions (0 = hold , 1 = buy 100% , 2 = sell 100% ), our learning agent does not need to know that returning an action of 1 is equivalent to buying an instrument. Rather, our agent needs to know the reward for returning an action of 1 in specific circumstances, and can leave the implementation details of converting actions to trades to the ActionScheme .

Each action scheme has a get_trade method, which will transform the agent’s specified action into an executable Trade . It is often necessary to store additional state within the scheme, for example to keep track of the currently traded position. This state should be reset each time the action scheme’s reset method is called, which is done automatically when the parent TradingEnvironment is reset.

from tensortrade.actions import DiscreteActions action_scheme = DiscreteActions(n_actions=20,

instrument_symbol='BTC')

This discrete action scheme uses 20 discrete actions, which equates to 4 discrete amounts for each of the 5 trade types (market buy/sell, limit buy/sell, and hold). E.g. [0,5,10,15]= hold , 1= market buy 25% , 2= market sell 25% , 3= limit buy 25% , 4= limit sell 25% , 6= market buy 50% , 7= market sell 50% , etc…

Reward Schemes

Reward schemes receive the trade taken at each time step and return a float , corresponding to the benefit of that specific action. For example, if the action taken this step was a sell that resulted in positive profits, our RewardScheme could return a positive number to encourage more trades like this. On the other hand, if the action was a sell that resulted in a loss, the scheme could return a negative reward to teach the agent not to make similar actions in the future.

A version of this example algorithm is implemented in the SimpleProfit component, however more complex strategies can obviously be used instead.

Each reward scheme has a get_reward method, which takes in the trade executed at each time step and returns a float corresponding to the value of that action. As with action schemes, it is often necessary to store additional state within a reward scheme for various reasons. This state should be reset each time the reward scheme’s reset method is called, which is done automatically when the parent TradingEnvironment is reset.

from tensortrade.rewards import SimpleProfit reward_scheme = SimpleProfit()

The simple profit scheme returns a reward of -1 for not holding a trade, 1 for holding a trade, 2 for purchasing an instrument, and a value corresponding to the (positive/negative) profit earned by a trade if an instrument was sold.

Learning Agents

Up until this point, we haven’t seen the “deep” part of the deep reinforcement learning framework. This is where learning agents come in. Learning agents are where the math (read: magic) happens.

At each time step, the agent takes the observation from the environment as input, runs it through its underlying model (a neural network most of the time), and outputs the action to take. For example, the observation might be the previous open, high, low, and close price from the exchange. The learning model would take these values as input and output a value corresponding to the action to take, such as buy, sell, or hold.

It is important to remember the learning model has no intuition of the prices or trades being represented by these values. Rather, the model is simply learning which values to output for specific input values or sequences of input values, to earn the highest reward.

In this example, we will be using the Stable Baselines library to provide learning agents to our trading strategy, however, the TensorTrade framework is compatible with many reinforcement learning libraries such as Tensorforce, Ray’s RLLib, OpenAI’s Baselines, Intel’s Coach, or anything from the TensorFlow line such as TF Agents.

It is possible that custom TensorTrade learning agents will be added to this framework in the future, though it will always be a goal of the framework to be interoperable with as many existing reinforcement learning libraries as possible, since there is so much concurrent growth in the space.

But for now, Stable Baselines is simple and powerful enough for our needs.

from stable_baselines.common.policies import MlpLnLstmPolicy

from stable_baselines import PPO2 model = PPO2

policy = MlpLnLstmPolicy

params = { "learning_rate": 1e-5 } agent = model(policy, environment, model_kwargs=params)

Note: Stable Baselines is not required to use TensorTrade though it is required for this tutorial. This example uses a GPU-enabled Proximal Policy Optimization model with a layer-normalized LSTM perceptron network. If you would like to know more about Stable Baselines, you can view the Documentation.

Tensorforce

I will also quickly cover the Tensorforce library to show how simple it is to switch between reinforcement learning frameworks.

from tensorforce.agents import Agent agent_spec = {

"type": "ppo_agent",

"step_optimizer": {

"type": "adam",

"learning_rate": 1e-4

},

"discount": 0.99,

"likelihood_ratio_clipping": 0.2,

} network_spec = [

dict(type='dense', size=64, activation="tanh"),

dict(type='dense', size=32, activation="tanh")

] agent = Agent.from_spec(spec=agent_spec,

kwargs=dict(network=network_spec,

states=environment.states,

actions=environment.actions))

If you would like to know more about Tensorforce agents, you can view the Documentation.

Trading Strategy

A TradingStrategy consists of a learning agent and one or more trading environments to tune, train, and evaluate on. If only one environment is provided, it will be used for tuning, training, and evaluating. Otherwise, a separate environment may be provided at each step.

from tensortrade.strategies import TensorforceTradingStrategy,

StableBaselinesTradingStrategy a_strategy = TensorforceTradingStrategy(environment=environment,

agent_spec=agent_spec,

network_spec=network_spec) b_strategy = StableBaselinesTradingStrategy(environment=environment,

model=PPO2,

policy=MlpLnLSTMPolicy)

Don’t worry if you don’t understand the strategy initialization just yet, it will be explained in more detail later.

Putting it All Together

Now that we know about each component that makes up a TradingStrategy , let’s build and evaluate one.

For a quick recap, a TradingStrategy is made up of a TradingEnvironment and a learning agent. A TradingEnvironment is a gym environment that takes an Exchange , an ActionScheme , a RewardScheme , and an optional FeaturePipeline , and returns observations and rewards that the learning agent can be trained and evaluated on.

Creating an Environment

The first step is to create a TradingEnvironment using the components outlined above.

from tensortrade.exchanges.simulated import FBMExchange

from tensortrade.features.scalers import MinMaxNormalizer

from tensortrade.features.stationarity import FractionalDifference

from tensortrade.features import FeaturePipeline

from tensortrade.rewards import SimpleProfit

from tensortrade.actions import DiscreteActions

from tensortrade.environments import TradingEnvironment normalize_price = MinMaxNormalizer(["open", "high", "low", "close"])

difference = FractionalDifference(difference_order=0.6)

feature_pipeline = FeaturePipeline(steps=[normalize_price,

difference]) exchange = FBMExchange(timeframe='1h',

base_instrument='BTC',

feature_pipeline=feature_pipeline) reward_scheme = SimpleProfit() action_scheme = DiscreteActions(n_actions=20,

instrument_symbol='ETH/BTC') environment = TradingEnvironment(exchange=exchange,

action_scheme=action_scheme,

reward_scheme=reward_scheme,

feature_pipeline=feature_pipeline)

Simple enough, now environment is a gym environment that can be used by any compatible trading strategy or learning agent.

Defining the Agent

Now that the environment is set up, it’s time to create our learning agent. Again, we will be using Stable Baselines for this, but feel free to drop in any other reinforcement learning agent here.

Since we are using StableBaselinesTradingStrategy , all we need to do is provide a model type and a policy type for the underlying neural network to be trained. For this example, we will be using a simple proximal policy optimization (PPO) model and a layer-normalized LSTM policy network.

For more examples of model and policy specifications, see the Stable Baselines Documentation.

from stable_baselines.common.policies import MlpLnLstmPolicy

from stable_baselines import PPO2 model = PPO2

policy = MlpLnLstmPolicy

params = { "learning_rate": 1e-5 }

Training a Strategy

Creating our trading strategy is as simple as plugging in our agent and the environment.

from tensortrade.strategies import StableBaselinesTradingStrategy strategy = StableBaselinesTradingStrategy(environment=environment,

model=model,

policy=policy,

model_kwargs=params)

Then to train the strategy (i.e. train the agent on the current environment), all we need to do is call strategy.run() with the total number of steps or episodes you’d like to run.

performance = strategy.run(steps=100000,

episode_callback=stop_early_callback)

And voila! Three hours and thousands of print statements later, you will see the results of how your agent has done!

If this feedback loop is a bit slow for you, you can pass a callback function to run , which will be called at the end of each episode. The callback function will pass in a data frame containing the agent’s performance that episode, and expects a bool in return. If True , the agent will continue training, otherwise, the agent will stop and return its overall performance.

A simple performance output at the end of the episode, including the final 5 balances and net worths of the agent.

Saving and Restoring

All trading strategies are capable of saving their agent to a file, for later restoring. The environment is not saved, as it does not have state that we care about preserving. To save our TensorflowTradingStrategy to a file, we just need to provide the path of the file to our strategy.

strategy.save_agent(path="../agents/ppo_btc_1h")

To restore the agent from the file, we first need to instantiate our strategy, before calling restore_agent .

from tensortrade.strategies import StableBaselinesTradingStrategy strategy = StableBaselinesTradingStrategy(environment=environment,

model=model,

policy=policy,

model_kwargs=params) strategy.restore_agent(path="../agents/ppo_btc/1h")

Our strategy is now restored back to its previous state, and ready to be used again.

Tuning Your Strategy

Sometimes a trading strategy will require tuning a set of hyper-parameters, or features, on an environment to achieve maximum performance. In this case, each TradingStrategy provides an optionally implementable tune method.

Tuning a model is similar to training a model, however in addition to adjusting and saving the weights and biases of the best performing model, the strategy also adjusts and persists the hyper-parameters that produced that model.

from tensortrade.environments import TradingEnvironment

from tensortrade.exchanges.simulated import FBMExchange exchange = FBMExchange(timeframe='1h',

base_instrument='BTC',

feature_pipeline=feature_pipeline) environment = TradingEnvironment(exchange=exchange,

action_scheme=action_scheme,

reward_scheme=reward_scheme) strategy.environment = environment tuned_performance = strategy.tune(episodes=10)

In this case, the agent will be trained for 10 episodes, with a different set of hyper-parameters each episode. The best set will be saved within the strategy, and used any time strategy.run() is called thereafter.

Strategy Evaluation

Now that we’ve tuned and trained our agent, it’s time to see how well it performs. To evaluate our strategy’s performance on unseen data, we will need to run it on a new environment backed by such data.

from pandas import pd from tensortrade.environments import TradingEnvironment

from tensortrade.exchanges.simulated import SimulatedExchange df = pd.read_csv('./btc_ohlcv_1h.csv') exchange = SimulatedExchange(data_frame=df,

base_instrument='BTC',

feature_pipeline=feature_pipeline) environment = TradingEnvironment(exchange=exchange,

action_scheme=action_scheme,

reward_scheme=reward_scheme) strategy.environment = environment test_performance = strategy.run(episodes=1, testing=True)

When complete, strategy.run returns a Pandas data frame of the agent’s performance, including the net worth and balance of the agent at each time step.

Example evaluation performance — this agent has not been trained on this feature set, so performance is arbitrary.

Live Trading

Once you’ve built a profitable trading strategy, trained an agent to trade it properly, and ensured its “generalize-ability” to new data sets, all there is left to do is profit. Using a live exchange such as CCXTExchange , you can plug your strategy in and let it run!

While the gambler in you may enjoy starting a strategy and letting it run without bounds, the more risk averse of you can use a trade_callback , which will be called each time the strategy makes a trade. This callback function, similar to the episode callback, will pass in a data frame containing the agent’s overall performance, and expects a bool in return. If True , the agent will continue trading, otherwise, the agent will stop and return its performance over the session.

import ccxt from tensortrade.environments import TradingEnvironment

from tensortrade.strategies import StableBaselinesTradingStrategy

from tensortrade.exchanges.live import CCXTExchange coinbase = ccxt.coinbasepro(...) exchange = CCXTExchange(exchange=coinbase,

timeframe='1h',

base_instrument='USD',

feature_pipeline=feature_pipeline) environment = TradingEnvironment(exchange=exchange,

action_scheme=action_scheme,

reward_scheme=reward_scheme) strategy.environment = environment strategy.restore_agent(path="../agents/ppo_btc/1h") live_performance = strategy.run(steps=0, trade_callback=episode_cb)

Passing steps=0 instructs the strategy to run until otherwise stopped.

That’s all there is to it! As you can see, it is quite simple to build complex trading strategies using simple components and deep reinforcement learning. So what are you waiting for? Dive in, get your hands dirty, and see what’s possible using TensorTrade.

The Future

Currently, the framework is in its early stages. The focus so far has been to get a working prototype, with all of the necessary building blocks to create highly profitable strategies. The next step is to build a roadmap for the future, and decide which upcoming building blocks are important to the community.

Soon, we will see highly informative visualizations of the environments added to the framework, as well as much more in-depth strategies on more exchanges, trading more instruments.

An example environment visualization, created in a previous article.

The sky is the limit. The groundwork (i.e. framework) has been laid, it’s now up to the community to decide what’s next. I hope that you will be a part of it.

Final Thoughts

TensorTrade is a powerful framework capable of building highly modular, high performance trading systems. It is fairly simple and easy to experiment with new trading and investment strategies, while allowing you to leverage components from one strategy in another. But don’t take my word for it, create a strategy of your own and start teaching your robots to take over the world!

While this tutorial should be enough to get you started, there is still quite a lot more to learn if you want to create a profitable trading strategy. I encourage you to head over to the Github and dive into the codebase, or take a look at our documentation at tensortrade.org. There is also quite an active Discord community with nearly 1000 total members, so if you have questions, feedback, or feature requests, feel free to drop them there!

This is my GitHub commit history for the TensorTrade framework — you could say I’ve been busy.

I’ve gotten the project to a highly usable state. Though, my time is limited, and I believe there are many of you out there who could make valuable contributions to the open source codebase. So if you are a developer or data scientist with an interest in building state-of-the-art trading systems, I’d love to see you open a pull request, even if its just a simple test case!

Contributing

Others have asked how they can contribute to the project without writing code. There are currently three ways that you can do that.

Write code or documentation for the TensorTrade framework. Many issues on the Github are funded through Gitcoin smart contracts, so you can actually get paid to contribute. To date, almost 10 ETH (~$2000 USD), donated by the community has used to pay open source developers for their contributions to the framework. Fund this project with either Bitcoin or Ethereum. These donations are used to fund our Gitcoin smart contracts, which allows anyone who contributes quality code and documentation to get paid for their contributions. Sponsor me on Github. Github is currently matching all donations 1:1, up to $5,000, so there has never been a better time to sponsor my work and the TensorTrade’s development. All sponsorships go directly towards the funding of the open source development of the framework.

Thanks for reading! As always, all of the code for this tutorial can be found on my GitHub. Leave a comment below if you have any questions or feedback, I’d love to hear from you! I can also be reached on Twitter at @notadamking.

You can also sponsor me on Github Sponsors or Patreon via the links below.

References