Python Backtesting Libraries For Quant Trading Strategies


Written by Khang Nguyen Vo, khangvo88@gmail.com, for the RobustTechHouse (Mobile App Development Singapore) blog. Khang is a graduate from the Masters of Quantitative and Computational Finance Program, John Von Neumann Institute 2014. He is passionate about research in machine learning, predictive modeling and backtesting of trading strategies.

Frequently Mentioned Python Backtesting Libraries

It is essential to backtest quant trading strategies before trading them with real money. Here, we review frequently used Python backtesting libraries. We examine them in terms of flexibility (can be used for backtesting, paper-trading as well as live-trading), ease of use (good documentation, good structure) and scalability (speed, simplicity, and compatibility with other libraries).

Zipline: This is an event-driven backtesting framework used by Quantopian. Zipline has a great community, good documentation, great support for Interactive Brokers (IB) and Pandas integration. The syntax is clear and easy to learn, and there are many examples. If your main goal is trading US equities, then this framework might be the best candidate. Quantopian allows one to backtest, share, and discuss trading strategies in its community. However, in our experiments, Zipline was extremely slow, which is the biggest disadvantage of this library. Quantopian has some workarounds, such as running the Zipline library in parallel in the cloud. You can take a look at this post if this interests you. Zipline also seems to work poorly with local files and non-US data, and it is difficult to use this framework for other financial asset classes.



PyAlgoTrade: This is another event-driven library, which is actively developed and supports backtesting, paper-trading and live-trading. It is well documented and also supports TA-Lib (Technical Analysis Library) integration. In our tests, it outperformed Zipline in terms of speed and flexibility. However, one big drawback of PyAlgoTrade is that it does not support Pandas objects or Pandas modules.

pybacktest: A vectorized backtesting framework in Python that is very simple and lightweight. The project appears to have been revived recently, on May 21st, 2015.

TradingWithPython: Jev Kuznetsov extended the pybacktest library and built his own backtester. This library seems to have been updated recently, in Feb 2015. However, the documentation and course for this library cost $395.

Some other projects: ultra-finance

Python Backtesting Libraries are summarized in the following table:

| | Zipline | PyAlgoTrade | TradingWithPython | pybacktest |
|---|---|---|---|---|
| Type | Event-driven | Event-driven | Vectorized | Vectorized |
| Community | Great | Normal | No | No |
| Cloud | Quantopian | No | No | No |
| Interactive Brokers support | Yes | No | No | No |
| Data feed | Yahoo, Google, NinjaTrader | Yahoo, Google, NinjaTrader, Xignite, Bitstamp realtime feed | | |
| Documentation | Great | Great | $395 | Poor |
| Event profile | Yes | Yes | | |
| Speed | Slow | Fast | | |
| Pandas supported | Yes | No | Yes | Yes |
| Trading calendar | Yes | No | No | No |
| TA-Lib support | Yes | Yes | Yes | |
| Suitable for | US-equity only | Real trading, paper-test trading | Paper-test trading | Paper-test trading |
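The Type row in the table separates event-driven from vectorized backtesters. As a rough illustration of the difference, and not of how any of these libraries is implemented, the sketch below runs the same toy rule both ways in plain Python: hold one share during the next bar whenever the latest close is above the previous one. The prices and function names are made up for this example.

```python
# Toy closing prices (made-up numbers, for illustration only).
closes = [100.0, 101.0, 99.5, 102.0, 103.5, 103.0]


def event_driven_pnl(prices):
    """Process one bar at a time, as an event-driven engine would."""
    position = 0   # shares held going into the current bar
    pnl = 0.0
    prev = None
    for price in prices:
        if prev is not None:
            pnl += position * (price - prev)      # mark the open position to market
            position = 1 if price > prev else 0   # decide the position for the next bar
        prev = price
    return pnl


def vectorized_pnl(prices):
    """Compute every signal and every price change up front, array-style."""
    changes = [b - a for a, b in zip(prices, prices[1:])]
    signals = [1 if change > 0 else 0 for change in changes]
    # A signal formed on one bar earns the price change of the following bar.
    return sum(s * c for s, c in zip(signals, changes[1:]))


print(event_driven_pnl(closes), vectorized_pnl(closes))
```

Both functions return the same profit for the same data. The event-driven form is easier to extend with order handling and per-bar state, while the vectorized form is usually shorter.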

Zipline vs PyAlgoTrade Python Backtesting Libraries

We will focus on comparing the more popular Zipline and PyAlgoTrade Python Backtesting Libraries below.

1. Zipline:

The documentation can be found at http://www.zipline.io/tutorial/ and you can find some implementations on Quantopian. We do not go into detail on how to use this library here since the documentation is clear and concise. The sample script below just shows how this Python Backtesting library works for a simple strategy.

The syntax for zipline is clear and simple, which makes it suitable for newcomers, who can focus on the trading strategy itself. Its other strengths include:

Good documentation and a great community

IPython-compatible: supports the %%zipline cell magic

Input and output for zipline are based on Pandas DataFrames. This is a big advantage since Pandas is one of the most widely used and accessible libraries for data analysis and modeling

Support for slippage models (also called impact models: when you buy or sell, the order itself moves the price you actually get) and commission models (the cost of each transaction). Modeling these makes trading strategies more realistic.
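To make the effect of these two models concrete, here is a minimal sketch in plain Python. It does not use zipline's own slippage or commission classes; the function name, the 5 basis point slippage and the $0.01 per-share commission are assumptions chosen only for illustration.

```python
def fill_cost(shares, quoted_price, slippage_bps=5.0, commission_per_share=0.01):
    """Cash paid (positive) or received (negative) for a market order.

    shares > 0 is a buy, shares < 0 is a sell. All defaults are
    illustrative assumptions, not values taken from any library.
    """
    side = 1 if shares > 0 else -1
    # Slippage: buys fill slightly above the quote, sells slightly below it.
    fill_price = quoted_price * (1 + side * slippage_bps / 10000.0)
    # Commission: a flat fee per share, charged on both buys and sells.
    commission = abs(shares) * commission_per_share
    return shares * fill_price + commission


buy = fill_cost(10, 100.0)     # cash paid to buy 10 shares quoted at 100
sell = fill_cost(-10, 100.0)   # negative: cash received for selling them
round_trip_cost = buy + sell   # what the frictions cost when the price does not move
print(round(buy, 2), round(sell, 2), round(round_trip_cost, 2))
```

Under these assumptions, buying and immediately selling 10 shares quoted at 100 costs about 1.20 even though the price never moved, which is exactly the kind of drag a backtest without these models would miss.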

    import time
    from datetime import datetime

    import pytz
    import zipline
    from zipline.api import order, record, symbol
    from zipline.algorithm import TradingAlgorithm
    from zipline.utils.factory import load_bars_from_yahoo

    # Load data manually from Yahoo! Finance
    start = datetime(2000, 1, 1, 0, 0, 0, 0, pytz.utc)
    end = datetime(2012, 1, 1, 0, 0, 0, 0, pytz.utc)
    data = load_bars_from_yahoo(stocks=['AAPL'], start=start, end=end)
    print(type(data["AAPL"]))
    print(data["AAPL"])

    # This creates the cache file for benchmarks. SHOULD ONLY RUN ONCE
    zipline.data.loader.dump_benchmarks('SPY')

    # Define algorithm
    def initialize(context):
        pass

    def handle_data(context, data):
        order(symbol('AAPL'), 10)
        record(AAPL=data[symbol('AAPL')].price)

    # Create algorithm object passing in initialize, handle_data functions
    algo_obj = TradingAlgorithm(initialize=initialize, handle_data=handle_data)

    start_time = time.time()  # calculate the running time
    for i in range(10):
        perf_manual = algo_obj.run(data)
    print("--- %s seconds ---" % (time.time() - start_time))

This trading strategy is simple: we buy 10 shares in each iteration. Note that zipline allows negative cash, so the order is always filled. The iteration occurs in the handle_data() function, and each bar is passed in through the data variable. Each bar is defined as follows:

    BarData({'AAPL': SIDData({
        'high': 3.8190101840271575,
        'open': 3.5603358942290511,
        'price': 3.8,
        'volume': 133949200,
        'low': 3.452045738788637,
        'sid': 'AAPL',
        'source_id': 'DataPanelSource-6d0572f7ed3cad6d52522c275aee663d',
        'close': 3.7999999999999998,
        'dt': Timestamp('2000-01-03 00:00:00+0000', tz='UTC'),
        'type': 4})})

The average running time (10 loops) for this script is about 66 seconds, which seems really long considering we are only fetching daily data and running a simple trading algorithm. We then tried using a local file instead of fetching from Yahoo Finance.

    import pandas as pd

    data = pd.read_csv('AAPL.csv', header=0, index_col=0, parse_dates=True)
    # Sorting and localizing are required for zipline to run
    data.sort_index(inplace=True)
    data = data.tz_localize('UTC')
    data = data[data.index >= start]
    data = data[data.index <= end]

AAPL.csv is the local file downloaded from http://ichart.finance.yahoo.com/table.csv?s=AAPL. Sorting and localizing the data is mandatory because zipline expects the data in ascending chronological order and extracts each bar from it.
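The same preparation steps (parse the dates, sort ascending, attach UTC, clip to the backtest window) can be seen in isolation with the standard library alone. This is only a sketch of the idea, assuming a CSV with Date and Close columns whose rows arrive newest-first; the sample rows and the load_closes helper are made up for this example.

```python
import csv
import io
from datetime import datetime, timezone

# Made-up sample rows, deliberately listed newest-first.
SAMPLE_CSV = """Date,Close
2000-01-05,103.5
2000-01-04,101.0
2000-01-03,100.0
1999-12-31,99.0
"""


def load_closes(text, start, end):
    """Return (timestamp, close) pairs in ascending order within [start, end]."""
    bars = []
    for row in csv.DictReader(io.StringIO(text)):
        # Parse the date and mark it as UTC so it compares with start and end.
        stamp = datetime.strptime(row["Date"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
        if start <= stamp <= end:
            bars.append((stamp, float(row["Close"])))
    bars.sort(key=lambda bar: bar[0])  # oldest bar first
    return bars


start = datetime(2000, 1, 1, tzinfo=timezone.utc)
end = datetime(2012, 1, 1, tzinfo=timezone.utc)
bars = load_closes(SAMPLE_CSV, start, end)
print([close for _, close in bars])
```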

    def handle_data(context, data):
        order('Close', 10)
        record(AAPL=data['Close'].price)

Then the data changes as follows:

    BarData({
        'Volume': SIDData({'price': 151494000.0, 'volume': 1000, 'sid': 'Volume', 'source_id': 'DataFrameSource-f6bfb478831d7581226e6a4507bc386b', 'dt': Timestamp('2011-09-21 00:00:00+0000', tz='UTC'), 'type': 4}),
        'Adj Close': SIDData({'price': 55.305234999999996, 'volume': 1000, 'sid': 'Adj Close', 'source_id': 'DataFrameSource-f6bfb478831d7581226e6a4507bc386b', 'dt': Timestamp('2011-09-21 00:00:00+0000', tz='UTC'), 'type': 4}),
        'High': SIDData({'price': 421.58997, 'volume': 1000, 'sid': 'High', 'source_id': 'DataFrameSource-f6bfb478831d7581226e6a4507bc386b', 'dt': Timestamp('2011-09-21 00:00:00+0000', tz='UTC'), 'type': 4}),
        'Low': SIDData({'price': 411.999977, 'volume': 1000, 'sid': 'Low', 'source_id': 'DataFrameSource-f6bfb478831d7581226e6a4507bc386b', 'dt': Timestamp('2011-09-21 00:00:00+0000', tz='UTC'), 'type': 4}),
        'Close': SIDData({'price': 412.13998, 'volume': 1000, 'sid': 'Close', 'source_id': 'DataFrameSource-f6bfb478831d7581226e6a4507bc386b', 'dt': Timestamp('2011-09-21 00:00:00+0000', tz='UTC'), 'type': 4}),
        'Open': SIDData({'price': 419.639992, 'volume': 1000, 'sid': 'Open', 'source_id': 'DataFrameSource-f6bfb478831d7581226e6a4507bc386b', 'dt': Timestamp('2011-09-21 00:00:00+0000', tz='UTC'), 'type': 4})
    })

* Note: We have to be careful with the volume field here. With this method, each data column (Open, Close, High, Low, Adj Close and Volume) is treated as an individual instrument, and the ‘volume’ field is set to 1000 by default. In a backtest, an order is filled or cancelled based on the available market volume (please see this reference), so we need to change the ‘volume’ value set here.
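Why the placeholder volume matters can be shown with a toy fill rule. The sketch below assumes an order may take at most a fixed fraction of a bar's volume; the 25% cap and the function name are hypothetical and are not taken from zipline's source.

```python
def fillable_shares(order_size, bar_volume, volume_limit=0.25):
    """Shares of a buy order that can fill in one bar under a volume cap.

    volume_limit is a hypothetical cap: the largest fraction of the
    bar's traded volume that a single order is allowed to take.
    """
    max_fill = int(bar_volume * volume_limit)
    return min(order_size, max_fill)


# With the placeholder volume of 1000, a 500-share order is only partly filled.
print(fillable_shares(500, 1000))
# With a realistic daily volume, the same order fills completely.
print(fillable_shares(500, 133949200))
```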

The average running time is 61 seconds, which isn’t much better than with load_bars_from_yahoo() as we had tried before. Performance is in fact a known issue for the zipline library. Even though we use local data files, zipline still needs to fetch data from Yahoo for the trading environment. This is due to the benchmark mechanism embedded in the library, e.g. the get_raw_benchmark_data() function sends a request to Yahoo to get the data points for ^GSPC.

Of course, one can try to customize the code to use one’s own data rather than fetching data from other sources; however, it requires a lot of effort. Jason Swearingen deals with this problem (stated in this post) by writing his own library called QuantShim, which supports Zipline and Quantopian. However, this is out of scope here.

Also, it is really difficult to deal with higher-frequency trading data (hourly, minute or tick data) here. In order to work with data outside of the provided benchmark date range, one can either:

(1) supply one’s own benchmark (look at this suggestion and answer for issue 271); or

(2) run without a benchmark and skip the risk metrics that require it (by commenting out some lines of code in risk.py or benchmark.py). This is mentioned in issue 13.
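Option (2) amounts to separating the metrics that need a benchmark from those that do not. The sketch below is a plain-Python illustration of that split, not zipline's risk module; the function name and the sample returns are made up.

```python
import statistics


def risk_metrics(returns, benchmark_returns=None):
    """Summarize per-period returns; add beta only if a benchmark is given."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r  # compound the per-period returns
    metrics = {
        "total_return": growth - 1.0,
        "volatility": statistics.pstdev(returns),
    }
    if benchmark_returns is not None:
        # Beta = covariance(strategy, benchmark) / variance(benchmark).
        mean_r = statistics.mean(returns)
        mean_b = statistics.mean(benchmark_returns)
        covariance = sum(
            (r - mean_r) * (b - mean_b)
            for r, b in zip(returns, benchmark_returns)
        ) / len(returns)
        metrics["beta"] = covariance / statistics.pvariance(benchmark_returns)
    return metrics


strategy_returns = [0.01, -0.02, 0.015, 0.005]
print(risk_metrics(strategy_returns))                    # no benchmark needed
print(risk_metrics(strategy_returns, strategy_returns))  # beta against itself
```

Total return and volatility are still available without any benchmark series; only beta is skipped.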

If your target market is the US market, then zipline is a decent choice for a Python Backtesting library. But for backtesting different financial assets across all markets, zipline’s lack of flexibility and slow running time will cause issues.

2. PyAlgoTrade:

We use the following simple script to demonstrate how PyAlgoTrade works compared to Zipline. PyAlgoTrade’s documentation can be found here, including a tutorial and sample strategies. For a fair comparison, let’s try the same strategy we used above:

    from pyalgotrade import strategy
    from pyalgotrade.tools import yahoofinance

    instruments = ["AAPL"]

    class MyStrategy(strategy.BacktestingStrategy):
        def __init__(self, feed, instrument, useAdjustedClose=False):
            strategy.BacktestingStrategy.__init__(self, feed, cash_or_brk=100000)
            self.__instrument = instrument
            self.setUseAdjustedValues(useAdjustedClose)
            # We will allow buying more shares than cash allows.
            self.getBroker().setAllowNegativeCash(True)

        def onBars(self, bars):
            bar = bars[self.__instrument]
            self.marketOrder(self.__instrument, 10)  # buy 10
            self.info("BUY 10 %s, Portfolio value: %s" % (self.__instrument, self.getBroker().getEquity()))

    feed = yahoofinance.build_feed(instruments, fromYear=2000, toYear=2012, storage="data")

    # Evaluate the strategy with the feed's bars.
    myStrategy = MyStrategy(feed, instruments[0])
    myStrategy.run()
    print("Final portfolio value: $%.2f" % myStrategy.getResult())

This is also pretty simple. The script obtains data from Yahoo and iterates over the bars using onBars(). Unlike zipline, PyAlgoTrade does not allow negative cash by default, so we must explicitly enable it.
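The effect of this setting on a buy-every-bar strategy can be shown with a toy broker. The class below is a hypothetical stand-in written for this example and does not reproduce PyAlgoTrade's broker; it only shows that, without negative cash, orders stop filling once the cash runs out.

```python
class ToyBroker(object):
    """A made-up broker that tracks cash and shares for one instrument."""

    def __init__(self, cash, allow_negative_cash=False):
        self.cash = cash
        self.shares = 0
        self.allow_negative_cash = allow_negative_cash

    def buy(self, shares, price):
        """Try to buy; return True if the order was filled."""
        cost = shares * price
        if cost > self.cash and not self.allow_negative_cash:
            return False  # rejected: not enough cash
        self.cash -= cost
        self.shares += shares
        return True


strict = ToyBroker(cash=500.0)
lenient = ToyBroker(cash=500.0, allow_negative_cash=True)
for price in [30.0, 30.0, 30.0]:
    strict.buy(10, price)
    lenient.buy(10, price)

print(strict.shares, strict.cash)    # only the first order could be paid for
print(lenient.shares, lenient.cash)  # every order filled, cash went negative
```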

Changing the feed to a local file is very easy in PyAlgoTrade, which makes this library more suitable for paper-trading and backtests than zipline. In the example below, we also use the data file downloaded from Yahoo.

    # Load the yahoo feed from the CSV file
    from pyalgotrade.barfeed import yahoofeed

    feed = yahoofeed.Feed()
    feed.addBarsFromCSV(instrument="AAPL", path="AAPL.csv")

    # Alternatively, load a generic CSV file
    import pytz
    from pyalgotrade.barfeed import csvfeed
    from pyalgotrade.bar import Frequency

    filename = '../../data/gold/gold3_1.csv'
    feed = csvfeed.GenericBarFeed(Frequency.DAY, pytz.utc)
    feed.addBarsFromCSV('gap', filename)

One thing I like about PyAlgoTrade is that it is more flexible than the zipline library for placing orders. Besides individual orders (e.g. market, limit, stop and stop-limit orders), PyAlgoTrade provides higher-level functions that wrap a pair of entry/exit orders (e.g. the enterLong, enterShort, enterLongLimit and enterShortLimit interfaces).

This flexibility extends to the events a strategy can handle. In most cases, we only work with six of them, i.e. onEnterOk, onEnterCanceled, onExitOk, onExitCanceled, onOrderUpdated and onBars.
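The general pattern behind such events is that the strategy requests an entry or exit on one bar and is called back once the order has filled. The toy engine below sketches that flow in plain Python; it is not PyAlgoTrade code, the snake_case names are made up, and the fill rule (orders fill at the next bar's price) is an assumption for illustration.

```python
class ToyPositionStrategy(object):
    """Made-up strategy skeleton with entry/exit callbacks."""

    def __init__(self):
        self.position = None   # None when flat, otherwise the entry price
        self.pending = None    # "enter", "exit" or None
        self.log = []

    # Callbacks fired by the engine once a pending order has filled.
    def on_enter_ok(self, price):
        self.log.append("entered at %.2f" % price)

    def on_exit_ok(self, price):
        self.log.append("exited at %.2f" % price)

    def on_bars(self, price, previous):
        """Request an entry after an up bar, an exit after a down bar."""
        if self.position is None and price > previous:
            self.pending = "enter"
        elif self.position is not None and price < previous:
            self.pending = "exit"


def run(strategy, prices):
    """Toy engine: an order requested on one bar fills at the next bar's price."""
    previous = None
    for price in prices:
        if strategy.pending == "enter":
            strategy.position, strategy.pending = price, None
            strategy.on_enter_ok(price)
        elif strategy.pending == "exit":
            strategy.position, strategy.pending = None, None
            strategy.on_exit_ok(price)
        if previous is not None:
            strategy.on_bars(price, previous)
        previous = price
    return strategy.log


print(run(ToyPositionStrategy(), [100.0, 101.0, 102.0, 101.0, 100.5]))
```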

However, PyAlgoTrade provides its own DataSeries and Bar classes, and these classes do not work with the Pandas library. This is frustrating since Pandas is so common in data analysis and modeling. Let’s look at the bar object defined in each iteration:

<class 'pyalgotrade.bar.BasicBar'> ['_BasicBar__adjClose', '_BasicBar__close', '_BasicBar__dateTime', '_BasicBar__frequency', '_BasicBar__high', '_BasicBar__low', '_BasicBar__open', '_BasicBar__useAdjustedValue', '_BasicBar__volume', '__abstractmethods__', '__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__getstate__', '__hash__', '__init__', '__metaclass__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__weakref__', '_abc_cache', '_abc_negative_cache', '_abc_negative_cache_version', '_abc_registry', 'getAdjClose', 'getAdjHigh', 'getAdjLow', 'getAdjOpen', 'getClose', 'getDateTime', 'getFrequency', 'getHigh', 'getLow', 'getOpen', 'getPrice', 'getTypicalPrice', 'getUseAdjValue', 'getVolume', 'setUseAdjustedValue']

Given the lack of support for Pandas, you will likely spend more time learning PyAlgoTrade than the zipline library. Zipline provides a simple interface and a familiar data type (Pandas), so the user can focus on the strategy itself rather than spend time on technical plumbing.
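One possible workaround, sketched here under assumptions rather than taken from either library's documentation, is to copy the getter values listed in the output above into plain records that Pandas can ingest. FakeBar is a stand-in class written for this example so the snippet runs without PyAlgoTrade, and the sample values are made up.

```python
from datetime import datetime


class FakeBar(object):
    """Stand-in exposing a few of the getters listed in the output above."""

    def __init__(self, when, open_, high, low, close, volume):
        self._values = (when, open_, high, low, close, volume)

    def getDateTime(self):
        return self._values[0]

    def getOpen(self):
        return self._values[1]

    def getHigh(self):
        return self._values[2]

    def getLow(self):
        return self._values[3]

    def getClose(self):
        return self._values[4]

    def getVolume(self):
        return self._values[5]


def bars_to_records(bars):
    """Copy getter values into plain dicts, one per bar."""
    return [
        {
            "datetime": bar.getDateTime(),
            "open": bar.getOpen(),
            "high": bar.getHigh(),
            "low": bar.getLow(),
            "close": bar.getClose(),
            "volume": bar.getVolume(),
        }
        for bar in bars
    ]


# Made-up sample bars, for illustration only.
records = bars_to_records([
    FakeBar(datetime(2000, 1, 3), 3.56, 3.82, 3.45, 3.80, 133949200),
    FakeBar(datetime(2000, 1, 4), 3.80, 3.85, 3.60, 3.70, 128094400),
])
# With Pandas installed, pandas.DataFrame(records).set_index("datetime")
# would turn these records into a DataFrame indexed by date.
print(records[0]["close"], records[1]["volume"])
```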

However, compared to zipline, PyAlgoTrade clearly wins in terms of running time. With the same algorithm, the average running time is only 2 seconds, while the zipline script above takes about a minute.
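Timings like these are sensitive to how they are measured. A small helper that averages over several runs, in the spirit of the 10-iteration loop in the zipline script, might look like this; the helper and the placeholder workload are illustrative and not part of either library.

```python
import time


def average_runtime(func, runs=10):
    """Average wall-clock seconds per call of func over several runs."""
    start = time.time()
    for _ in range(runs):
        func()
    return (time.time() - start) / runs


def toy_backtest():
    """Placeholder workload; swap in a call that runs a real backtest."""
    cash = 0.0
    for price in range(1, 10001):
        cash -= 10 * price  # "buy 10 shares" at each made-up price
    return cash


print("--- %.6f seconds per run ---" % average_runtime(toy_backtest))
```

Passing the same helper a function that runs each library's backtest would give directly comparable per-run figures.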

Summary of Zipline vs PyAlgoTrade Python Backtesting Libraries

I would rate these 2 Python Backtesting Libraries as follows:

Zipline PyAlgoTrade Description
Paper-Trading ♦ ♦ ♦ ♦ Zipline doesn’t seem to work for non-US and local data, while PyAlgoTrade works with any type of data.
Real-trading ♦ ♦ ♦ ♦ Both are good, but cloud programming in Quantopian is really impressive.
Flexibility ♦ ♦ ♦ ♦ ♦ PyAlgoTrade supports higher-level order types and more events in transactions. Zipline, on the other hand, provides a simple slippage model.
Speed ♦ ♦ ♦ ♦ Zipline is really slow compared to PyAlgoTrade.
Ease of use ♦ ♦ ♦ ♦ ♦ PyAlgoTrade does not support Pandas.

Each Python Backtesting library has its own strengths and weaknesses, and a lot of interesting functions which I didn’t bring up in this article. So I would suggest you choose the most suitable one based on what your requirements are and the pros and cons mentioned above.

By the RobustTechHouse team