Originally at: https://www.backtrader.com/blog/2019-10-25-on-backtesting-performance-and-out-of-memory/on-backtesting-performance-and-out-of-memory/

There have been two recent https://reddit.com/r/algotrading threads which are the inspiration for this article.

A thread with a bogus claim that backtrader cannot cope with 1.6M candles: reddit/r/algotrading — A performant backtesting system?

And another one asking for something which can backtest a universe of 8000 stocks: reddit/r/algotrading — Backtesting libs that supports 1000+ stocks?

With the author asking about a framework that can backtest "out-of-core/memory", "because obviously it cannot load all this data into memory".

We will, of course, address these concepts with backtrader.

The 2M Candles

In order to do this, the first thing is to generate that amount of candles. Given that the first poster talks about 77 stocks and 1.6M candles, this would amount to 20,779 candles per stock. So we'll do the following to have nice numbers:

Generate candles for 100 stocks

Generate 20,000 candles per stock

I.e.: 100 files totaling 2M candles.

The script

This generates 100 files, starting with candles00.csv and going all the way up to candles99.csv. The actual values are not important. Having the standard datetime, OHLCV components (and OpenInterest) is what matters.
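The generator script itself is not reproduced here; a minimal sketch (an illustrative stand-in, not the article's actual script) could look like the following. It writes the standard datetime, OHLCV and OpenInterest columns; the random-walk prices and the small counts in the demo loop are placeholders, to be bumped to 100 files of 20,000 candles each for the full 2M-candle set.

```python
import csv
import random
from datetime import datetime, timedelta

def write_candles(filename, num_candles, start=datetime(2019, 1, 1)):
    """Write one CSV file of synthetic candles: datetime, OHLCV, OpenInterest."""
    price = 100.0
    dt = start
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["datetime", "open", "high", "low", "close",
                         "volume", "openinterest"])
        for _ in range(num_candles):
            o = price
            c = o + random.uniform(-1.0, 1.0)           # random walk close
            hi = max(o, c) + random.uniform(0.0, 0.5)   # high above both
            lo = min(o, c) - random.uniform(0.0, 0.5)   # low below both
            writer.writerow([dt.strftime("%Y-%m-%d %H:%M:%S"),
                             round(o, 2), round(hi, 2), round(lo, 2),
                             round(c, 2), random.randint(100, 10000), 0])
            price = c
            dt += timedelta(minutes=1)

# Demo: 2 small files -- use range(100) and 20000 candles for the real data set
for i in range(2):
    write_candles("candles{:02d}.csv".format(i), num_candles=1000)
```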

The test system

Hardware/OS: A Windows 10 15.6" laptop with an Intel i7 and 32 Gbytes of memory will be used.

Python: CPython 3.6.1 and pypy3 6.0.0

Misc: an application running constantly and taking around 20% of the CPU. The usual suspects like Chrome (102 processes), Edge, Word, PowerPoint, Excel and some minor applications are also running.

backtrader default configuration

Let’s recall what the default run-time configuration for backtrader is:

Preload all data feeds if possible

If all data feeds can be preloaded, run in batch mode (named runonce)

Precalculate all indicators first

Go through the strategy logic and broker step-by-step

Executing the challenge in the default batch runonce mode

Our test script (see at the bottom for the full source code) will open those 100 files and process them with the default backtrader configuration.

$ ./two-million-candles.py

Cerebro Start Time: 2019-10-26 08:33:15.563088

Strat Init Time: 2019-10-26 08:34:31.845349

Time Loading Data Feeds: 76.28

Number of data feeds: 100

Strat Start Time: 2019-10-26 08:34:31.864349

Pre-Next Start Time: 2019-10-26 08:34:32.670352

Time Calculating Indicators: 0.81

Next Start Time: 2019-10-26 08:34:32.671351

Strat warm-up period Time: 0.00

Time to Strat Next Logic: 77.11

End Time: 2019-10-26 08:35:31.493349

Time in Strategy Next Logic: 58.82

Total Time in Strategy: 58.82

Total Time: 135.93

Length of data feeds: 20000

Memory Usage: A peak of 348 Mbytes was observed

Most of the time is actually spent preloading the data (76.28 seconds), spending the rest in the strategy, which includes going through the broker in each iteration (58.82 seconds). The total time is 135.93 seconds.

Depending on how you want to calculate it the performance is:

14,713 candles/second considering the entire run time
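The throughput figure is simply the total number of candles divided by the wall-clock time of the run:

```python
total_candles = 100 * 20_000   # 100 feeds x 20,000 candles = 2M
total_time = 135.93            # seconds, total time of the run above
print(round(total_candles / total_time))  # 14713 candles/second
```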

Bottom line: the claim in the first of the two reddit threads above that backtrader cannot handle 1.6M candles is FALSE.

Doing it with pypy

Since the thread claims that using pypy didn't help, let's see what happens when using it.

$ ./two-million-candles.py

Cerebro Start Time: 2019-10-26 08:39:42.958689

Strat Init Time: 2019-10-26 08:40:31.260691

Time Loading Data Feeds: 48.30

Number of data feeds: 100

Strat Start Time: 2019-10-26 08:40:31.338692

Pre-Next Start Time: 2019-10-26 08:40:31.612688

Time Calculating Indicators: 0.27

Next Start Time: 2019-10-26 08:40:31.612688

Strat warm-up period Time: 0.00

Time to Strat Next Logic: 48.65

End Time: 2019-10-26 08:40:40.150689

Time in Strategy Next Logic: 8.54

Total Time in Strategy: 8.54

Total Time: 57.19

Length of data feeds: 20000

Holy Cow! The total time has gone down from 135.93 seconds to 57.19 seconds in total. The performance has more than doubled.

The performance: 34,971 candles/second

Memory Usage: a peak of 269 Mbytes was seen.

This is also an important improvement over the standard CPython interpreter.

Handling the 2M candles out of core memory

All of this can be improved if one considers that backtrader has several configuration options for the execution of a backtesting session, including optimizing the buffers and working only with the minimum needed set of data (ideally with buffers of size 1, which would only happen in ideal scenarios).

The option to be used will be exactbars=True. From the documentation for exactbars (which is a parameter given to Cerebro either during instantiation or when invoking run):

`True` or `1`: all “lines” objects reduce memory usage to the automatically calculated minimum period. If a Simple Moving Average has a period of 30, the underlying data will always have a running buffer of 30 bars to allow the calculation of the Simple Moving Average

This setting will deactivate `preload` and `runonce`

Using this setting also deactivates **plotting**

The doc is located here: https://www.backtrader.com/docu/cerebro/

For the sake of maximum optimization, and because plotting will be disabled anyway, stdstats=False will also be used, which disables the standard Observers for cash, value and trades (useful for plotting, which is no longer in scope).

$ ./two-million-candles.py --cerebro exactbars=True,stdstats=False

Cerebro Start Time: 2019-10-26 08:37:08.014348

Strat Init Time: 2019-10-26 08:38:21.850392

Time Loading Data Feeds: 73.84

Number of data feeds: 100

Strat Start Time: 2019-10-26 08:38:21.851394

Pre-Next Start Time: 2019-10-26 08:38:21.857393

Time Calculating Indicators: 0.01

Next Start Time: 2019-10-26 08:38:21.857393

Strat warm-up period Time: 0.00

Time to Strat Next Logic: 73.84

End Time: 2019-10-26 08:39:02.334936

Time in Strategy Next Logic: 40.48

Total Time in Strategy: 40.48

Total Time: 114.32

Length of data feeds: 20000

The performance: 17,494 candles/second

Memory Usage: 75 Mbytes (stable from the beginning to the end of the backtesting session)

Let’s compare to the previous non-optimized run

Instead of spending over 76 seconds preloading data, backtesting starts immediately, because the data is not preloaded

The total time is 114.32 seconds vs 135.93. An improvement of 15.90%.

An improvement in memory usage of 68.5%.

Note

We could have actually thrown 100M candles to the script and the amount of memory consumed would have remained fixed at 75 Mbytes

Doing it again with pypy

Now that we know how to optimize, let’s do it the pypy way.

$ ./two-million-candles.py --cerebro exactbars=True,stdstats=False

Cerebro Start Time: 2019-10-26 08:44:32.309689

Strat Init Time: 2019-10-26 08:44:32.406689

Time Loading Data Feeds: 0.10

Number of data feeds: 100

Strat Start Time: 2019-10-26 08:44:32.409689

Pre-Next Start Time: 2019-10-26 08:44:32.451689

Time Calculating Indicators: 0.04

Next Start Time: 2019-10-26 08:44:32.451689

Strat warm-up period Time: 0.00

Time to Strat Next Logic: 0.14

End Time: 2019-10-26 08:45:38.918693

Time in Strategy Next Logic: 66.47

Total Time in Strategy: 66.47

Total Time: 66.61

Length of data feeds: 20000

The performance: 30,025 candles/second

Memory Usage: constant at 49 Mbytes

Comparing it to the previous equivalent run:

66.61 seconds vs 114.32, or a 41.73% improvement in run time

49 Mbytes vs 75 Mbytes, or a 34.6% improvement in memory usage.

Note: In this case pypy has not been able to beat its own time in batch (runonce) mode, which was 57.19 seconds. This is to be expected, because when preloading, the indicator calculations are done in vectorized mode, and that's where the JIT of pypy excels. It has, in any case, still done a very good job, and there is an important improvement in memory consumption.

A complete run with trading

The script can create indicators (moving averages) and execute a short/long strategy on the 100 data feeds using the crossover of the moving averages. Let's do it with pypy, which, as seen above, performs best in the default batch mode.

$ ./two-million-candles.py --strat indicators=True,trade=True

Cerebro Start Time: 2019-10-26 08:57:36.114415

Strat Init Time: 2019-10-26 08:58:25.569448

Time Loading Data Feeds: 49.46

Number of data feeds: 100

Total indicators: 300

Moving Average to be used: SMA

Indicators period 1: 10

Indicators period 2: 50

Strat Start Time: 2019-10-26 08:58:26.230445

Pre-Next Start Time: 2019-10-26 08:58:40.850447

Time Calculating Indicators: 14.62

Next Start Time: 2019-10-26 08:58:41.005446

Strat warm-up period Time: 0.15

Time to Strat Next Logic: 64.89

End Time: 2019-10-26 09:00:13.057955

Time in Strategy Next Logic: 92.05

Total Time in Strategy: 92.21

Total Time: 156.94

Length of data feeds: 20000

The performance: 12,743 candles/second

Memory Usage: A peak of 1300 Mbytes was observed.

The execution time has obviously increased (indicators + trading), but why has the memory usage increased?

Before reaching any conclusions, let’s run it creating indicators but without trading


$ ./two-million-candles.py --strat indicators=True

Cerebro Start Time: 2019-10-26 09:05:55.967969

Strat Init Time: 2019-10-26 09:06:44.072969

Time Loading Data Feeds: 48.10

Number of data feeds: 100

Total indicators: 300

Moving Average to be used: SMA

Indicators period 1: 10

Indicators period 2: 50

Strat Start Time: 2019-10-26 09:06:44.779971

Pre-Next Start Time: 2019-10-26 09:06:59.208969

Time Calculating Indicators: 14.43

Next Start Time: 2019-10-26 09:06:59.360969

Strat warm-up period Time: 0.15

Time to Strat Next Logic: 63.39

End Time: 2019-10-26 09:07:09.151838

Time in Strategy Next Logic: 9.79

Total Time in Strategy: 9.94

Total Time: 73.18

Length of data feeds: 20000

The performance: 27,329 candles/second

Memory Usage: 600 Mbytes (doing the same in the optimized exactbars mode consumes only 60 Mbytes, but with an increase in execution time, as pypy itself cannot optimize as much)

With that in hand: memory usage really increases when trading. The reason is that Order and Trade objects are created, passed around and kept by the broker.

Note: Take into account that the data set contains random values, which generates a huge number of crossovers and hence an enormous amount of orders and trades. A similar behavior should not be expected for a regular data set.

Conclusions

The bogus claim

Already proven above as bogus, because backtrader CAN handle 1.6 million candles and more.

General

backtrader can easily handle 2M candles using the default configuration (with in-memory data pre-loading)

backtrader can operate in a non-preloading optimized mode, reducing buffers to the minimum, for out-of-core-memory backtesting

When backtesting in optimized non-preloading mode, the increase in memory consumption comes from the administrative overhead which the broker generates

Even when trading, using indicators and with the broker constantly getting in the way, the performance is 12,743 candles/second

Use pypy where possible (for example if you don't need to plot)

Using Python and/or backtrader for these cases

With pypy, trading enabled and the random data set (with a higher than usual number of trades), the entire 2M bars were processed in a total of:

156.94 seconds, i.e.: almost 2 minutes and 37 seconds

Taking into account that this is done on a laptop running multiple other things simultaneously, it can be concluded that 2M bars can be done.

What about the 8000 stocks scenario?

Execution time would have to be scaled by 80, hence:

12,560 seconds (or almost 210 minutes, i.e. 3 hours and 30 minutes) would be needed to run this random-set scenario.
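The scaling is straightforward arithmetic over the measured total time:

```python
time_100_stocks = 156.94          # seconds for 100 stocks / 2M candles
scale = 8000 / 100                # 80x as many stocks
total = time_100_stocks * scale   # ~12,555 seconds, i.e. roughly 3.5 hours
print(total, total / 3600)
```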

Even assuming a standard data set which would generate far less operations, one would still be talking of backtesting in hours ( 3 or 4 )

Memory usage would also increase, when trading due to the broker actions, and would probably require some Gigabytes.

Note: One cannot simply multiply by 80 again here, because the sample script trades with random data and as often as possible. In any case, the amount of RAM needed would be substantial.

As such, a workflow with only backtrader as the research and backtesting tool would seem far-fetched.

A Discussion about Workflows

There are two standard workflows to consider when using backtrader:

Do everything with backtrader, i.e.: research and backtesting all in one

Research with pandas, get a notion of whether the ideas are good, and then backtest with backtrader to verify with as much accuracy as possible, having possibly reduced huge data sets to something more palatable for usual RAM scenarios

Tip One can imagine replacing pandas with something like dask for out-of-core-memory execution
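As a sketch of that reduction step (illustrative: pandas resample turning minute candles into hourly ones, with a synthesized frame standing in for a real data set):

```python
import pandas as pd

# Minute-level candles (synthesized here) downsampled to hourly bars:
# a typical way to shrink a huge data set before backtesting it
idx = pd.date_range("2019-01-01", periods=240, freq="min")
df = pd.DataFrame({
    "open": 100.0, "high": 101.0, "low": 99.0,
    "close": 100.5, "volume": 1000, "openinterest": 0,
}, index=idx)

hourly = df.resample("1h").agg({
    "open": "first", "high": "max", "low": "min",
    "close": "last", "volume": "sum", "openinterest": "last",
})
print(len(hourly))  # 240 minutes -> 4 hourly bars
```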

The Test Script

Here is the source code: