Airship is a Python script I wrote that synchronizes game save data between Steam Cloud and iCloud for select games; currently those are Race the Sun, The Banner Saga, Transistor, and Costume Quest. I often find myself looking for something to program, so for the hell of it I set up a benchmarking system to check the speed of the script on various Python interpreters with different combinations of optional packages installed. First I set up a virtualenv for each available interpreter, then I run the benchmark in each one. You can check out make-envs.py, benchmark-envs.py and benchmark.py to see how it works. Be warned, it’s quite messy.
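To give a rough idea of the approach, here is a minimal sketch of timing the same command under several interpreters. It is not the code from benchmark-envs.py; the function names `time_command` and `benchmark_interpreters`, and the idea of passing a name-to-path mapping, are assumptions made purely for illustration.

```python
import subprocess
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check_call raises CalledProcessError if the command exits non-zero.
        subprocess.check_call(argv)
        durations.append(time.perf_counter() - start)
    return durations


def benchmark_interpreters(interpreters, script_args, runs=5):
    """Time `script_args` under each interpreter.

    `interpreters` maps a label (hypothetical, e.g. "cpython27") to the path
    of that virtualenv's python executable.
    """
    results = {}
    for name, python_path in interpreters.items():
        results[name] = time_command([python_path] + list(script_args), runs)
    return results
```

Calling something like `benchmark_interpreters({"current": sys.executable}, ["-c", "pass"])` would return a list of five timings for the running interpreter. Note that timing a whole subprocess includes interpreter startup, which matters a lot for something like Jython.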

The interpreters I’m testing are: the latest patch release of CPython versions 2.6–7 and 3.2–5, the latest PyPy, the latest PyPy3, and the latest Jython. (I honestly don’t know why I put in Jython.)

I’m also testing different combinations of installed packages: Pillow, which is used to compare images of different formats for The Banner Saga, and scandir, which provides a (theoretically) speedier version of os.walk() and was included in the standard library as of 3.5. (Because of this, the package wasn’t installed on CPython 3.5.x.)
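An optional dependency like scandir is usually wired in with an import fallback. The following is a sketch of that common pattern, not necessarily how Airship does it; `list_files` is a hypothetical helper added only to show the fallback in use.

```python
import os

try:
    # Third-party scandir package: provides a drop-in walk() replacement.
    from scandir import walk
except ImportError:
    # Fall back to the standard library (already scandir-based on 3.5+).
    from os import walk


def list_files(root):
    """Return a sorted list of every file path under root."""
    paths = []
    for dirpath, dirnames, filenames in walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)
```

Because both functions yield the same `(dirpath, dirnames, filenames)` tuples, the rest of the code doesn’t need to know which one it got.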

A nice graph containing my findings (geometric mean of 5 runs). X axis is interpreter, Y axis is time in seconds.
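For reference, the geometric mean of a set of timings can be computed as below. This is a small sketch rather than the code from benchmark.py; the function name is my own, and Python 3.8+ also ships `statistics.geometric_mean`, which isn’t available on the older interpreters tested here.

```python
import math


def geometric_mean(values):
    """Return the geometric mean of a sequence of positive numbers."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    # exp(mean(log(x))) is the n-th root of the product, without overflow.
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

Compared to the arithmetic mean, the geometric mean is less affected by a single unusually slow run.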

Findings

scandir actually increased the execution time on CPython 2.6–7 and 3.2, but lowered it on every other interpreter. To be honest, I entirely expected the performance improvement to be ‘free’; it seems that this is not quite the case. I believe the difference is due to some changes first seen in 3.3, since that is the first CPython version where scandir helps.

Pillow always increased the execution time (except for Jython, which… I have no idea why), because the optional code that requires it is only run when Pillow is available, and that code is somewhat resource intensive.
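The ‘only run when Pillow is available’ behaviour can be sketched like this. The names `HAVE_PILLOW` and `files_match` are hypothetical and the comparison logic is an assumption for illustration; it is not Airship’s actual implementation.

```python
try:
    from PIL import Image
    HAVE_PILLOW = True
except ImportError:
    Image = None
    HAVE_PILLOW = False


def files_match(path_a, path_b):
    """Return True if two files are byte-identical or, when Pillow is
    installed, decode to the same pixels."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        if a.read() == b.read():
            return True  # cheap path, no Pillow needed
    if not HAVE_PILLOW:
        return False  # without Pillow the expensive comparison is skipped
    # Expensive path: decode both images and compare raw RGBA pixel data.
    with Image.open(path_a) as img_a, Image.open(path_b) as img_b:
        if img_a.size != img_b.size:
            return False
        return img_a.convert("RGBA").tobytes() == img_b.convert("RGBA").tobytes()
```

With a structure like this, installing Pillow adds work (decoding and converting images) that simply never happens otherwise, which is consistent with the higher execution times.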

With both Pillow and scandir available, the results get more interesting. On CPython 2.6 and 3.2, execution time is higher than with Pillow alone; both of those were also slower with just scandir installed. On every other interpreter, execution time is lower than with Pillow alone. That includes CPython 2.7, which was slower with just scandir.

Conclusions

Note that these are conclusions I am deriving from this data alone, and not from any extended period of usage or thorough benchmarking of any of the interpreters — these are only true in the context of the script that was benchmarked.

Jython is really slow.

CPython 2.6 is slow.

CPython 2.7 and 3.2–5 are fast without Pillow and slow with it.

PyPy is slower than CPython 2.7 and 3.2–5 without Pillow, but faster than every other interpreter with it.

PyPy3 is slower than PyPy, by a factor of roughly 2–3.5.

I’d like to point out that several of these conclusions are to be expected based on the circumstances surrounding the development of the respective interpreters. Jython has a specific goal, which is to let Python code interact directly with Java classes, not to be the fastest interpreter around. 2.6 is the least modern CPython version that is still somewhat in use, and has not had much work put into it recently. 2.7 is still in widespread use and a lot of effort has been put into optimizing it. PyPy needs time to ‘warm up’ the JIT before it can reach full speed, so functions that run only once cost more execution time while functions that run many times cost less; the latter happen to be found mainly in the code that requires Pillow. PyPy3 is based on an old version of PyPy and, being a spinoff of the main project, probably carries much more overhead in making the JIT work with Python 3.