Unladen swallow: accelerating Python


Google uses Python for many of its engineering projects, from internal server monitoring and reporting to outward-facing products like Google Groups, so it is no surprise that the company wants to improve Python application performance. A group of Google developers is working on a new optimization branch of Python dubbed Unladen Swallow, with the goal of a five-fold speed increase over the trunk. The team plans to reach that goal by adding just-in-time compilation and a new virtual machine design, all while retaining source compatibility for Python application developers.

Unladen Swallow's lead developers Collin Winter, Jeffrey Yasskin, and Thomas Wouters have long been core developers for the CPython project, which maintains the reference implementation and most widespread interpreter for the Python language. All three are Google employees, and other Google engineers contribute their "twenty percent time" to Unladen Swallow, but the group insists that it is a Python project, not an effort owned by Google.

Winter said the origin of the idea dates back to his work on the web-based code review tool Mondrian, when the team's attempts at optimization repeatedly hit limitations in CPython, such as the Global Interpreter Lock (GIL), the mutex that allows only one thread at a time to execute Python bytecode and thus prevents true concurrency on multiprocessor or multi-core machines. While researching potential speed-ups and changes, Winter and the other Google engineers eventually decided that the long-range ideas they had in mind were significant enough to warrant a separate branch. Doing so would also give them the chance to stress-test their ideas before trying to roll them back into CPython.

The Concept: a bird's eye view

The core of the Unladen Swallow team's planned improvements is to remove performance bottlenecks in the Python virtual machine (VM) design, leaving the rest of the interpreter — not to mention the substantial runtime library — relatively untouched. The long-term plan is to replace CPython's existing stack-based VM with a register-based VM built with code from the Low Level Virtual Machine (LLVM) project, and to add a just-in-time compiler (JIT) on top of the new VM. Other performance-based improvements are welcome at the same time, and the team has several in store based on their talks with heavy Python users.

Using a JIT will speed up execution by compiling to machine code, thus eliminating the overhead of fetching, decoding, and dispatching Python opcodes. "In CPython," Winter explained, "this overhead is significant; some minor tweaks were made to CPython 2.7 that netted a 15% speed-up with relatively little work."
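To see what that per-opcode overhead is, the standard library's dis module can list the bytecode that CPython's eval loop fetches, decodes, and dispatches for even a trivial function. This is a small sketch run on a modern CPython 3.x interpreter, not on Unladen Swallow or Python 2.6; the function add() is hypothetical, and the exact opcode names (for example BINARY_ADD versus BINARY_OP) differ between CPython versions.

```python
import dis


def add(b, c):
    # A one-line body still compiles to several separate bytecode instructions.
    a = b + c
    return a


# Every instruction listed here is fetched, decoded, and dispatched by the
# interpreter each time add() runs; a JIT would instead emit machine code.
instructions = list(dis.get_instructions(add))
for ins in instructions:
    print(ins.opname, ins.argrepr)
```

A JIT removes the dispatch step between these instructions, which is where the overhead Winter describes comes from.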

Adding the JIT presents a good opportunity to switch from a stack-based VM to LLVM's register-based design, which Winter said will net its own performance benefits. The merits of stack- versus register-based VMs are an ongoing debate, but Winter cites a 2005 study [PDF] from the Lua project showcasing the empirical benefits of the register-based design.
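The difference between the two designs can be illustrated with a pair of toy interpreters evaluating a = b + c. This is a minimal sketch: the opcode names and encodings are hypothetical and are not CPython's, LLVM's, or Lua's. The stack machine needs four dispatches because operands travel through an implicit stack, while the register machine names its operands directly and needs one.

```python
# Toy interpreters, purely illustrative: the opcodes below are hypothetical.

def run_stack(code, env):
    """Execute stack-machine code; return (env, number of dispatches)."""
    stack = []
    dispatches = 0
    for op, arg in code:
        dispatches += 1  # one fetch/decode/dispatch per instruction
        if op == "LOAD":
            stack.append(env[arg])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STORE":
            env[arg] = stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return env, dispatches


def run_register(code, regs):
    """Execute register-machine code; operands name registers directly."""
    dispatches = 0
    for op, *args in code:
        dispatches += 1
        if op == "ADD":
            dst, src1, src2 = args
            regs[dst] = regs[src1] + regs[src2]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs, dispatches


# a = b + c, compiled for each machine
stack_code = [("LOAD", "b"), ("LOAD", "c"), ("ADD", None), ("STORE", "a")]
register_code = [("ADD", "a", "b", "c")]

env, n_stack = run_stack(stack_code, {"b": 2, "c": 3})
regs, n_reg = run_register(register_code, {"b": 2, "c": 3})
print(env["a"], n_stack)  # 5 4
print(regs["a"], n_reg)   # 5 1
```

The trade-off is that each register instruction is larger, since it must encode its operands, which is part of why the debate remains open.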

Unladen Swallow is based on Python 2.6.1, which is not the most recent release. Python 3.0, a backward-incompatible revision of the language, was released in December 2008. Because the majority of Python code in the wild, including the code in use at Google, is still written for Python 2.x, the Unladen Swallow team decided to focus its efforts on the earlier version, where more benefits would be felt. Since Unladen Swallow uses the CPython source as its base, Python users can expect it to retain 100% source compatibility.

Still, Winter said, the team does keep in close contact with Python designer Guido van Rossum (himself a Google employee) and other members of the CPython team. "In our discussions with Guido and others about how and where to merge our changes back into CPython, the idea has been proposed that Unladen Swallow should merge into 3.x. 3.x is the future of the language, and if 3.x is significantly faster than 2.x, that's an obvious incentive to port applications and libraries to 3.x. None of that is set in stone, and Guido may well change his mind."

Recent sightings

The team has set a tight development schedule for Unladen Swallow, with quarterly milestone releases. The first release, 2009Q1, was limited in scope, aiming for a 25 to 35% speed increase over vanilla CPython through relatively modest changes to the code. The changes include a new eval loop reimplemented using vmgen; several improvements to the garbage collector, which now tracks long-lived objects better so that it can make fewer collection runs; and improvements to the cPickle data serialization module, which the developers said will benefit web applications in particular. Several obscure Python opcodes were also removed and replaced with functionally equivalent Python functions, which reduces code size without affecting performance.
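For readers unfamiliar with cPickle, the sketch below shows the kind of serialization round trip it accelerates. It is written for modern Python 3, where the pickle module uses its C accelerator (the successor to Python 2's separate cPickle module) automatically when available; in Python 2.x one would import cPickle explicitly. The payload is a hypothetical stand-in for data a web application might cache, and the timing depends entirely on the machine.

```python
import pickle
import timeit

# Hypothetical web-style payload: a list of small dictionaries.
payload = [{"id": i, "user": f"user{i}", "tags": ["a", "b"]} for i in range(1000)]

# Serialize with the newest protocol, then restore the original objects.
data = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(data)

# Time 100 full round trips; absolute numbers vary from machine to machine.
seconds = timeit.timeit(
    lambda: pickle.loads(pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)),
    number=100,
)
print(f"{len(data)} bytes, {seconds:.4f} s for 100 round trips")
```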

Unladen Swallow 2009Q1 is available as source code only for the time being, and can be checked out as a branch from the project's public Subversion repository. No specific compilation instructions are provided because this release closely follows the upstream CPython, but the developers do recommend building in 64-bit mode in order to take the fullest advantage of the performance increases.

Since speed of execution is the goal, the team regularly benchmarks the code. The suite's thirteen benchmarks are modeled on real-world performance tests and designed to highlight practical application tasks, particularly for web applications. The results of the tests on Unladen Swallow 2009Q1 versus CPython 2.6.1 are posted on the project wiki; Unladen Swallow ranges from 7.43% faster to 157.17% faster, beating CPython on every benchmark.
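Percentages like these are easiest to read as speed ratios. The helper below is a sketch that assumes "X% faster" means the ratio of baseline time to new time, minus one; the article does not say how the project wiki defines it, and percent_faster() is a hypothetical name.

```python
def percent_faster(baseline_seconds: float, new_seconds: float) -> float:
    """Return how much faster the new run is, as a percentage.

    Assumes "X% faster" means baseline_time / new_time - 1; the project
    wiki may define its figures differently.
    """
    if baseline_seconds <= 0 or new_seconds <= 0:
        raise ValueError("timings must be positive")
    return (baseline_seconds / new_seconds - 1.0) * 100.0


print(percent_faster(10.0, 5.0))  # 100.0 (twice as fast)
print(percent_faster(10.0, 2.0))  # 400.0 (a five-fold speed-up)
```

Under that reading, 157.17% faster is a speed ratio of about 2.57, and the project's five-fold goal corresponds to the 400 percent figure cited at the end of this article.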

Work is underway now on Unladen Swallow 2009Q2, which will focus on replacing the existing CPython VM with an equivalent built using LLVM.

Elsewhere in the ecosystem

Other open source projects have sought to improve Python execution speed using some of the same ideas. Psyco was an earlier JIT for Python, which was later superseded by the PyPy project. PyPy's primary goal is not performance, though; rather, it is to build a Python implementation in Python itself. Stackless Python implements concurrency through its own scheduler and special primitives called "tasklets." Finally, the Parrot project is implementing Python on its own register-based VM.
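The idea behind a tasklet scheduler can be sketched with ordinary Python generators: each task runs until it voluntarily yields, and a round-robin scheduler resumes the next one. This is only an analogy for cooperative scheduling; it does not use Stackless Python's actual API, and the names worker() and run() are hypothetical.

```python
from collections import deque


def worker(name, steps, log):
    """A generator standing in for a tasklet: each `yield` gives up control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # cooperatively hand control back to the scheduler


def run(tasks):
    """Round-robin scheduler: resume each task in turn until all finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; do not reschedule it
        ready.append(task)


log = []
run([worker("A", 2, log), worker("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Because switching happens only at explicit yield points, no locking is needed between tasks, though all of them still share a single OS thread.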

In some ways, Unladen Swallow is more ambitious than these other projects, particularly given the rapid pace of development laid out in its road map. On the other hand, Unladen Swallow starts from the CPython 2.6.1 code base and incorporates many CPython developers, which greatly improves the chances that its changes will one day be blessed as part of the official CPython release. Many of the 2009Q1 changes have already been sent upstream to CPython, and the door is still wide open for the 3.0 series should the JIT and new VM deliver real-world speed-ups anywhere close to the expected 400 percent.