PyPy: the other new compiler project

We have recently seen a lot of attention paid to projects like LLVM. Even though the GNU Compiler Collection is developing at a rapid pace, there are people in the community who are interested in seeing different approaches taken, preferably with a newer code base. LLVM is not where all the action is, though. For the last few years (since 2003, actually), a relatively stealthy project called PyPy has been trying to shake up the compiler landscape in its own way.

On the face of it, PyPy looks like an academic experiment: it is an implementation of the Python 2.5 interpreter which is, itself, written in Python. One might thus expect it to be more elegant in its code than the standard, C-implemented interpreter (usually called CPython), but rather slower in its execution. If one runs PyPy under CPython, the result is indeed somewhat slow, but that is not how things are meant to be done. When running in its native mode, PyPy can be surprising.

PyPy is actually written in a subset of Python called RPython ("restricted Python"). Many of the features and data types of Python are available, but there are rules. Variables are restricted to data of one type. Only built-in types can be used in for loops. There is no creation of classes or functions at run time, and the generator feature is not supported. And so on. The result is a version of the language which, while still clearly Python, looks a bit more like C.
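
To make those rules concrete, here is a rough sketch in ordinary Python. It is illustrative only: it has not been run through the PyPy translator, and the function names are invented for this example. The first two functions lean on a generator and on a variable that changes type, both of which the rules above exclude; the last one computes the same result in a restricted style.

```python
# Illustrative only: plain Python, not verified against the RPython translator.
# All names here are hypothetical and exist just for this example.

def doubled(values):
    # A generator function; the article says RPython does not support these.
    for x in values:
        yield x * 2


def dynamic_total(values):
    # 'result' starts out as None and later holds an int, so the
    # variable is not restricted to data of one type.
    result = None
    for v in doubled(values):
        if result is None:
            result = v
        else:
            result += v
    return result


def restricted_total(values):
    # Same computation in a restricted style: 'result' is always an int,
    # the loop runs over a built-in list, and no generator is involved.
    result = 0
    for v in values:
        result += v * 2
    return result
```

For a non-empty list the two versions agree; for an empty list the dynamic version returns None while the restricted one returns 0, which is the kind of type ambiguity the one-type-per-variable rule removes.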

Running the RPython-based interpreter in CPython is supported; it is fully functional, if a bit slow. Running in this mode can be good for debugging. But the production version of PyPy is created in a rather different way: the PyPy hackers have created a multi-step compiler which is able to translate an RPython program into a lower-level language. That language might be C, in which case the result can be compiled and linked in the usual way. But the target language is not fixed; the translator is able to output code for the .NET or Java virtual machines as well. That means that the PyPy interpreter can be easily targeted to whatever runtime environment works best.

The result works. It currently implements all of the features of Python 2.5, with very few exceptions. There are some behavioral differences due to, for example, the use of a different garbage-collection algorithm; PyPy can be slower to call destructors than CPython is. Python extensions written in C can be used, though one gets the sense that this feature is still stabilizing. PyPy is able to run complex applications like Django and Twisted. On the other hand, for now, it only runs on 32-bit x86 systems, it is described as "memory-hungry," and Python 3 support seems to be a relatively distant goal.
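
The destructor difference matters mostly to code that counts on `__del__` running the moment an object goes out of scope. A minimal sketch of the defensive pattern follows; the `Resource` class and `use_explicitly()` function are hypothetical, made up for this example rather than taken from PyPy or any library. The idea is to release resources explicitly instead of waiting for the garbage collector.

```python
# Hypothetical example: explicit cleanup that does not depend on when
# (or whether) the interpreter gets around to calling __del__.

class Resource(object):
    def __init__(self, log):
        self.log = log
        self.closed = False

    def close(self):
        # Safe to call more than once.
        if not self.closed:
            self.closed = True
            self.log.append("closed")

    def __del__(self):
        # Last-resort fallback; its timing depends on the garbage collector.
        self.close()


def use_explicitly(log):
    r = Resource(log)
    try:
        log.append("working")
    finally:
        # Cleanup happens here on any interpreter, not at collection time.
        r.close()
```

Code written this way should behave the same whether destructors run immediately or some time later.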

Beyond that, it's fast. PyPy includes a built-in just-in-time compiler (JIT); it is, in a sense, a platform for the creation of JITs for various targets. The result is an interpreter which, much of the time, is significantly faster than CPython. For the curious, the PyPy Speed Center contains lots of benchmark results, presented in a slick, JavaScript-heavy interface. PyPy does not always beat CPython, but it often does so convincingly, and speed appears to be a top priority for the PyPy developers. The speed of PyPy may well prove compelling enough that, as Alex Gaynor suggests, many of us will be using PyPy routinely instead of CPython in the near future.
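
Readers who would rather get a rough feel for the difference on their own machines could time a small, loop-heavy function under each interpreter. The sketch below is one way to do that; the workload and the function names are this example's own invention, not one of the Speed Center benchmarks, and no particular result is claimed for either interpreter.

```python
# A hypothetical micro-benchmark: run the same file under CPython and PyPy
# and compare the printed times. Not a substitute for real benchmarks.
import time


def count_primes(limit):
    """Count the primes below 'limit' by trial division (deliberately loop-heavy)."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


def best_time(limit, repeats=3):
    """Return the fastest of 'repeats' wall-clock timings, in seconds."""
    best = None
    for _ in range(repeats):
        start = time.time()
        count_primes(limit)
        elapsed = time.time() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


if __name__ == "__main__":
    print(best_time(20000))
```

Taking the best of several runs reduces noise from other activity on the system; a single short loop like this says little about how a real application will fare.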

There are some other interesting features as well. There is a stackless Python mode which supports microthreaded, highly-concurrent applications. There is a sandboxed mode which intercepts all external library calls and hands them over to a separate policy daemon for authorization. And so on.

What really catches your editor's eye, though, is the concept of PyPy as a generalized compiler for the creation of JITs for high-level languages. The translation process is flexible, to the point that it can easily accommodate stackless mode, interesting optimizations, or experimentation with different language features. The object model can be (and has been) tweaked to support tainting and tracing features. And the system as a whole is not limited to the creation of JIT compilers for Python; projects are underway to implement a number of other languages, including Prolog, Smalltalk, and JavaScript.

It could easily be argued that PyPy incorporates much of the sort of innovation which many people have said never happens with free software. And it is all quite well documented. This is a project which is not afraid of ambitious goals, and which appears to be able to achieve those goals; it will be interesting to watch over the next few years.

