Gilectomy

Python's (in)famous global interpreter lock (GIL), which effectively serializes multi-threaded access to the interpreter (thus hampering concurrency using threads), has long been seen as something that Python could do without. But there are both technical and political hurdles to clear before the GIL can be removed. Larry Hastings presented his thoughts and progress on doing a "gilectomy" to the CPython interpreter at the 2016 Python Language Summit.

Hastings said that he has a proof-of-concept solution that gets around the technical and political problems. There are two questions that often get asked: "Could we remove the GIL?" and "Should we remove the GIL?" It is clear that it can be removed, he said, because IronPython and Jython have already done so. The answer to the second is "maybe"; it will depend on what it buys versus the technical debt it incurs. But, he said, he is going to keep trying to remove the GIL until either it gets removed or everyone tells him to stop.

The GIL was added in 1992 by Guido van Rossum; since then, the world has changed, but Python hasn't. Now, everything, including eyeglasses, is multi-core. Python, however, cannot really take advantage of these cores using threads.

There are four technical considerations that need to be addressed, he said. Reference counting for the garbage collector is one. There is also a need to look at the globals and statics in the interpreter and make them per-thread variables. The C extension parallelism and reentrancy issues need to be handled as do places in the code where atomicity is required.
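The idea of turning shared globals into per-thread variables can be illustrated at the Python level with `threading.local`. This is only an analogy, a minimal sketch: the interpreter's own globals and statics are C variables, and the names `_state`, `record`, and `run_workers` are hypothetical.

```python
import threading

# Hypothetical per-thread state. Each thread that touches `_state`
# gets its own independent set of attributes.
_state = threading.local()


def record(value):
    """Append to this thread's private list and return a copy of it."""
    if not hasattr(_state, "items"):
        _state.items = []          # created lazily, once per thread
    _state.items.append(value)
    return list(_state.items)


def run_workers(n_threads=4, n_items=3):
    """Run several threads that each record their own values."""
    results = {}

    def worker(idx):
        snapshot = []
        for i in range(n_items):
            snapshot = record((idx, i))
        results[idx] = snapshot    # each thread writes a distinct key

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because no thread ever sees another thread's list, `record` needs no lock, which is the appeal of per-thread data once the GIL is gone.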

There are also three political considerations. Van Rossum has said that he will only consider removing the GIL if it does not negatively impact the performance of single-threaded programs. Breaking all of the C extensions, which is the outcome of some other GIL-removing projects, is not reasonable. Removing the GIL must also not over-complicate the code.

There are some potential solutions to the reference counting issue that should not be considered, Hastings said. Both tracing garbage collection and software transactional memory might perform reasonably, but both are likely to be quite complicated and to break all of the C extensions.

So reference counting remains in his proof of concept. That means using atomic increment and decrement operations, which leads to a 30% performance hit right off the bat. As more threads are added it gets worse. He has an idea about "buffered reference counting", but did not have time to describe that at the summit. For global data, PyThreadState can be used to make it per-thread data. He has added fine-grained locking to the small-block allocator so that it can be used by multiple threads as well.
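To make the cost of atomic reference counting concrete, here is a toy model in Python. It is a sketch under stated assumptions, not CPython's implementation: the real counts live in C object headers and would use hardware atomic instructions, whereas this hypothetical `RefCounted` class uses a `threading.Lock` as a stand-in for atomicity.

```python
import threading


class RefCounted:
    """Toy object header with a reference count (hypothetical)."""

    def __init__(self):
        self.refcount = 1
        # Stand-in for an atomic instruction: the lock makes each
        # read-modify-write of the count indivisible.
        self._guard = threading.Lock()

    def incref(self):
        with self._guard:
            self.refcount += 1

    def decref(self):
        """Drop one reference; return True if the count reached zero."""
        with self._guard:
            self.refcount -= 1
            return self.refcount == 0


def hammer(obj, n_threads=4, iterations=10_000):
    """Have several threads take and release references concurrently."""

    def worker():
        for _ in range(iterations):
            obj.incref()
            obj.decref()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj.refcount
```

Every `incref` and `decref` now pays for synchronization even when only one thread is running, which gives a feel for why making the counts atomic slows down single-threaded code.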

Parallelism is simply something that C extensions will have to live with. It makes the lives of extension developers more difficult, but there really is no way to soften that blow, he said. In order to enforce atomicity, he has added a lock API to CPython (with "macros to hide it behind") so that all mutable objects get locked before accessing them. He noted that "mutable" refers to the C objects, not Python objects, so even immutable objects in Python, like strings, are still mutable from the perspective of the interpreter.
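The pattern of locking a mutable object before touching it can be sketched in Python as follows. This is not the lock API Hastings added to CPython, which is C code hidden behind macros; `LockedList` is a hypothetical class used only to show the shape of the idea, with a re-entrant `threading.RLock` so that one locked method can call another.

```python
import threading


class LockedList:
    """Hypothetical container that locks itself around every access."""

    def __init__(self):
        self._lock = threading.RLock()   # re-entrant lock
        self._items = []

    def append(self, value):
        with self._lock:
            self._items.append(value)

    def extend(self, values):
        # Holds the lock for the whole batch; the nested append()
        # re-acquires the same lock, which only works because it is
        # re-entrant.
        with self._lock:
            for value in values:
                self.append(value)

    def snapshot(self):
        with self._lock:
            return list(self._items)
```

Callers never see the lock, so they cannot forget to take it. With a plain `threading.Lock` instead of `RLock`, the nested `append` call inside `extend` would deadlock.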

Hastings laid out a set of five rules for locking in CPython to ensure that locking functions smoothly. Locks must be recursive and objects must be self-locking wherever possible. The reference count cannot be touched except through the defined interface and the object type is immutable. The latter drew a question about the desirability of changing object types, but Hastings said that there will be some things that have to be given up to facilitate the removal of the GIL.

When code needs to take multiple locks, it should take them in address order. Finally, the kernel should not be involved in taking the lock unless there is contention. That maps to a futex on Linux; Windows and Mac OS X have equivalent functionality.
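Taking locks in address order is a standard way to avoid deadlock: if every thread acquires locks in the same global order, no cycle of waiting threads can form. A minimal Python sketch follows. The `Account` class and the helper names are hypothetical; `id()` is used as the address because, in CPython, it is the object's memory address.

```python
import threading


class Account:
    """Hypothetical object carrying its own lock."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def in_address_order(*objs):
    """Return the objects sorted by address (id() in CPython)."""
    return sorted(objs, key=id)


def transfer(src, dst, amount):
    if src is dst:
        return  # nothing to do, and locking twice would deadlock
    first, second = in_address_order(src, dst)
    # Both directions of transfer take the two locks in the same order.
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def stress(a, b, rounds=2000):
    """Run opposing transfers concurrently; completes without deadlock."""

    def forward():
        for _ in range(rounds):
            transfer(a, b, 1)

    def backward():
        for _ in range(rounds):
            transfer(b, a, 1)

    threads = [threading.Thread(target=forward),
               threading.Thread(target=backward)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

If `transfer` instead locked `src` first and `dst` second, the two threads in `stress` could each hold one lock while waiting for the other.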

His proof of concept lives in the same source tree as the regular CPython interpreter, which can be configured to run with or without the GIL. One thing that might be possible if the GIL-removal work pans out is to enforce best practices on C extensions, since there will be a new API. The GIL removal is somewhat complicated, so it may fail that particular political consideration, he said.

Hastings briefly described his eight-point plan to remove the GIL (after noting Van Rossum's 2007 "It isn't Easy to Remove the GIL" blog post). It is presumably based on the process he took with his "toy" proof of concept. It starts by adding the atomic increment/decrement, adds locks to various types (dict, list) and free lists, on through murdering the GIL and fixing up the tests.

He showed the results of a "dumb test" he ran using the proof of concept. It calculated the Fibonacci sequence in seven threads. It was roughly 3.5x slower than the standard CPython interpreter in terms of wall time and 25x slower in terms of CPU time (because seven threads were running). That is not as good as he had hoped for in this early stage (he was shooting for only 2x slower), but there are still a lot of low-hanging optimization possibilities.
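The article does not show the benchmark itself, so the following is only a guess at what such a "dumb test" might look like: a naive recursive Fibonacci function run in seven threads. The function names and the choice of argument are assumptions. The arithmetic at the end shows one reading of how the two slowdown figures relate: with the GIL only one thread runs at a time, so CPU time roughly equals wall time, while without it all seven threads burn CPU for the whole run.

```python
import threading


def fib(n):
    """Deliberately naive recursive Fibonacci (CPU-bound)."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def run_in_threads(n, n_threads=7):
    """Compute fib(n) once in each of n_threads threads."""
    results = [None] * n_threads

    def worker(slot):
        results[slot] = fib(n)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Figures reported in the talk: 3.5x slower in wall time, seven threads.
wall_slowdown = 3.5
n_threads = 7
# If all seven threads are busy for the whole (3.5x longer) run,
# the CPU-time slowdown is about 3.5 * 7, close to the reported 25x.
cpu_slowdown_estimate = wall_slowdown * n_threads
```

On a stock CPython with the GIL, `run_in_threads` gains nothing from extra cores, which makes it a simple baseline for comparing the two builds.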

The open questions ("apart from 'should we do it at all?'" he said with a grin) are about things like separating read and write locks or allowing user-settable locks in the language itself. It might also make sense to look at running multiple interpreters in the same process—GIL-removal time might be the right point to add that feature.

He concluded the talk by noting that he had "Gilectomy" stickers available and a GitHub repository set up for those interested. He said he was planning to "sprint" on the project right after the main PyCon conference; "I have T-shirts if you sprint with me."

There wasn't a lot of time for questions, but there were a few. One person asked how Gilectomy would affect PyPy. Hastings said he didn't know, but thought that project was better prepared for these kinds of changes than CPython is. Nick Coghlan commented that there is a fair amount of code out there that should be doing locking but isn't; the programs are getting away with it mostly because the GIL (or, as another person suggested, CPU scheduling) protects it. Eliminating the GIL will expose those programs. Hastings also noted that, unfortunately, one of the costs of Gilectomy will be breaking some C extensions, though he is unsure how many.

[As evidence of the interest in the Python community about removing the GIL, Hastings took a photo of the (overly) full room where he gave a Gilectomy talk at PyCon later in the week. It can be seen on the right.]

