Making Python 3 more attractive


Larry Hastings was up next at the summit with a discussion of what it would take to attract more developers to use Python 3. He reminded attendees of Matt Mackall's talk at last year's summit, where the creator and project lead for the Mercurial source code management tool said that Python 3 had nothing in it that the project cares about. That talk "hit home for me", Hastings said, because it may explain part of the problem with adoption of the new Python version.

The Unicode support that comes with Python 3 is "kind of like eating your vegetables", he said. It is good for you, but it doesn't really excite developers (perhaps because most of them use Western languages, like English, someone suggested). Hastings is looking for changes that would make people want to upgrade.

He wants to investigate features that might require major architectural changes. The core Python developers may be hungry enough for adoption that they would be willing to consider changes of that magnitude. But there will obviously be costs associated with changes of that sort; he wanted people to keep in mind the price in terms of readability, maintainability, and backward compatibility.

The world has changed a great deal since Python was first developed in 1990. One of the biggest changes is the move to multi-threading on multicore machines. It wasn't until 2005 or so that he started seeing multicore servers, desktops, and game consoles, then, shortly thereafter, laptops. Since then, tablets and phones have gotten multicore processors; now even eyeglasses and wristwatches are multicore, which is sort of amazing when you stop to think about it.

The perception is that Python is not ready for a multicore world because of the global interpreter lock (GIL). He said that he would eventually get to the possibility of removing the GIL, but he had some other ideas he wanted to talk about first.

For example, what would it take to have multiple, simultaneous Python interpreters running in the same process? It would be a weaker form of a multicore Python that would keep the GIL. Objects could not be shared between the interpreter instances.

In fact, you can do that today, though it is a bit of a "party trick", he said. You can use dlmopen() to open multiple shared libraries, each in its own namespace, so that each interpreter "runs in its own tiny little world". It would allow a process to have access to multiple versions of Python at once, though he is a bit dubious about running it in production.

Another possibility might be to move global interpreter state (e.g. the GIL and the small-block allocator) into thread-local storage. It wouldn't break the API for C extensions, though it would break extensions that are non-reentrant. There is some overhead to access thread-local storage because it requires indirection. It is "not as bad as some other things" that he would propose, he said with a chuckle.

A slightly cleaner way forward would be to add an interpreter parameter to the functions in the C API. That would break the API, but do so in a mechanical way. It would, however, use more stack space and would still have the overhead of indirect access.

What would it take to have multiple threads running in the same Python interpreter? That question is also known as "remove the GIL", Hastings said. In looking at that, he considered what it is that the GIL protects. It protects global variables, but those could be moved to a heap. It also enables non-reentrant code as a side effect. There is lots of code that would fail if the assumption that it won't be called simultaneously in multiple threads is broken, which could be fixed but would take a fair amount of work.

The GIL also provides the atomicity guarantees that Messier brought up. A lock on dicts and lists (and other data structures that need atomic access) could preserve atomicity. Perhaps the most important thing the GIL does, though, is to protect access to the reference counts that are used to do garbage collection. It is really important not to have races on those counts.

The interpreter could switch to using the atomic increment and decrement instructions provided by many of today's processors. That doesn't explicitly break the C API as the change could be hidden behind macros. But, Hastings said, Antoine Pitrou's experiments with using those instructions resulted in 30% slower performance.

Switching to a mark-and-sweep garbage collection scheme would remove the problem with maintaining the reference counts, but it would be "an immense change". It would break every C extension in existence, for one thing. For another, conventional wisdom holds that reference counting and "pure garbage collection" (his term for mark and sweep) are roughly equivalent performance-wise, but the performance impact wouldn't be known until after the change was made, which might make it a hard sell.

PyPy developer Armin Rigo has been working on software transactional memory (STM) and has a library that could be used to add STM to the interpreter. But Rigo wrote a toy interpreter called "duhton" and, based on that, said that STM would not be usable for CPython.

Hastings compared some of the alternative Python implementations in terms of their garbage-collection algorithm. Only CPython uses reference counting, while Jython, IronPython, and PyPy all use pure garbage collection. It would seem that the GIL and reference counting go hand in hand, he said. He also noted that few other scripting languages use reference counting, so the future of scripting may be with pure garbage collection.

Yet another possibility is to turn the C API into a private API, so extensions could not call it. They would use the C Foreign Function Interface (CFFI) for Python instead. Extensions written using Cython might be another possible approach to hide the C extension API.

What about going "stackless" (à la Stackless Python)? Guido van Rossum famously said that Python would never merge Stackless, so that wasn't Hastings's suggestion. Instead, he looked at the features offered by Stackless: coroutines, channels, and pickling the interpreter state for later resumption of execution. Of the three, only the first two are needed for multicore support.

The major platforms already have support for native coroutines, though some are better than others. Windows has the CreateFiber() API that creates "fibers", which act like threads, but use "cooperative multitasking". Under POSIX, things are a little trickier.

There is the makecontext() API that does what is needed. Unfortunately, it was specified by POSIX in 2001, obsoleted in 2004, and dropped in 2008, though it is still mostly available. It may not work on OS X, however. When makecontext() was obsoleted, POSIX recommended that developers use threads instead, but that doesn't solve the same set of problems, Hastings said.

For POSIX, using a combination of setjmp(), longjmp(), sigaltstack(), and some signal (e.g. SIGUSR2) will provide coroutine support, though it is "pretty awful". While it is "horrible", it does actually work. He concluded his presentation by saying that he was mostly interested in getting the assembled developers to start thinking about these kinds of things.

One attendee suggested looking at the GCC split-stack support that was added for the Go language, but another noted that it is x86-64-only. Trent Nelson pointed to PyParallel (which would be the subject of the next slot) as a possible model. It is an approach that identifies the thread-sensitive parts of the interpreter and puts in guards to stop multiple threads from running in them.

But another attendee wondered if removing the GIL was really the change that the Mercurial developers needed in order to switch. Hastings said that he didn't think GIL removal was at all interesting to the Mercurial developers, as they are just happy with what Python 2.x provides for their project.

Though there may be solutions to the multi-threading problem that are architecture specific, it may still be worth investigating them, Nick Coghlan said. If "works on all architectures" is a requirement to experiment with ways to better support multi-threading, it is likely to hold back progress in that area. If a particular technique works well, that may provide some impetus for other CPU vendors to start providing similar functionality.

Jim Baker mentioned that he is in favor of adding coroutines. Jython has supported multiple interpreters for a while now. Java 10 will have support for fibers as well. He would like to see some sort of keyword tied to coroutines, which will make it easier for Jython (and others) to recognize and handle them. Dino Viehland thought that IronPython could use fibers to implement coroutines, but would also like to see a new language construct to identify that code.

The main reason that Van Rossum is not willing to merge Stackless is because it would complicate life for Jython, IronPython, PyPy, and others, Hastings said (with Van Rossum nodding vigorously in agreement). So having other ways to get some of those features in the alternative Python implementations would make it possible to pursue that path.

Viehland also noted that there is another scripting language that uses reference counting and is, in fact, "totally single threaded": JavaScript. People love JavaScript, he said, and wondered if just-in-time (JIT) compiling should be considered as the feature to bring developers to Python 3. That led Thomas Wouters to suggest, perhaps jokingly, that folks could be told to use PyPy (which does JIT).

Hastings said that he has been told that removing the GIL would be quite popular, even if it required rewriting all the C extensions. Essentially, if the core developers find a way to get rid of the GIL, they will be forgiven for the extra work required for C extensions. But Coghlan was not so sure, saying that the big barrier to getting people to use PyPy has generally been because C extensions did not work in that environment. Someone else noted that the scientific community (e.g. NumPy and SciPy users) has a lot of C extensions.