Keeping Python competitive


Victor Stinner sees a need to improve Python performance in order to keep it competitive with other languages. He brought up some ideas for doing that in a 2017 Python Language Summit session. No solid conclusions were reached, but there is a seemingly growing segment of the core developers who are interested in pushing Python's performance much further, possibly breaking the existing C API in the process.

The "idea is to make Python faster", he said, but Python is not as easy to optimize as other languages. For one thing, the C API blocks progress in this area. PyPy has made great progress with its CPyExt API for C extensions, but it still has a few minor compatibility problems. PyPy tried to reimplement the NumPy extension a few years back, so that it would work with PyPy, but that effort failed. NumPy is one of the C extensions to Python that essentially must work for any alternative implementation. But the C API blocks CPython enhancements as well—Gilectomy, for example. It would be nice to find a way to change that, he said.

A limited stable ABI has been defined for Python, but the full API, which is a superset of the ABI, can change between releases. The ABI is not tested, however; there is no tool to do so, and Stinner said he knows of multiple regressions in it. The standard library is not restricted to using only the stable ABI, so extensions get the full API by default. All of that makes the ABI useless in practice. But Gilectomy needs to use a somewhat different C API to gain parallelism.

A different C API could perhaps provide a benefit that would act as a carrot for users to switch to using it, but he is not sure what should be offered. It is, in some ways, similar to the Python 2 to 3 transition: changing the API for performance or parallelism may not provide enough incentive for extension authors and users to port their existing code.

The C API is used both by the CPython core and C extensions. Beyond that, it is used by low-level debuggers as well. But all of the header files for Python reside in the same directory, which makes it hard to determine what is meant to be exposed and what isn't. In the past, there have been some mistakes in adding to the API when that wasn't the intent. It might make sense to break out the headers that are meant to describe the API into their own directory, he suggested.

Python 3.7 is as fast as Python 2.7 on most benchmarks, but 2.7 was released in 2010. Users are now comparing Python performance to that of Rust or Go, which had only been recently announced in 2010. In his opinion, the Python core developers need to find a way to speed Python up by a factor of two in order for it to continue to be successful.

One way might be just-in-time (JIT) compilation, but various projects have been tried (e.g. Unladen Swallow, Pyston, and Pyjion) and none has been successful, at least not yet. PyPy has made Python up to five times faster, which led him to ask: "should we drop CPython and promote PyPy?" Many core developers like CPython and the C API, however. But, in his opinion, if Python is to be competitive in today's language mix, the project needs to look at JIT or moving to PyPy.

He had some other ideas to consider. Perhaps a new language could be created that is similar to Python but stricter, somewhat like Hack for PHP. He is not sure that would achieve his 2x goal, though. Ahead-of-time (AoT) compilation using guards that are checked at runtime, like Stinner's FAT Python project, might be a way to get the benefits of a JIT without the long warmup time. A multi-stage JIT, like the one for JavaScript, might provide the performance boost he is looking for.

Brett Cannon (who is one of the developers of Pyjion) noted that JIT projects are generally forks of CPython. That means the JIT developers are always playing catch-up with the mainline and that is hard to do. Pyjion is a C extension, but the other projects were not able to do that; the interfaces that Pyjion uses only went in for Python 3.6. He thought there might be room for consolidating some of the independent JIT work that has gone on, however.

But Mark Shannon pointed out that Pyjion and others are function-based JITs, while PyPy is tracing based. Beyond that, PyPy works, he said. Alex Gaynor, who is a PyPy developer, said that the PyPy project has changed the implementation of Python to make it more JIT friendly; that led to "a huge performance gain". He is skeptical that making small API changes to CPython will result in large performance gains from a JIT.

An attendee suggested Cython, which does AoT compilation, but its types are not Pythonic. He suggested that it might be possible to use the new type hints and Cython to create something more Pythonic. Cython outputs C, so the 2x performance factor seems possible.
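As a rough illustration of what "more Pythonic" typing could look like, here is an ordinary Python function annotated with standard type hints. The function name dot is made up for this example, and nothing here is Cython-specific; whether, and how well, an AoT compiler could use such hints to generate fast C is exactly the open question the attendee raised.

```python
from typing import List, get_type_hints


def dot(xs: List[float], ys: List[float]) -> float:
    # The annotations are ignored by CPython at runtime, but they are
    # available to external tools, which could in principle use them
    # to pick native types.
    total: float = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


# Tools can recover the declared types by introspection.
hints = get_type_hints(dot)
```

The code runs unchanged on CPython, which is the appeal: the type information lives in the regular Python syntax rather than in a separate dialect.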

Another audience member said that while it makes sense to make the ABI smaller, it is being used, so how is that going to change? It might be possible to stop it growing or growing in certain directions. One way to do that might be to require new C APIs to be implemented in PyPy before they can be merged. That might avoid the "horrible things" that some extensions (e.g. PyQt) have done. Stinner responded, "I did not say I have solutions, I only have problems", to some chuckles around the room.

PyPy has gotten its CPyExt extension API to work better, so NumPy now works for the most part, an attendee said. Problems can be fixed using the original rewrite. The long arc is to push more extension writers away from the C API and to the C Foreign Function Interface (CFFI). But Stinner is still concerned that the problem is bigger than just extensions; the C API is preventing some innovative changes to CPython.

[I would like to thank the Linux Foundation for travel assistance to Portland for the summit.]

