Python packaging: playing well with others

While the Python language is well suited to the development of modular software, it has not always been entirely friendly toward those who would distribute the resulting modules. But, as Python packaging developer Nick Coghlan described in his linux.conf.au 2014 talk on the subject, help is on the way. A number of projects currently under development within the Python community should improve packaging formats and module repositories, making the distribution of Python software easier and more secure.

Fixing the process

Python packaging, Nick said, has long suffered from "some serious social issues." There are known quality problems with many of the packages in the Python Package Index (PyPI) — and with PyPI itself — but they have proved hard to address. It comes down to trying to improve the situation without breaking the modules that are already there — a difficult task, given that some of what's there is "horribly insecure" and can't be fixed without breaking things. So, rather than overhaul the system, various developers have tried suggesting small fixes around the edges, but it turns out that not even those fixes can be done without causing problems for existing packages. Indeed, those proposals were worse; they caused pain without addressing the real problems. So, over time, numerous developers have simply given up after beating their heads against this particular wall and Python packaging failed to improve.

What was needed, Nick said, was a way to enable more ambitious changes. A step was made in that direction recently with a change in how the Python enhancement proposal (PEP) process works. Historically, all PEPs had to go through the python-dev mailing list for discussion. Many PEPs can be handled there just fine, but others can bog down if they address areas where python-dev lacks significant expertise; packaging happens to be one of those areas. To deal with the problem, the Python community decided to allow special interest groups (SIGs) to directly manage PEPs in their subject area.

An associated problem will be familiar to many development communities. The traditional process requires that any PEP be approved by Python benevolent dictator for life (BDFL) Guido van Rossum to be officially adopted. But it turns out that, as Nick put it, "Guido doesn't scale." This problem is especially acute in areas that Guido finds personally uninteresting; once again, packaging qualifies. So now there is a "BDFL delegate" system where Guido can hand his PEP-approval powers to another developer for specific proposals; Nick is that delegate in the packaging area.

All of this has led to the formation of the "Python Packaging Authority" to handle packaging issues. The developers working on packaging (a group known as "distutils-sig") no longer have to get their decisions rubber-stamped by the python-dev list; distutils-sig is now the decision-making body in this area. That has resulted in a major shift of mood in distutils-sig; developers who were once saying "everything is broken and nothing can be done about it" have shifted to "much is still broken, but we're working on it." The environment has become much less toxic, and people are willing to ask for help and contribute again.

What's changing

For the near term, a relatively modest set of changes has been set in motion. The first goal is to eliminate the need to run "setup.py install" on production systems. To that end, Python 3.4 will ship with the "pip" tool for package installation, a change that was described in this article last October. Pip is not perfect, and, in particular, it does not play all that well with Linux distributions' packaging schemes, but it is a step in the right direction and the remaining problems are being worked on.

Pip, Nick said, derives from the "easy_install" work originally done by the Chandler project. But this software came with some strange default settings; it assumes that users are building a single, integrated application rather than running a system full of applications. As a result, it does things that "make sysadmins cry." Pip has been built on much of the same infrastructure, but the defaults have been changed to be less surprising. It is still far from perfect; in particular, it still lacks proper dependency resolution. Even so, pip seemed like the best choice available.

Beyond the adoption of pip, the project is adopting the "wheel" format for the distribution of pre-built binary modules. That is a bit of a change for Python, which has traditionally used a "build from source" model, but building from source tends to work poorly on Windows and Mac OS systems. Packaging modules into wheels avoids the need to build them on the target systems; this mechanism now works for the installation of Windows and Mac OS packages. Binary distribution on the Linux side is still hampered by the fact that it's difficult to do cross-distribution builds; the packaging developers still need to come up with better solutions in that area.
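
As an illustration of how a wheel declares which systems it was built for, here is a minimal sketch that splits a wheel filename into the components laid out in PEP 427 (distribution, version, optional build tag, Python tag, ABI tag, platform tag). The helper names and the "example_pkg" filenames are hypothetical, and real installers perform considerably more validation than this.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No build tag present.
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components: {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


def is_pure_python(wheel: WheelName) -> bool:
    """A wheel tagged 'none' and 'any' contains no platform-specific code."""
    return wheel.abi_tag == "none" and wheel.platform_tag == "any"


# Hypothetical filenames, used only for illustration.
portable = parse_wheel_filename("example_pkg-1.0-py3-none-any.whl")
windows_only = parse_wheel_filename("example_pkg-1.0-cp34-cp34m-win32.whl")
```

The platform tag is where the Linux difficulty shows up: a Windows or Mac OS tag identifies a reasonably uniform target, while a build made on one Linux distribution is not guaranteed to run on another.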

Another problem is that PyPI is, at this point, about ten years old; nobody really understands it, Nick said, and it lacks things like unit tests. It is quite hard to change without breaking things. So work is proceeding on a new "warehouse" scheme that is implemented as a contemporary web application; it will serve as a replacement for the current PyPI while providing a proper foundation for the addition of new functionality. Some of this work can be seen now at preview-pypi.python.org.

PyPI, as a distribution point for software, is an obvious target for attackers. Currently there is little in the way of defense against a compromise of that site, but work is underway to provide end-to-end signing for Python packages. This effort is described in PEP 458 ("Surviving a Compromise of PyPI"); it will, Nick said, "make trusting PyPI less scary."

Once those issues are taken care of, the next step is "playing well with others." One thing that can be done to help in this area is to fix another PyPI shortcoming: it cannot distribute package metadata by itself. Instead, one must download the full packages, which, Nick said, is "just madness." So there will be new APIs to properly export package metadata.

But there is a bigger problem, which Nick introduced by saying that "cross-platform tools are great!" It is really nice for a Python developer to be able to use the same commands to install packages on any target operating system. The problem is that system integrators — Linux distributors, for example — hate these tools. To them, a language-specific tool looks like yet another language community repeating all of the same security mistakes as all the communities that came before. Language-specific tools make it impossible for a system administrator to know what's on a system; they do not play well with the packaging tools already in place.

So the Python developers would like to move toward better integration with Linux distributions. One of the first things to do in this area is to provide better package metadata for distributors to use. Work toward this goal is described in PEP 426, which describes an extensible JSON metadata format for use in APIs and communications between tools. The idea is to include system integrators in the metadata model from the outset, so that distributors can work with Python's packaging system instead of having to fight it. Additionally, the packaging developers want to strongly discourage "version pinning" — forcing the use of specific versions of modules. Version pinning makes security updates difficult or impossible; the community needs a better way of describing dependencies that allows those updates to be made.
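
To see why pinning gets in the way of security updates, consider this small sketch. It compares an exact pin with a bounded range for a hypothetical library whose 1.4.3 release fixes a flaw in 1.4.2. The function names and version numbers are invented for illustration, and the comparison handles only plain dotted numeric versions.

```python
def parse(version: str) -> tuple:
    """Turn a dotted numeric version such as '1.4.3' into (1, 4, 3)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pin(candidate: str, pinned: str) -> bool:
    """An exact pin accepts one version and nothing else."""
    return parse(candidate) == parse(pinned)


def satisfies_range(candidate: str, minimum: str, below: str) -> bool:
    """A range accepts any version with minimum <= candidate < below."""
    return parse(minimum) <= parse(candidate) < parse(below)


# Hypothetical scenario: 1.4.3 is a security fix for 1.4.2.
security_fix = "1.4.3"

pin_accepts_fix = satisfies_pin(security_fix, "1.4.2")
range_accepts_fix = satisfies_range(security_fix, "1.4", "2.0")
```

With the pin, a distributor cannot ship the fixed release without also editing every package that depends on exactly 1.4.2; with the range, the update simply satisfies the existing declaration.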

To that end, there will be an updated versioning system with a more tightly defined syntax. The current system, Nick said, was borrowed from CPAN; it's flexible, but not formally defined. PEP 440 creates a versioning system that is formally defined; it also adds concepts like an "integrator suffix" that lets distributors add their own sub-version information. Another PEP 440 feature is "semantic versioning," which is really just a set of rules making it easy to distinguish major updates from minor updates.
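
What "formally defined" buys can be sketched with a regular expression. The pattern below covers a simplified subset of the version syntax in the published PEP 440 (release segment, pre-release, post-release, development release, and a local label after a "+"). Treating the local label as the counterpart of the "integrator suffix" mentioned in the talk is an assumption made here, and the function names are hypothetical.

```python
import re
from typing import Optional

_VERSION_RE = re.compile(
    r"""
    ^(?P<release>\d+(?:\.\d+)*)                   # release segment: 1.4.2
    (?:(?P<pre_kind>a|b|rc)(?P<pre_num>\d+))?     # pre-release: a1, b2, rc1
    (?:\.post(?P<post>\d+))?                      # post-release: .post1
    (?:\.dev(?P<dev>\d+))?                        # development release: .dev3
    (?:\+(?P<local>[a-z0-9]+(?:\.[a-z0-9]+)*))?$  # local label: +distro.1
    """,
    re.VERBOSE,
)


def parse_version(text: str) -> Optional[dict]:
    """Return the parsed components, or None if the text is not valid."""
    match = _VERSION_RE.match(text)
    if match is None:
        return None
    pre = None
    if match.group("pre_kind"):
        pre = (match.group("pre_kind"), int(match.group("pre_num")))
    post = match.group("post")
    dev = match.group("dev")
    return {
        "release": tuple(int(p) for p in match.group("release").split(".")),
        "pre": pre,
        "post": int(post) if post is not None else None,
        "dev": int(dev) if dev is not None else None,
        "local": match.group("local"),
    }


def is_major_update(old: str, new: str) -> bool:
    """True when the first release component increases (simplified rule)."""
    old_parsed, new_parsed = parse_version(old), parse_version(new)
    if old_parsed is None or new_parsed is None:
        raise ValueError("both versions must match the simplified syntax")
    return new_parsed["release"][0] > old_parsed["release"][0]
```

Because every valid version matches one grammar, tools can reject strings like "1.0-beta" outright instead of guessing at their meaning, and a rule such as "the first number changed" becomes something a program can check.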

PEP 440 also tries to pave the way toward better dependency handling. The current scheme has two types of dependencies: "requires" and "runtime-requires". The new scheme, instead, has five, allowing the specification of dependencies needed to build, test, run, and develop the code, along with a minimal version-pinning mechanism for "meta distributions". A meta distributor is somebody who makes a collection of modules available for convenience, but who is not distributing those modules independently.
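
One way to picture the five categories is as separate lists that an installer consults depending on what it has been asked to do. The sketch below is only an illustration: the attribute names, the package names, and the choice of which lists a production install needs are all assumptions, not the field names or rules of the proposed metadata format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dependencies:
    # Illustrative names for the five categories described above.
    build: List[str] = field(default_factory=list)
    test: List[str] = field(default_factory=list)
    run: List[str] = field(default_factory=list)
    dev: List[str] = field(default_factory=list)
    meta: List[str] = field(default_factory=list)


def production_set(deps: Dependencies) -> List[str]:
    """Assumed rule: a production system needs only meta and run entries."""
    return deps.meta + deps.run


def development_set(deps: Dependencies) -> List[str]:
    """Assumed rule: a development checkout wants every category."""
    return deps.meta + deps.run + deps.build + deps.test + deps.dev


# Hypothetical package names, used only for illustration.
deps = Dependencies(
    build=["example-compiler-shim"],
    test=["example-test-runner"],
    run=["example-http-client"],
    dev=["example-doc-builder"],
)
```

Separating the lists means a production system, or a distributor's package, does not have to pull in test runners and documentation tools just because the upstream developers use them.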

Nick closed by saying that much of what has been described here is still a work in progress. It cannot be considered to be close to being finished until, at a minimum, end-to-end package security is in place. What the packaging developers would most like to see at the moment is feedback from distributors: is the proposed set of changes sufficient to make the process of packaging Python modules easier? Assuming that the developers get to a "yes" answer, the management of Python code on all types of systems should get much easier in the not-too-distant future.

[Your editor would like to thank linux.conf.au for funding his travel to Perth].

