Rationalizing Python packaging

The Python language comes with a long list of nice features, in keeping with the language's "batteries included" mantra. One battery that is noticeably absent, though, is a comprehensive mechanism for the building, distribution, and installation of Python packages. That leaves packagers and users having to choose between a variety of third-party tools or just giving up and solving the whole problem themselves. The good news is that Python 3.4 is likely to solve this problem, but Python 2 users may still have to go battery shopping on their own.

Python packaging has long been recognized as a problem for users of the language. There is an extensive collection of add-on modules in the Python Package Index (PyPI), but there is no standard way for a user to obtain one of those modules (and, crucially, any other modules it depends on) and install it on their system. The distutils package — the engine behind the nearly omnipresent setup.py files found in modules — can handle some of the mechanics of installation, but it is showing its age and lacks features. Distutils2 is a fork of distutils intended to solve many of the problems there, but this project appears to have run out of steam. Setuptools is a newer approach found on many systems, but it has a long list of problems of its own. Distribute is "a deprecated fork" of Setuptools. And so on; one does not need to look for long to see that the situation is messy — and that's without looking at the variety of package formats ("egg," "wheel," etc.) out there.

For a while, the plan was to complete work on distutils2 and merge the result into the Python 3.3 release. But, in June 2012, that effort collapsed when it became clear that the work would not be anywhere near complete in time. The results were a 3.3 release without an improved packaging story, an epic email thread on the nature of the problem and what should be done about it, and a virtual halt to distutils2 work.

PEP 453

Well over one year later, a solution appears to be in sight; it takes the form of PEP 453, which, barring some unforeseen glitch, should be officially approved in the near future. This proposal, written by Donald Stufft and Nick Coghlan, charts the path toward better Python package management.

One might start by wondering why such a thing is needed in the first place. Linux users, of course, already have systems with nice package management built into them. But the world is full of users of other operating systems that lack comprehensive packaging systems. And, even on Linux, even on Debian, one is unlikely to find distribution packages for all 35,690 packages found in PyPI, so Linux users, too, are likely to have to install modules outside of the distribution's packaging system. It would seem that there is a place for a package distribution mechanism for Python modules, much like the one the Perl community has long had with CPAN.

PEP 453 calls for that mechanism to be built on PyPI using the pip installer. Pip, which is already in wide use, avoids a number of the problems found in its predecessors (though pip is based on Setuptools, a dependency which is expected to go away over time). It does not attempt to solve the whole problem, so complicated programs with non-Python dependencies may still end up needing a more comprehensive tool like Buildout or conda. But, for most users, pip should be more than adequate. And, by designating pip as the officially recommended installer, the PEP should help to direct resources toward improving pip and porting modules to it.

Pip will become a part of the standard Python distribution, but in an interesting way. A copy of pip will be hidden away deep within the Python library; it can then be installed into the system using the (also included) ensurepip module. Anybody installing their own version of Python can optionally use ensurepip to install pip; otherwise they can get it independently or (for Linux users) rely on the version shipped by the distributor. Python will also include a bundle of certificate-authority certificates to verify package sources, though the PEP envisions distributors wanting to replace that with their own central CA certificate collection. For as long as pip needs Setuptools, that will be bundled as well.
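As a rough illustration of how that bootstrap step might look from a script, here is a minimal sketch. It assumes an interpreter (3.4 or later) that ships the ensurepip module; the helper names pip_available() and bootstrap_command() are invented for this example and are not part of any library. The sketch only builds the command line rather than running it, since some distributors may omit ensurepip or prefer that users install their own pip package instead.

```python
import importlib.util
import sys


def pip_available():
    """Return True if this interpreter can import the pip package."""
    return importlib.util.find_spec("pip") is not None


def bootstrap_command(upgrade=False):
    """Build the command line that asks ensurepip to install its bundled pip.

    Hypothetical helper: it only constructs the argument list, leaving it
    to the caller to decide whether to actually run it.
    """
    cmd = [sys.executable, "-m", "ensurepip"]
    if upgrade:
        # Ask ensurepip to upgrade an older pip to the bundled version.
        cmd.append("--upgrade")
    return cmd


if __name__ == "__main__":
    if pip_available():
        print("pip is already installed for this interpreter")
    else:
        print("pip is missing; it could be bootstrapped with:")
        print(" ".join(bootstrap_command()))
```

Passing the resulting list to subprocess.check_call() would perform the installation; keeping that step separate lets a script respect a distributor's choice to manage pip through the system package manager.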

This scheme thus calls for pip to be distributed with Python, but it will not strictly become a part of Python. It will remain an independently developed project that, it is expected, will advance more quickly than Python and make more frequent releases. Python's 18-month cycle was seen as being far too slow for a developing utility like pip, so the two will not be tied together. There is a plan to include updated versions of pip in Python maintenance releases, though, to ensure that security fixes get out to users eventually.

Pip for Python 2

Perhaps the most controversial part of earlier versions of this PEP was a plan to include a version of ensurepip in the next Python 3.3 and 2.7 releases as well. The motivation for this idea is clear enough: if pip is to be the standard Python package manager, it would be nice to make it easily available to all Python users. As much as the Python developers would like to see everybody using Python 3, they have a realistic view of how long it will really take for users — especially those with existing, working applications — to move off Python 2. Putting ensurepip into (say) Python 2.7.6 would make it easier for Python 2 developers to work with the official packaging system.

On the other hand, Python 2 is currently being maintained under a strict "no new features" policy; adding ensurepip would require an explicit exception that, some developers fear, could open the floodgates for similar requests from developers of other modules. There are also worries that, once ensurepip goes in, some versions of Python 2.7 will have different feature sets than others, creating confusion for application developers and users. And, though they were not in the majority, some developers clearly do not want to do anything that might encourage developers to stay with Python 2 for any longer than necessary. These concerns led to substantial opposition to adding ensurepip to point releases of older Python versions.

The end result is a compromise: the documentation for Python 3.3 and 2.7 will be updated to anoint pip as the standard package manager, but no other changes will be made to those versions for now. Nick has stated his intent to put together a separate PEP revisiting the idea of bundling pip with Python 2.7; that question will be taken up once the (relatively uncontroversial) matter of getting pip into the 3.4 release is resolved.

Assuming there are no major disagreements, that resolution should happen soon. It needs to: the Python 3.4 release schedule calls for the first beta release — and associated feature freeze — to happen on November 24. The actual 3.4 release is currently planned for late February; after that, Python developers and users should have a standardized packaging and distribution scheme for the first time. "Better late than never" certainly applies in this case.

