What should be in the Python standard library?

Python has always touted itself as a "batteries included" language; its standard library contains lots of useful modules, often more than enough to solve many types of problems quickly. From time to time, though, some have started to rethink that philosophy, to reduce or restructure the standard library, for a variety of reasons. A discussion at the end of November on the python-dev mailing list revived that debate to some extent.

Jonathan Underwood raised the issue, likely unknowingly, when he asked about possibly adding some LZ4 compression library bindings to the standard library. As the project page indicates, it fits in well with the other compression modules already in the standard library. Responses were generally favorable or neutral, though some, like Brett Cannon, wondered if it made sense to broaden the scope a bit to create something similar to hashlib but for compression algorithms. Gregory P. Smith had a different take, however:

I don't think adding lz4 to the stdlib is worthwhile. It isn't required for core functionality as zlib is (lowest common denominator zip support). I'd argue that bz2 doesn't even belong in the stdlib, but we shouldn't go removing things. PyPI makes getting more algorithms easy. If anything, it'd be nice to standardize on some stdlib namespaces that others could plug their modules into. Create a compress in the stdlib with zlib and bz2 in it, and a way for extension modules to add themselves in a managed manner instead of requiring a top level name? Opening up a designated namespace to third party modules is not something we've done as a project in the past though. It requires care. I haven't thought that through.
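The idea of a hashlib-like front end with a managed way for other modules to plug in can be illustrated with a minimal sketch. Nothing below was proposed in the thread: the names register(), compress(), decompress(), and algorithms_available() are hypothetical, chosen only to mirror the feel of hashlib, and the registry simply wraps the one-shot functions that zlib, bz2, and lzma already provide.

```python
import bz2
import lzma
import zlib

# Hypothetical registry mapping an algorithm name to a pair of
# (compress, decompress) callables that take and return bytes.
_CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def register(name, compress_func, decompress_func):
    """Let another module (say, an LZ4 binding) add itself by name."""
    if name in _CODECS:
        raise ValueError(f"codec {name!r} is already registered")
    _CODECS[name] = (compress_func, decompress_func)


def _lookup(name):
    # Translate a missing name into a clearer error for callers.
    try:
        return _CODECS[name]
    except KeyError:
        raise ValueError(f"unknown compression algorithm: {name!r}") from None


def compress(name, data):
    """Compress bytes with the named algorithm."""
    return _lookup(name)[0](data)


def decompress(name, data):
    """Decompress bytes with the named algorithm."""
    return _lookup(name)[1](data)


def algorithms_available():
    """Return the registered algorithm names, sorted."""
    return sorted(_CODECS)
```

In a design like this, a third-party package would call register() at import time rather than claiming a top-level module name. The care Smith mentions would be needed around who may register which names and what happens on a conflict; the sketch just rejects duplicates.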

Steven D'Aprano objected to Smith's assertion about the Python Package Index (PyPI): "PyPI makes getting more algorithms easy for *SOME* people." He noted that in many environments (e.g. schools, companies) users cannot install additional software on the computers they are using, so PyPI is not the panacea it is sometimes characterized as.

That led Cannon to suggest discussing the standard library and its role: "We have never really had a discussion about how we want to guide the stdlib going forward (e.g. how much does PyPI influence things, focus/theme, etc.)." Paul Moore wasn't sure that discussing the matter would really resolve anything, though:

I'm not sure a formal discussion on this matter will help much - my feeling is that most people have relatively fixed views on how they would like things to go (large stdlib/batteries included vs external modules/PyPI/slim stdlib). The "problem" isn't so much with people having different views (as a group, we're pretty good at achieving workable compromises in the face of differing views) as it is about people forgetting that their experience isn't the only reality, which causes unnecessary frustration in discussions. That's more of a people problem than a technical one.

A larger standard library would help those without access to PyPI, Antoine Pitrou argued, while a smaller one does not provide huge benefits: "Python doesn't become magically faster or more powerful by including less in its standard distribution: the best it does is make the distribution slightly smaller." But there are definite downsides to having a large standard library, Benjamin Peterson said:

The [development] of stdlib modules slows to the rate of the Python release schedule.

stdlib modules become a permanent maintenance burden to CPython core developers.

The blessed status of stdlib modules means that users might use a substandard stdlib modules when a better thirdparty alternative exists.

Steve Dower would rather see a smaller standard library with some kind of "standard distribution" of PyPI modules that is curated by the core developers. Later in the thread, he listed numerous different Python distributions as examples of what he meant, but that just highlighted another problem, Moore said: which of those should he recommend to his users? Right now, the standard library provides the base that a Python script can rely on:

Every single one of those distributions includes the stdlib. If we remove the stdlib, what will end up as the lowest common denominator functionality that all Python scripts can assume? Obviously at least initially, inertia will mean the stdlib will still be present, but how long will it be before someone removes urllib in favour of the (better, but with an incompatible API) requests library? And how then can a "generic" Python script get a resource from the web?
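The baseline Moore describes can be made concrete with a short sketch of a script that fetches a resource using nothing beyond the standard library. The helper name fetch() and the default timeout are illustrative assumptions; the only real API used is urllib.request.urlopen().

```python
from urllib.request import urlopen


def fetch(url, timeout=10.0):
    """Return the body of `url` as bytes, using only the standard library."""
    # urlopen() returns a response object that works as a context manager,
    # so the underlying connection is closed when the block exits.
    with urlopen(url, timeout=timeout) as response:
        return response.read()


# Example use (requires network access):
#     body = fetch("https://www.python.org/")
```

A script written this way runs on any installation that ships the standard library, which is the property that would be lost if urllib were dropped in favor of a separately installed package.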

Moore acknowledged that maintaining modules in the standard library has a "significant cost" but wondered if moving to the distribution model was simply shifting those costs to users—without users gaining much from it. Nathaniel Smith looked at the list of distributions and came to a different conclusion: the "single-box-of-batteries" model is not really solving the problems it needs to solve.

If Python core wants to be in the business of providing a single-box-of-batteries that solves Paul's problem, then we need to rethink how the stdlib works. Or, we could decide we want to leave that to the distros that are better at it, and focus on our core strengths like the language and interpreter. But if the stdlib isn't a single-box-of-batteries, then what is it? It's really hard to tell whether specific packages would be good or bad additions to the stdlib, when we don't even know what the stdlib is supposed to be.

But Moore found that to be overstated somewhat. For him (and presumably others), the standard library is what you can expect to find when you have Python installed. That means that various things like StackOverflow answers, tutorials, books, and so on can rely upon those pieces being present, "much like you'd expect every Linux distribution to include grep". In addition, the "batteries included" attribute is likely to have been part of what helped Python grow into one of the most popular languages, D'Aprano said. "The current model for the stdlib seems to be working well, and we mess with it at our peril."

Nathaniel Smith sees some advantages to the "standard distribution" model, though he is not sure that it would really be the best option. "But what I like about it is that it could potentially reduce the conflict between what our different user groups need, instead of playing zero-sum tug-of-war every time this comes up." Others don't see it that way, though; "not every need can be solved by the stdlib", as Pitrou put it. He continued:

So, yes, there's a discussion for each concretely proposed package about whether it's sufficiently useful (and stable etc.) to be put in the stdlib. Every time it's a balancing act, and obviously it's an imperfect decision. That doesn't mean it cannot be done.

Moore concurred: "In exploring alternatives, let's not lose sight of the fact that the stdlib has been a huge success, so we know we *can* deliver an extremely successful distribution based on that model, no matter how much it might trigger regular debates :-)" In any case, as he pointed out, a more concrete proposal (in the form of a PEP) is going to be needed before any real progress can be made. Dower floated some ideas about what a distribution might look like along the way, but, without something like a PEP to discuss, participants are often talking past each other based on their assumptions.

The topic has come up before on the Python mailing lists and at Python Language Summits. In 2015, there was a discussion at the summit on adding the popular Requests module to the standard library. Participants recognized that there were significant barriers—development pace, certificate handling, no asyncio support—to moving it into the standard library. In the end, it made sense for Requests to stay out. At the 2018 summit, Christian Heimes brought up a number of batteries that should perhaps be removed from the set, though the effort to create a PEP listing them seems to have stalled.

No firm conclusions were drawn in the discussion, but part of the underlying problem seems to be a lack of clarity on what the purpose of the standard library is. At the 2015 summit, Cannon suggested an informational PEP be drafted to solidify that; until that happens, there will be wildly differing views on what role the standard library serves. At the moment, though, there is no process to accept or reject a PEP even if one were on offer; that will have to await the new Python Steering Council, which will be elected in early February. One of the first orders of business for that group will likely be to address the PEP process.

As far as adding LZ4 goes, the overall feeling from the thread is that it would be useful to have it in the standard library—at least for those not looking to change the standard library model. Adding LZ4 also requires a PEP, however, so that process may be stalled by the governance change, as well.

