Reducing Python's startup time


The startup time for the Python interpreter has been discussed by the core developers and others numerous times over the years; optimization efforts are made periodically as well. Startup time can dominate the execution time of command-line programs written in Python, especially if they import a lot of other modules. Python's startup time is worse than that of some other scripting languages, and more recent versions of the language take more than twice as long to start up as earlier versions did (e.g. 3.7 versus 2.7). The most recent iteration of the startup time discussion has played out in the python-dev and python-ideas mailing lists since mid-July. This time, the focus has been on the collections.namedtuple() data structure that is used in multiple places throughout the standard library and in other Python modules, but the discussion has been more wide-ranging than simply that.

A "named tuple" is a way to assign field names to elements in a Python tuple object. The canonical example is to create a Point class using the namedtuple() factory:

    Point = namedtuple('Point', ['x', 'y'])
    p = Point(1, 2)

The elements of the named tuple can then be accessed using the field names (e.g. p.x) in addition to the usual p[0] mechanism. A bug filed in November 2016 identified namedtuple() as a culprit in increasing the startup time for importing the functools standard library module. The suggested solution was to replace the namedtuple() call with its equivalent Python code that was copied from the _source attribute of a class created with namedtuple(). The _source attribute contains the pure Python implementation of the named tuple class, which eliminates the need to create and execute some of that code at import time (which is what namedtuple() does).
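To make the proposal concrete, here is a hedged sketch of the kind of hand-written class that pasting in the generated source would produce. It is simplified: the code that namedtuple() actually generates also defines _make(), _asdict(), _replace(), and friends, and this cut-down Point is only an illustration of the idea.

```python
from collections import namedtuple  # shown only for comparison

# A simplified, hand-written equivalent of namedtuple('Point', ['x', 'y']),
# in the spirit of the functools bug's suggestion: paste the generated
# source into the module rather than paying for exec() at import time.
class Point(tuple):
    __slots__ = ()                      # no per-instance __dict__

    def __new__(cls, x, y):
        return tuple.__new__(cls, (x, y))

    def __repr__(self):
        return 'Point(x=%r, y=%r)' % self

    # field-name access on top of the usual index access
    x = property(lambda self: self[0])
    y = property(lambda self: self[1])

p = Point(1, 2)
# Both access styles work, just as with the factory-made class:
#   p.x and p[0] refer to the same element.
```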

There are a few problems with that approach, including the fact that any updates or fixes to what namedtuple() produces would not be reflected in functools. Beyond that, though, named tuple developer Raymond Hettinger was not convinced there was a real problem:

I would like to caution against any significant changes to save microscopic amounts of time. Twisting the code into knots for minor time savings is rarely worth it and is not what Python is all about.

Nick Coghlan agreed with Hettinger's assessment:

Caring about start-up performance is certainly a good thing, but when considering potential ways to improve the situation, structural enhancements to the underlying systems are preferable to ad hoc special cases that complicate future development efforts.

Hettinger closed the bug, though it was reopened in December to consider a different approach using Argument Clinic and subsequently closed again for more or less the same reasons. That's where it stood until mid-July when Jelle Zijlstra added a comment that pointed to a patch to speed up named tuple creation by avoiding some of the exec() calls. It was mostly compatible with the existing implementation, though it did not support the _source attribute. That led to a classic "bug war", of sorts, where people kept reopening the bug, only to see it be immediately closed again. It is clear that some felt that the arguments for closing the bug were not particularly compelling.
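Zijlstra's actual patch modified the standard library's collections module; the sketch below is not that patch, only a simplified Python-level illustration of the underlying idea: build the class namespace directly with ordinary functions and property descriptors instead of exec()ing generated source for every method. The make_namedtuple() name and its details are invented for illustration.

```python
from operator import itemgetter

def make_namedtuple(typename, field_names):
    """Hedged sketch of an exec()-free named-tuple factory.

    Methods are plain closures and descriptors rather than generated
    source code, so nothing needs to be compiled at class-creation time.
    (The real optimized implementation keeps a tiny exec()ed __new__ for
    exact argument handling; this sketch skips that detail.)
    """
    fields = tuple(field_names)

    def __new__(cls, *args):
        if len(args) != len(fields):
            raise TypeError('expected %d arguments, got %d'
                            % (len(fields), len(args)))
        return tuple.__new__(cls, args)

    namespace = {
        '__slots__': (),
        '_fields': fields,
        '__new__': __new__,
    }
    # One property per field, reading straight from the tuple by index.
    for index, name in enumerate(fields):
        namespace[name] = property(itemgetter(index),
                                   doc='alias for field number %d' % index)
    return type(typename, (tuple,), namespace)

Point = make_namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
```

Because nothing here round-trips through source text, there is naturally no _source attribute to expose, which is exactly the incompatibility that kept the bug contentious.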

After several suggestions that the proper way to override the bug-closing decisions made by Hettinger and Coghlan was to take the issue to python-dev, Antoine Pitrou did just that. According to Pitrou, the two main complaints about the proposed fix were that it eliminated the _source attribute and that "optimizing startup cost is supposedly not worth the effort". Pitrou argued that _source is effectively unused by any Python code that he could find and that startup optimizations are quite useful:

[...] startup time is actually a very important consideration nowadays, both for small scripts *and* for interactive use with the now very wide-spread use of Jupyter Notebooks. A 1 ms. cost when importing a single module can translate into a large slowdown when your library imports (directly or indirectly) hundreds of modules, many of which may create their own namedtuple classes.

In addition, the _source attribute is something of an odd duck: its underscore prefix makes it look like part of the private interface, yet it is meant to be used as a learning tool, which is not typical for Python objects. The underscore was used so that source could still be used as a tuple field name but, as Hettinger noted, it probably should have been named differently (e.g. source_). He is adamant, though, that there are benefits to having that attribute, mostly from a learning and understanding standpoint.

Ever the pragmatist, Guido van Rossum offered something of a compromise. He agreed with Pitrou about the need to optimize named tuple class creation, but hoped that it would still be possible to support Hettinger's use case:

The cumulative startup time of large Python programs is a serious problem and namedtuple is one of the major contributors -- especially because it is so convenient that it is ubiquitous. The approach of generating source code and exec()ing it, is a cool demonstration of Python's expressive power, but it's always been my sense that whenever we encounter a popular idiom that uses exec() and eval(), we should augment the language (or the builtins) to avoid these calls -- that's for example how we ended up with getattr(). [...] Concluding, I think we should move on from the original implementation and optimize the heck out of namedtuple. The original has served us well. The world is constantly changing. Python should adapt to the (happy) fact that it's being used for systems larger than any of us could imagine 15 years ago.
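The getattr() example Van Rossum mentions is easy to illustrate: computing an attribute lookup from a string once required building and eval()ing source text, and the builtin made that unnecessary. The Config class and field name below are invented for the illustration.

```python
# Hypothetical example of the idiom Van Rossum alludes to: an attribute
# name known only at runtime, held in a string.
class Config:
    debug = True
    verbose = False

name = 'debug'

# The old eval() idiom compiles and executes new code at runtime:
via_eval = eval('Config.' + name)

# getattr() does the same lookup with no code generation at all:
via_getattr = getattr(Config, name)

assert via_eval == via_getattr
```

The namedtuple situation is analogous: the argument was that the language (or library) should offer a direct mechanism so that the exec()-based implementation is no longer needed on the hot path.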

As might be guessed, a pronouncement like that from Van Rossum, Python's benevolent dictator for life (BDFL), led Hettinger to reconsider: "Okay, then Nick and I are overruled. I'll move Jelle's patch forward. We'll also need to lazily generate _source but I don't think that will be hard." He did add "one minor grumble", however, regarding the complexity of the CPython code:

I think we need to give careful cost/benefit considerations to optimizations that complicate the implementation. Over the last several years, the source for Python has grown increasingly complicated. Fewer people understand it now. It is much harder to newcomers to on-ramp. [...] In the case of this named tuple proposal, the complexity is manageable, but the overall trend isn't good and I get the feeling the aggressive optimization is causing us to forget key parts of the zen-of-python.

That tradeoff between complexity and performance is one that has played out in many different development communities over the years—the kernel community faces it regularly. Part of the problem is that the negative effects of a performance optimization may not be seen for a long time. As Coghlan put it:

Unfortunately, these are frequently cases where the benefits are immediately visible (e.g. faster benchmark results, removing longstanding limitations on user code), but the downsides can literally take years to make themselves felt (e.g. higher defect rates in the interpreter, subtle bugs in previously correct user code that are eventually traced back to interpreter changes).

Van Rossum's pronouncement set off a predictable bikeshedding frenzy around named tuple enhancements that eventually moved to python-ideas and may be worthy of a further look at some point. But there was also some pushback regarding Hettinger's repeated contention that shaving a few milliseconds here and there from the Python startup time was not an important goal. As Barry Warsaw said:

[..] start up time *is* a serious challenge in many environments for CPython in particular and the perception of Python’s applicability to many problems. I think we’re better off trying to identify and address such problems than ignoring or minimizing them.

Gregory P. Smith pointed to the commonly mentioned command-line utilities as one place where startup time matters, but also described another problematic area:

I'll toss another where Python startup time has raised eyebrows at work: unittest startup and completion time. When the bulk of a processes time is spent in startup before hitting unittest.main(), people take notice and consider it a problem. Developer productivity is reduced. The hacks individual developers come up with to try and workaround things like this are not pretty. [...] In real world applications you do not control the bulk of the code that has chosen to use namedtuple. They're scattered through 100-1000s of other transitive dependency libraries (not just the standard library), the modification of each of which faces hurdles both technical and non-technical in nature.

The discussion (and a somewhat dismissive tweet from Hettinger [Note: Hettinger strongly disclaims the "dismissive" characterization.]) led Victor Stinner to start a new thread on python-dev to directly discuss the interpreter startup time, separate from the named tuple issue. He collected some data that showed that the startup time for the in-development Python 3.7 is 2.3 times longer than Python 2.7. He also compared the startup of the Python-based Mercurial source code management system to that of Git (Mercurial is 45 times slower) as well as comparing the startup times of several other scripting languages (Python falls into the middle of the pack there). In the thread, Pitrou pointed out the importance of "anecdotal data", which Hettinger's tweet had dismissed:

[...] We are engineers and have to make with whatever anecdotes we are aware of (be they from our own experiences, or users' complaints). We can't just say "yes, there seems be a performance issue, but I'll wait until we have non-anecdotal data that it's important". Because that day will probably never come, and in the meantime our users will have fled elsewhere.
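Measurements like Stinner's are straightforward to reproduce. The sketch below times `python -c pass`, whose wall-clock cost is almost entirely interpreter startup; the function name and best-of-N approach are this article's invention, not Stinner's actual methodology.

```python
import subprocess
import sys
import time

def startup_seconds(runs=5):
    """Best-of-N wall-clock time for `python -c pass`.

    The child script does nothing, so nearly all of the measured time
    is interpreter startup plus the imports done before user code runs.
    Best-of-N reduces noise from a cold disk cache or a busy machine.
    """
    best = float('inf')
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, '-c', 'pass'], check=True)
        best = min(best, time.perf_counter() - start)
    return best

print('best of 5: %.1f ms' % (startup_seconds() * 1000))
```

Pointing the same harness at different interpreters (e.g. python2 versus a 3.7 build) yields comparisons of the kind Stinner collected; Python 3.7 also grew a `python -X importtime` option that breaks startup down per imported module.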

Python has come a long way from its roots as a teaching language. There is clearly going to be some tension between the needs of languages geared toward teaching and those of languages used for production-quality applications of various kinds. That means there is a balance to be struck, which is something the core developers (and, in particular, Van Rossum) have been good at over the years. One suspects that startup time—and the named tuple implementation—can be optimized without sacrificing that.