The Python Language Summit is an annual event that is held in conjunction with the North American edition of PyCon. Its mission is to bring together core developers of various Python implementations to discuss topics of interest within that group. The 2015 meeting was held April 8 in Montréal, Canada. I was happy to be invited to attend the summit so that I could bring readers a report on the discussions there.

The summit was deemed the "Barry and Larry show" by some, since it was co-chaired by Barry Warsaw and Larry Hastings (both sporting stylish fezzes). Somewhere around 50 developers sat in on the talks, which focused on a number of interesting topics, including atomicity guarantees for Python operations, possible plans to make Python 3 more attractive to developers, infrastructure changes for development, better measurement of Python 3 adoption, the /usr/bin/python symbolic link, type hints, and more.


At the 2015 Python Language Summit, Matt Messier was first up to talk about Skython, which is an alternative Python implementation that he has been working on in stealth mode for the last two or three years. It has largely been created in a vacuum, he said, since it has just been him working on it. He has removed the global interpreter lock (GIL) from Python in Skython, which is its big feature.

He said there were lots of technical details he could go into about Skython, but that he had a limited amount of time, so he wanted to focus on one particular issue: the atomicity of operations on objects like lists and dicts in Python. Is appending to a list an atomic operation? Or can multiple threads operating on the same list interfere with each other?

There have been other attempts to remove the GIL, but they tend to slow down single-threaded operation with lots of fine-grained locking. The approach he has taken with Skython is to maintain data integrity for the interpreter itself, but to allow races when updating the same data structures in multiple threads of the program.

For example, if two threads are operating on the same list and both append to it at more or less the same time, the appends could complete in an undefined order, or one or both of the append operations could get lost and never be reflected in the list. He wondered whether the Python core developer community wanted to specify that such operations are atomic and, if it did, what that specification would be.
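
The question is easy to demonstrate in miniature. Here is a minimal sketch, assuming CPython's current behavior, where list.append() happens to be atomic because of the GIL; an implementation that allowed races on user data structures could lose some of these appends:

    import threading

    shared = []

    def worker():
        for i in range(100000):
            shared.append(i)   # atomic under CPython; racy without that guarantee

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # CPython guarantees 200000 here; the open question is whether
    # the language itself should.
    print(len(shared))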

Jython (Python on the Java virtual machine) developer Jim Baker noted that Jython uses Java data structures that provide the same atomicity guarantees as the standard Python interpreter (i.e. CPython, the reference implementation written in C). Those guarantees are not specified as part of the language, at least yet, but Baker said that there is lots of existing Python code that expects that behavior.

IronPython (Python targeting the .NET framework) developer Dino Viehland agreed with Baker. He said that, like Jython, IronPython is a Python without the GIL, but that it makes the same guarantees that CPython does. It uses lock-free reads for Python data structures, but takes a lock for update operations. Essentially, it has the same approach that Jython does, but with a different implementation.

Baker and others referenced a thread on the python-dev mailing list from many years ago (that appears to be related to this article from Fredrik Lundh). There was also a draft of a Python Enhancement Proposal (PEP) from the (now defunct) Unladen Swallow project that Alex Gaynor brought up. It was suggested that relying on old mailing list posts, articles, and PEP drafts was probably not the right approach and that either a new PEP or an update to the language reference to clarify things was probably in order.

Messier acknowledged that not handling concurrent append (or other) operations atomically may violate programmers' expectations but, for performance, it is important for Skython to bypass the fine-grained locking. While Skython is not open source, the plan is to release it under the Python Software Foundation (PSF) license soon. He had hoped that would happen on the day of the summit, but it appears to still be a week off.

The name "Skython" came from the name of the company where development started: SilverSky, which has since been acquired by BAE Systems. When someone asked about the target domain for Skython, the answer was "Python for the Cloud", which was characterized as a "cop-out answer". It is intended to be highly scalable for back-end servers for web services, Messier continued. He was asked how it compares to today's solution of using lots of Tornado worker processes. The idea is that handling a bunch of separate processes can be problematic, so putting them all into one multi-threaded process may simplify some things.

Brett Cannon asked about performance, but Messier said that has not been a focus of his efforts so far. He has been trying to get the Python unit tests to pass. The performance is not good right now, but he believes there is "a lot of room" for optimization.

Skython is based on Python 3.3.6, he said, which was met with applause in the room. C extensions can be written (or ported) but the C API is not the same as that provided by CPython. Most of the standard library just works with Skython. In addition, it uses a mark-and-sweep garbage collector, rather than the reference-counting implementation used by CPython.


Larry Hastings was up next at the summit with a discussion of what it would take to attract more developers to use Python 3. He reminded attendees of Matt Mackall's talk at last year's summit, where the creator and project lead for the Mercurial source code management tool said that Python 3 had nothing in it that the project cares about. That talk "hit home for me", Hastings said, because it may explain part of the problem with adoption of the new Python version.

The Unicode support that comes with Python 3 is "kind of like eating your vegetables", he said. It is good for you, but it doesn't really excite developers (perhaps because most of them use Western languages, like English, someone suggested). Hastings is looking for changes that would make people want to upgrade.

He wants to investigate features that might require major architectural changes. The core Python developers may be hungry enough to get people to switch that they would be willing to consider those kinds of changes. But there will obviously be costs associated with changes of that sort; he wanted people to keep in mind the price in terms of readability, maintainability, and backward compatibility.

The world has changed a great deal since Python was first developed in 1990. One of the biggest changes is the move to multi-threading on multicore machines. It wasn't until 2005 or so that he started seeing multicore servers, desktops, and game consoles, then, shortly thereafter, laptops. Since then, tablets and phones have gotten multicore processors; now even eyeglasses and wristwatches are multicore, which is sort of amazing when you stop to think about it.

The perception is that Python is not ready for a multicore world because of the global interpreter lock (GIL). He said that he would eventually get to the possibility of removing the GIL, but he had some other ideas he wanted to talk about first.

For example, what would it take to have multiple, simultaneous Python interpreters running in the same process? It would be a weaker form of a multicore Python that would keep the GIL. Objects could not be shared between the interpreter instances.

In fact, you can do that today, though it is a bit of a "party trick", he said. You can use dlmopen() to open multiple shared libraries, each in its own namespace, so that each interpreter "runs in its own tiny little world". It would allow a process to have access to multiple versions of Python at once, though he is a bit dubious about running it in production.

Another possibility might be to move global interpreter state (e.g. the GIL and the small-block allocator) into thread-local storage. It wouldn't break the API for C extensions, though it would break extensions that are non-reentrant. There is some overhead to access thread-local storage because it requires indirection. It is "not as bad as some other things" that he would propose, he said with a chuckle.

A slightly cleaner way forward would be to add an interpreter parameter to the functions in the C API. That would break the API, but do so in a mechanical way. It would, however, use more stack space and would still have the overhead of indirect access.

What would it take to have multiple threads running in the same Python interpreter? That question is also known as "remove the GIL", Hastings said. In looking at that, he considered what it is that the GIL protects. It protects global variables, but those could be moved to a heap. It also enables non-reentrant code as a side effect. There is lots of code that would fail if the assumption that it won't be called simultaneously in multiple threads is broken, which could be fixed but would take a fair amount of work.

The GIL also provides the atomicity guarantees that Messier brought up. A lock on dicts and lists (and other data structures that need atomic access) could preserve atomicity. Perhaps the most important thing the GIL does, though, is to protect access to the reference counts that are used to do garbage collection. It is really important not to have races on those counts.
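
Reference counting is visible from Python itself, which makes it easy to see why races on the counts would be dangerous. A quick CPython-specific illustration (sys.getrefcount() reports one more than you might expect because its own argument temporarily holds a reference):

    import sys

    x = []
    print(sys.getrefcount(x))   # 2: the name 'x' plus getrefcount()'s argument
    y = x
    print(sys.getrefcount(x))   # 3: a second name now refers to the list

If two threads incremented or decremented such a count simultaneously without synchronization, an update could be lost and the object freed too early, or leaked.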

The interpreter could switch to using the atomic increment and decrement instructions provided by many of today's processors. That doesn't explicitly break the C API as the change could be hidden behind macros. But, Hastings said, Antoine Pitrou's experiments with using those instructions resulted in 30% slower performance.

Switching to a mark-and-sweep garbage collection scheme would remove the problem with maintaining the reference counts, but it would be "an immense change". It would break every C extension in existence, for one thing. For another, conventional wisdom holds that reference counting and "pure garbage collection" (his term for mark and sweep) are roughly equivalent performance-wise, but the performance impact wouldn't be known until after the change was made, which might make it a hard sell.

PyPy developer Armin Rigo has been working on software transactional memory (STM) and has a library that could be used to add STM to the interpreter. But Rigo wrote a toy interpreter called "duhton" and, based on that, said that STM would not be usable for CPython.

Hastings compared some of the alternative Python implementations in terms of their garbage-collection algorithm. Only CPython uses reference counting, while Jython, IronPython, and PyPy all use pure garbage collection. It would seem that the GIL and reference counting go hand in hand, he said. He also noted that few other scripting languages use reference counting, so the future of scripting may be with pure garbage collection.

Yet another possibility is to turn the C API into a private API, so extensions could not call it. They would use the C Foreign Function Interface (CFFI) for Python instead. Extensions written using Cython might be another possible approach to hide the C extension API.

What about going "stackless" (à la Stackless Python)? Guido van Rossum famously said that Python would never merge Stackless, so that wasn't Hastings's suggestion. Instead, he looked at the features offered by Stackless: coroutines, channels, and pickling the interpreter state for later resumption of execution. Of the three, only the first two are needed for multicore support.

The major platforms already have support for native coroutines, though some are better than others. Windows has the CreateFiber() API that creates "fibers", which act like threads, but use "cooperative multitasking". Under POSIX, things are a little trickier.

There is the makecontext() API that does what is needed. Unfortunately, it was specified by POSIX in 2001, obsoleted in 2004, and dropped in 2008, though it is still mostly available. It may not work for OS X, however. When makecontext() was obsoleted, POSIX recommended that developers use threads instead, but that doesn't solve the same set of problems, Hastings said.

For POSIX, using a combination of setjmp(), longjmp(), sigaltstack(), and some signal (e.g. SIGUSR2) will provide coroutine support, though it is "pretty awful". While it is "horrible", it does actually work. He concluded his presentation by saying that he was mostly interested in getting the assembled developers to start thinking about these kinds of things.

One attendee suggested looking at the GCC split stack support that has been added for the Go language, but another noted that it is x86-64-only. Trent Nelson pointed to PyParallel (which would be the subject of the next slot) as a possible model. It is an approach that identifies the thread-sensitive parts of the interpreter and has put in guards to stop multiple threads from running in them.

But another attendee wondered if removing the GIL was really the change that the Mercurial developers needed in order to switch. Hastings said that he didn't think GIL removal was at all interesting to the Mercurial developers, as they are just happy with what Python 2.x provides for their project.

Though there may be solutions to the multi-threading problem that are architecture specific, it may still be worth investigating them, Nick Coghlan said. If "works on all architectures" is a requirement to experiment with ways to better support multi-threading, it is likely to hold back progress in that area. If a particular technique works well, that may provide some impetus for other CPU vendors to start providing similar functionality.

Jim Baker mentioned that he is in favor of adding coroutines. Jython has supported multiple interpreters for a while now. Java 10 will have support for fibers as well. He would like to see some sort of keyword tied to coroutines, which will make it easier for Jython (and others) to recognize and handle them. Dino Viehland thought that IronPython could use fibers to implement coroutines, but would also like to see a new language construct to identify that code.

The main reason that Van Rossum is not willing to merge Stackless is because it would complicate life for Jython, IronPython, PyPy, and others, Hastings said (with Van Rossum nodding vigorously in agreement). So having other ways to get some of those features in the alternative Python implementations would make it possible to pursue that path.

Viehland also noted that there is another scripting language that uses reference counting and is, in fact, "totally single threaded": JavaScript. People love JavaScript, he said, and wondered if just-in-time (JIT) compiling should be considered as the feature to bring developers to Python 3. That led Thomas Wouters to suggest, perhaps jokingly, that folks could be told to use PyPy (which does JIT).

Hastings said that he has been told that removing the GIL would be quite popular, even if it required rewriting all the C extensions. Essentially, if the core developers find a way to get rid of the GIL, they will be forgiven for the extra work required for C extensions. But Coghlan was not so sure, saying that the big barrier to getting people to use PyPy has generally been because C extensions did not work in that environment. Someone else noted that the scientific community (e.g. NumPy and SciPy users) has a lot of C extensions.


PyParallel is an alternative version of Python that is aimed at removing the global interpreter lock (GIL) to provide better performance through parallel processing. Trent Nelson prefaced his talk by saying that he hadn't made much progress on PyParallel since he presented it at PyCon 2013. He did give a few talks in the interim that were well-received, however. He got started back working on the code in December 2014, with a focus on making it stable while running the TechEmpower Frameworks Benchmark, which "bombs the server" with lots of clients making simple requests that the server responds to with JSON or plaintext. The benchmark has lots of problems, he said, but it is better than nothing.

Because it focuses on that benchmark, PyParallel performs really well when running it, Nelson said. So it is really good at stateless HTTP and maintains low latency even under a high load. It will saturate all CPUs available, with 98% of that in user space and just 2% in the Windows kernel.

The latency is low, and it also has low variance. On a normal run of the benchmark, with clients attempting to make 50,000 requests per second, PyParallel shows a fairly flat latency graph with relatively few outliers. The graphs Nelson displayed showed that Tornado and Node.js servers on the same hardware had a lot more variance in latency (as well as higher latency than PyParallel overall). Node.js performed better than Tornado, but had some outliers that were seven times the size of the mean (Tornado and PyParallel had worst-case latencies less than three times their means). Both the Tornado and Node.js benchmarks were run on Linux, since they are targeted at that operating system, while PyParallel was run on Windows for the same reason.

Nelson is working on another test that is more complicated than the simple, stateless HTTP benchmark. It is an instantaneous search feature for 50GB of Wikipedia article data, but it is not working yet.

PyParallel is running on Python 3.3.5. He plans to use the customizable memory allocators that are provided by Python 3.4 and would like to see that API extended so that the reference-count-management operations could also be customized.

Effectively, PyParallel tests to see if an operation is happening in parallel and, if so, performs a thread-safe version. In the common case where the operations have naturally been serialized, it takes the faster normal path. Minimizing the overhead of that test is one of the best ways to increase performance.
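
To illustrate the shape of that test, here is a hedged, Python-level sketch; PyParallel's actual check is done in C inside the interpreter, and the guarded_append() helper here is purely hypothetical:

    import threading

    _lock = threading.Lock()
    _main_thread = threading.main_thread()

    def _in_parallel_context():
        # Stand-in for PyParallel's interpreter-internal flag: treat any
        # code running off the main thread as "parallel".
        return threading.current_thread() is not _main_thread

    def guarded_append(lst, item):
        if _in_parallel_context():
            with _lock:           # slower, thread-safe path
                lst.append(item)
        else:
            lst.append(item)      # fast path for the common serialized case

Minimizing the cost of the _in_parallel_context() check is the analog of the overhead Nelson is trying to shave.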

In the process of his work, he broke generators and exceptions, at least temporarily. He purposely disabled importing and trace functions. He also "destroyed the PyObject structure" by adding a bunch of pointers to it. Most of those pointers are not needed, so he plans to clean it all up.

People can get the code at download.pyparallel.org. "At the very least, it is very very fast", he said. He has also hacked up the CPython code to such an extent that it makes a good development testbed for others.


The core development infrastructure for Python, which includes things like version-control systems and repository hosting, is the subject of two current PEPs. The PEPs offer competing views for how to move forward, Nick Coghlan said. He noted that Brett Cannon made a comment at one point that Python changes the way its development processes and systems work periodically, then leaves them alone for a few years. That is the enterprise approach, he said with a chuckle, which "sucks as much for us as it does for them".

Two PEPs

Coghlan has one proposed plan involving Kallithea (PEP 474), while Donald Stufft (who was not present at the summit) has a proposal involving GitHub and Phabricator (PEP 481). Coghlan's PEP is not focused on CPython development directly (his deferred PEP 462 is, however) but is instead looking at the infrastructure for many of the parts and pieces that surround CPython (e.g. the PEPs and the developer guide). We looked at some of the early discussion of the issue back in December.

Coghlan's interest in the issue stems from the fact that he already works on process and infrastructure for development as part of his day job at Red Hat. His job entails figuring out how to get code from desktops into the continuous-integration (CI) system and then into production. It is a job that people don't want to work on for free, he said. Instead they will find "something more entertaining to do".

He said that one idea behind his proposal is to try to respect the time people are putting into the patches they are contributing to Python. He would also like to minimize the time between when a developer makes a contribution and when they see it land in the Python mainline. The changes he wants to see would still allow people to use their current workflow, he said. It would create a new workflow that existing developers could cherry pick pieces from if they liked them. New projects could default to the new workflow.

One of the complaints about his proposal, which is based on free-software solutions hosted on Python Software Foundation (PSF) infrastructure, is that there would be no commercial support available, unlike with GitHub. But Red Hat allows him to spend 20% of his time working on the Python infrastructure, which will provide some of the support.

The pull-request model of GitHub is a good one, Coghlan said. The pull request becomes the main workflow element. For relatively simple projects with fairly small development teams, it works well. For those kinds of projects, you can live without an issue tracker, wiki, and mailing lists as what is needed is a way to propose changes and to discuss them, which pull requests do. That distills the development problem down to its minimal core.

Both proposals will accept GitHub pull requests as part of the workflow, though only Stufft's uses GitHub itself directly (and keeps a read-only copy of any repositories in Phabricator). Someone asked, facetiously, why not move fully to GitHub. Coghlan said it is partly a matter of risk management. Outsourcing the infrastructure to a proprietary tool is too risky in the long run.

In addition, GitHub only provides "all or nothing" access to its repositories. That makes it harder to build a good on-ramp for new developers. If Python uses its own service, it can use or create a fine-grained access control mechanism to allow new developers some access without giving them access to everything. That is currently done, to some extent, in the bug tracker, which is based on Roundup.

His intent is for the Python infrastructure (what he is calling "forge.python.org") to interface with a number of different services and tools, such as GitHub, Gerrit, and GitLab, in order to try to mesh with the workflow of contributors while still supporting the existing process for current core developers.

GitHub is well-suited for a small, central development team, whereas other tools are a better fit for how Python development is done, he said. "'Just use GitHub' is the answer for an awful lot of projects, but I don't think it's right for Python." OpenStack uses Gerrit for its code review because it is better suited to a large, distributed development team. There are some good ideas in the OpenStack workflows that might be applied to CPython development.

Brett Cannon said that Coghlan has "strong opinions" on the matter, which is part of why Cannon has been put in charge of making the decision between the two PEPs. He hopes to make a decision by the beginning of May, so those with an opinion should be trying to convince him of the right approach before then. With a grin, Coghlan agreed, "I am thoroughly biased, which is why I don't decide" the issue. Cannon said that he doesn't particularly care that GitHub is closed-source, so long as Python can get its data out if it ever needs to.

It is important to be able to accept GitHub pull requests, Coghlan said, as well as those from Bitbucket, GitLab, and potentially others. Mozilla has also adopted this practice; it agrees that open source needs open infrastructure, but projects have to go where their developers are.

Jacob Kaplan-Moss said he would try to "channel Donald [Stufft]" a bit. He noted that Django switched to GitHub and has tried to optimize it for the Django workflow as well as community contributions. GitHub is better for the contributors, though it is just "OK" for the core developers. It can be made to work, but is not optimal. There is an existential question, he said, about whether a project focuses on the workflow for its core developers or for its contributors. Both are perfectly valid choices, but a choice needs to be made.

Patch backlog

Barry Warsaw is concerned that turning more toward contributor workflows will cause a loss of core developers. Coghlan noted that there are 2000 unmerged patches on bugs.python.org. The bottleneck is not on contributions, he said, but on review. It doesn't make sense to make it easier for someone to send a patch if the project is going to essentially ignore it, Thomas Wouters said.

The review process is being held back by the current workflow options, though, Coghlan said. You can't just take five minutes to review a patch, see that it passes the CI tests, and then say go ahead and merge it. Stufft's original proposal was GitHub-only, but he has added Phabricator into the mix since then, which addresses Coghlan's concerns about PEP 481. Coghlan would prefer his option, but can live with Stufft's.

The choice of Phabricator is a good one from a workflow perspective; if you are looking for good workflow design, Facebook (which created Phabricator) is a good place to look, Coghlan said. He personally doesn't want to work in PHP, which is what Phabricator is written in, but he can work on other parts of the problem if that is the direction chosen.

Cannon asked Kaplan-Moss about the bug backlog in Django after the switch to GitHub. At first, he said that Django had a huge backlog before the switch and still does today. After looking a little deeper, though, he noted that the bug backlog had been cut by a third since the switch, but he is "not sure if they are related".

The huge patch backlog indicates that the workflow for core developers needs to be fixed before the contributor workflow, Cannon pointed out. Contributors may not like it, but they seem to be willing to deal with the existing Python workflow, which is completely different than that of any other project. Once code can get reviewed and merged easily, other changes can follow. "As of now, no one is happy", he said. One important piece is to not lose the "things that we like" about the current workflow, Warsaw said, though he didn't go into any detail.

Reusing Python's choice

Jython developer Jim Baker asked about other projects that might want to piggyback on the choice made. It would be great if Jython could simply use the infrastructure and workflow that CPython decides on, he said. Cannon said that he is "just trying to be the guy that makes the decision", but that any choice will be one that other projects can pick up if they wish.

Coghlan expanded on that, noting that the containerization choices that have become available relatively recently will make all of that a lot easier. There is a lot of hype and "marketing rubbish" around Docker, but there is some good technology too. Docker has put a nice user experience on top of a bunch of Linux tools, which provides a packaging solution for Linux that application developers don't hate. It will make it easier for anyone who wants to run their own version of whatever infrastructure is adopted. His goal is to help make it so that "open source projects don't have to live with crappy infrastructure anymore". Cannon pointed out that members of the PSF board have told him that there would be money available to help make some of these things happen once a decision is made.

One of the advantages of a pull request is that it identifies unambiguously which version of the code a patch will apply to, an attendee said. Is it possible to automate turning the existing patch backlog into pull requests or to at least give hints on the version a patch is targeting, he wondered. Perhaps a Google Summer of Code (GSoC) project could be aimed at this problem.

Another problem is the lack of a build-and-test farm, which is at least partly due to the fragility of the tests themselves; systems in such a farm would rarely, if ever, agree on the test results.

Coghlan and another attendee said that there are some efforts to get GSoC students involved in solving some of these problems. One project is to put a REST API on Roundup, which may help doing some of the automated processing of the patch backlog.

An OpenStack developer said that he was in favor of fixing the core developer workflow as a way to make things better for the whole community. While it is important to consider GitHub because everyone uses it, the most important thing is to try to ensure that patches land in the mainline within a reasonable time frame. Both Kallithea and Phabricator are good tools, but neither existed when OpenStack was looking for something, so it chose Gerrit. The project is making headway on making its review and CI systems more reusable by others, as well.

The final comment was that whatever happens, some people will complain about it. But that shouldn't make the project afraid to make a change.


"If you can't measure it, you can't migrate it" was the title of the next presentation, which came from Glyph Lefkowitz, who is the maintainer of the Twisted event-driven network framework. "I famously have issues with Python 3", he said, but what he mainly wants is for there to be one Python. If that is Python 2.8 and Python 3 is dropped, that would be fine, as would just having Python 3.

Nobody is working on Python 2 at this point. Interested developers cannot just go off and do the work themselves, since the core developers (which he and others often refer to as "python-dev" after the main CPython-development mailing list) are actively resisting any significant improvements to Python 2 at this point. That is because of a "fiat decision" by python-dev, not because there are technical reasons why it couldn't be done.

Beyond that, "nobody uses Python 3", at least for some definition of "nobody". There are three versions of Python in use at this point: 2.6, 2.7, and everything else. Based on some Python Package Index (PyPI) data that he gathered (which was a few months old; he also admitted the methodology he used was far from perfect), Python 2.7 makes up the majority of the downloads, while 2.6 has a significant but far smaller chunk. All the other Python versions together had a smaller slice than even 2.6.

He pointed to the "Can I use Python 3?" web site, which showed 9,900 of 55,000 packages ported. But he typed in one he uses (Sentry) and it is blocked by nine dependencies that have not yet been ported to Python 3. There is a "long tail problem", in that there are lots of packages that need porting and, because of interdependencies, plenty of things won't work until most of those packages are ported. But there are lots of small packages that are essentially unmaintained at this point; they work fine for Python 2 so the developers aren't putting out updates, much less porting them to Python 3.

Lefkowitz said he spends a "lot of time worrying about Python 3", but other maintainers are just giving up. New programmers who are learning Python 3 get "really mad about packages that don't work". They go on Reddit and Hacker News to scream about it. That causes maintainers to just quietly drop out of the community sometimes. He talked to several who did not want to be identified that were burnt out by the continual harassment from those who want Python 3 support. The problem is "not being healed from the top", he said.

There are a number of examples where the project is not communicating well about deprecated features or packages, he said. PIL (Python Imaging Library) is still mentioned in official documentation, even though it has been officially supplanted by Pillow for years. That fact is not unambiguously communicated to users, who then make their projects dependent on an outdated (and unavailable for Python 3) package.

He also has some ideas on things that could be done to make Python users happier. To start with, there needs to be a better story for build artifacts. The Go language has great support for building and sharing programs: basically, any Go binary that is built can be copied elsewhere and just run. But static linking as the solution for binary distribution is an idea that has been around for a long time, one attendee said.

Lefkowitz noted that 6th graders who are building a game using Pygame just want to be able to share the games they make with their friends. Users don't want a Python distribution, they just want some kind of single file they can send to others. Guido van Rossum asked if these 6th-grade Pygame programmers were switching to Go, and Lefkowitz said that they weren't. Mostly they were switching to some specialized Java teaching tool that had some limited solution to this problem (it would allow sharing with others using that same environment). "We could do that for Python", he said, so that users can share a Pygame demo without becoming experts in cross-platform dynamic linking.

Performance is another area where Python could improve. People naively think that when moving to Python 3 they will suddenly get 100x the performance, which is obviously not the case. He suggested making PyPy3 the default for Python 3 so that people "stop publishing benchmarks that make Python look terrible".

Better tools should be on the list as well, he said. The Python debugger (pdb) looks "pretty sad compared to VisualVM".

But it doesn't actually matter if the Python community adopts any of those ideas. There is no way to measure if any of them have any impact on adoption or use. To start with, python-dev needs to admit that there is a problem. In addition, the harassment of library maintainers needs to stop; it would be good if some high-profile developers stepped in once in a while to say that on Reddit and elsewhere. In terms of measurement, the project needs to decide on what "solved" looks like (in terms of metrics) then drive a feedback loop for that solution.

Another thing the project should do is to release at least ten times as often as it does now (which is 18-24 months between major releases and around six months between minor releases). The current release cadence comes from a "different geologic era". Some startups are releasing 1000x as often as the current Python pace.

The problem with too few reviewers may be alleviated by a faster release cycle, Lefkowitz said. Twisted went from one release every two years to one release per quarter, so he has direct experience with increasing the frequency of releases. What Twisted found was that people move more quickly from contributors to reviewers to core developers when those cycles are shorter.

It requires more automation and more process, in faster, lighter-weight forms. The "boring maintenance fixes" will come much faster under that model. That allows new contributors to see their code in a release that much more quickly. The "slower stuff" (new features and so on) can still come along at the same basic rate.

He offered up a few simple metrics that could be used to measure and compare how Python 3 is doing. He would like to see python-dev come to some consensus on which metrics make sense and how they should be measured. For example, the PyPI numbers might be a reasonable metric, though they may be skewed by automated CI systems constantly downloading particular versions.

Another metric might be to measure the average number of blockers as reported by caniusepython3.com. The number of projects ported per month might be another. The project could even consider user satisfaction surveys to see if people are happy with Python 3. He would like to see further discussion of this on the python-dev mailing list.

Coghlan noted that one other factor might be where users are getting their Python. Since the Linux distributions are not shipping Python 3 by default (yet, mostly), that may be holding Python 3 back some in the Linux world.

Several others wanted to discuss the packaging issue. Thomas Wouters noted that there is a place for python-dev to do something about packaging, but that any such effort probably ought to include someone who is teaching 6th graders so that their perspective can be heard. Brett Cannon pointed to the Education Summit that was scheduled for the next day as a possible place to find out what is needed. Lefkowitz said that was important to do, because many have ideas on how to create some kind of Python executable bundle, but it requires knowledge from core developers to determine which of those ideas are viable.

That is the essence of the problem, Van Rossum said. The people who know what needs to be done and the people who can do it are disjoint sets. That is as true for Language Summit attendees as it will be for Education Summit attendees. Beyond that, the Distutils special interest group (SIG) is "the tar pit of SIGs".

People are already doing similar things using tools like py2exe and others, Lefkowitz said. It would be good to get them together to agree that there is a distribution problem for Python programs. Each of the solutions has its own way of tracking imports, collecting them up, and copying them around, so it would be good to come up with something common.

Barry Warsaw described a Twitter tool and format called PEX that takes a "well-written setup.py" and turns that into a kind of executable. It contains the Python interpreter, shared libraries, and imported modules needed to run the program. It "seems like the right direction" for packaging and distributing Python programs.

Łukasz Langa said that Facebook has something similar. It is "hacky but it works". It collects all of the shared library files into a single file, collects the imported modules, zips all of that up, and prepends a Bash script onto the front so that it executes like any other program. Startup time is kind of long, however. Google also has a tool with the same intent, Wouters said.
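
The underlying trick is simple enough to sketch in a few lines of Python: zip readers locate the archive's index from the end of the file, so a zip archive containing a __main__.py still runs after arbitrary data is prepended to it. This minimal sketch (the file name is hypothetical) prepends just a shebang line rather than a full Bash script:

    import os
    import stat
    import zipfile

    with open("hello.pyz", "wb") as out:
        out.write(b"#!/usr/bin/env python\n")   # prepended interpreter line
        with zipfile.ZipFile(out, "w") as zf:
            zf.writestr("__main__.py", "print('hello from a single file')\n")

    # Mark it executable; it can now be run as ./hello.pyz
    os.chmod("hello.pyz", os.stat("hello.pyz").st_mode | stat.S_IEXEC)

The real tools must also bundle shared libraries and, in PEX's case, the interpreter itself, which is where the hard parts (and the long startup times Langa mentioned) come in.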

Lefkowitz concluded by saying that he thought python-dev should provide some leadership or at least point a finger in the right direction. Getting a widely adopted solution could drive the adoption of Python 3, he said. Van Rossum suggested that someone create an informational PEP to start working on the problem.


Many Python scripts are written to be executed with /usr/bin/python, which they expect to give them the proper version of the Python interpreter. In a session at the Python Language Summit—held April 8 in conjunction with PyCon—the future of the program at that path was discussed. At this point, /usr/bin/python is a symbolic link that points to Python 2 in most cases, but there are arguments for pointing it elsewhere—or removing it entirely. Several different developers offered their ideas on what should happen with that link in the future: move it, eliminate it, or something else entirely. Perhaps surprisingly, "something else entirely" won a straw poll at the end of the discussion.

Nick Coghlan was first up in the discussion. He noted that the symbolic link will be gone from Fedora soon. Prior to that, programs that wanted Python 2 would use either "/usr/bin/python" or "/usr/bin/python2", while those wanting Python 3 would always use "/usr/bin/python3". Enough progress has been made in the Fedora tools that most installation images will only have Python 3—and no symbolic link for /usr/bin/python will be installed. Installing the Python 2 package will create the symbolic link (and point it at Python 2) but that will not be the default.

Coghlan wondered whether the "upstream recommendation" about the symbolic link (in the form of PEP 394) should change. It currently recommends that "python" point to Python 2, but that will eventually need to change. By the time Python 3.6 is released in early 2017, the unqualified symbolic link will have been gone from Fedora for more than a year, Coghlan said. The question is whether Fedora (and others) will want to bring the symbolic link back in the Python 3.6 time frame, but point it at Python 3 rather than 2, which is his preferred solution. Over that year, anything that refers to the unqualified symbolic link will break in Fedora, which should force it to get fixed. More conservative platforms could then potentially switch the link directly from Python 2 to Python 3.

Up next was Matthias Klose, who does not think the symbolic link should be changed at all. He noted that distributions have been dealing with upgrades like Python 2 to 3 for a long time with a variety of mechanisms. For GCC, for example, Debian and Ubuntu manually handled a symbolic link for GCC 4.9. The "alternatives" mechanism is another possibility, but that makes more sense for things like choosing an editor than it does for choosing a version of Python. "Diversions", where the program gets renamed and replaced by a newer version, can also be used, but that is not done often, he said.

There is a parallel to switching the Python symbolic link in the switch of the /bin/sh link from Bash to dash. That change was made in Ubuntu in 2006 and in Debian in 2009 but there are still complaints about shell scripts that won't run on Ubuntu (because they contain Bash-isms). He showed that there are still unresolved bugs in the Debian bug tracker from the switch. That change was made more than eight years ago and problems are still trickling in, he said.

The /bin/sh program has a "concrete meaning" as a POSIX shell, but the Python symbolic link lacks that. Some distributions have already switched the link to Python 3, which has caused "breakage in the wild", Klose said. It will take years to track down all of the breakage and fix it, so it is just easier not to change the symbolic link at all. Programs that care should simply specify Python 2 or 3.

Barry Warsaw said that he was aligned with Klose. PEP 394 should clearly state that programs should be explicit and choose either python2 or python3. Coghlan asked, what about programs that don't care what version they get? Warsaw said that programs shipped by distributions should care, thus should be explicit.

There is a different problem, though: what users get when they sit down at the terminal and type "python". The Bash "command not found" functionality could be used to suggest python3 to users, Warsaw said. For distribution-supplied programs, the "#!/usr/bin/python" or "#!/usr/bin/env python" lines (also known as "shebang" lines) should be changed to be explicit about which version the program needs. If they don't care, they should "use Python 3".
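
In shebang form, being explicit is a one-line change at the top of a script, for example:

    #!/usr/bin/python3
    # or, to find the interpreter via the PATH:
    #!/usr/bin/env python3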

Monty Taylor is "somewhere in between" Coghlan and Klose/Warsaw. He would like to see the symbolic link continue to exist until Python 2 fades away. That would come in something like five years, he said, not six months. He would like not to have to care about Python 2.6, which is now past its end of life but, because Red Hat is still supporting it for certain RHEL releases, he still needs to support it for his packages. Those kinds of situations are likely to persist. Someday, there will be only one Python, but that is not true yet.

Thomas Wouters asked about PyPy. Does the symbolic link always mean CPython? No one seemed interested in switching the link in that direction, however.

The version a user gets when they type python is an important question that should be decoupled from the question about the shebang line, Glyph Lefkowitz said. Having an unqualified python on the shebang line should give a warning now, but for the command-line case, something different could be done. He suggested creating some sort of tool that gives users a menu of choices when they type "python". Warsaw suggested that some kind of configuration parameter could be used to govern whether users got the menu or a particular version of Python. That is what Apple does for programs of this sort, Lefkowitz said.

The various tutorials and other documentation typically just specify "python" for the command line (or the shebang line), so distributions will need to provide something there, one attendee noted. Users are likely to just want whatever the default is, which is Python 2 for now, but that will change.

Larry Hastings conducted a straw poll to see which of the four options was most popular. It was an informal poll that explicitly allowed people to vote more than once, but the outcome was interesting. Seven developers thought that python should point to Python 3 in the 3.6 time frame; 11 thought the symbolic link should not be changed; 19 thought it should be switched at the point where there is only one Python; and 27 agreed with Lefkowitz's idea of a new program that would get run when users type python.


One of the headline features targeted at Python 3.5 (which is due in September) is type hinting (or type hints). Guido van Rossum gave an introduction to the feature at a Python Language Summit session. Type hints are aimed at static analysis, so several developers of static analysis tools were involved in the discussion as well.

The current proposal for type hints for Python is laid out in PEP 484. It uses "function annotations as I originally envisioned them", Van Rossum said. Those annotations were proposed back in 2000 with the idea that they would provide information to the interpreter to "generate super good code". But the annotations don't really help with code generation, so they are not meant to help the interpreter at this point.

Instead, type hints are designed to be ignored by the interpreter and to "not slow it down in most cases". The feature is targeted at being used by a "lint on steroids", he said.

He put up a slide with example code that he said showed some of the problems with the annotation syntax, but gave a reasonable flavor of how it would work. Here is an excerpt:

    from typing import List, Tuple, Callable

    def zip(xx: List[int], yy: List[int]) -> List[Tuple[int, int]]:
        ...

    def zipmap(f: Callable[[int, int], int],
               xx: List[int], yy: List[int]) -> List[Tuple[int, int, int]]:
        ...

In the example, zip() takes two arguments that are lists of integers and returns a list of two-integer tuples. zipmap() takes a function that takes two integer arguments and returns an integer along with two lists of integers; it returns a list of three-integer tuples. There is also support for generic types, so that the annotations can go "beyond concrete types", he said.
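
Generics rely on type variables. This small first() function is illustrative (it was not part of the talk) but uses only PEP 484 constructs:

    from typing import List, TypeVar

    T = TypeVar('T')

    def first(items: List[T]) -> T:
        return items[0]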

Stub files are "boring stuff that is nevertheless important", Van Rossum said. They provide a mechanism to annotate functions in C extension modules without becoming a burden on Argument Clinic, which is a domain-specific language for specifying arguments to Python built-ins. Stubs are also useful for things that you can't or don't want to annotate. The stubs are stored in .pyi files corresponding to the Python extension (e.g. base64.pyi) using the same function annotation syntax. There is one addition, though: an @overload decorator for overloaded functions.
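
A stub might look like the following hypothetical excerpt; the real base64.pyi will differ, and the split into two overloads here is purely illustrative:

    # base64.pyi (hypothetical excerpt)
    from typing import overload

    @overload
    def b64encode(s: bytes) -> bytes: ...
    @overload
    def b64encode(s: bytes, altchars: bytes) -> bytes: ...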

For 3.5, he is hoping to get typing.py added to the standard library. That is the entirety of the changes needed for this proposal, as there are no changes to CPython or to the Python syntax. The addition is "pure Python", but there are "a lot of metaclasses" and other scary stuff in typing.py. There are no plans for annotations for the standard library in 3.5, though he does anticipate some third-party stubs for standard library modules. The mypy tool that served as inspiration for the PEP already has some stubs for the standard library.

Putting typing.py into the standard library sends a signal that this is what the core Python developers want in terms of type hints. It encourages everyone who thinks that type hints are a good thing to use the same syntax. For example, the PyCharm IDE has its own notion of stubs and Google has a bunch of tools that it has released as open source (or will); both of those could benefit from a single standard type hint syntax.

There are no plans to force this feature on anyone that doesn't want to use it. He would like to get it into 3.5 before the feature freeze that accompanies the first beta release (due in May). That target will help "focus the PEP design". The typing.py module would be added as a provisional package, which means that it can still evolve as needed during the rest of the release cycle.

Some have wondered why there isn't a new syntax being designed for type hints. One reason is that typing.py will still work with earlier versions of Python 3, Van Rossum said. Those who are interested can just install it from the Python Package Index (PyPI). New syntax is also a "tar pit of bikeshedding". For 3.6, core developers "might muster up the courage" to add syntax for variable types (rather than use comments as is proposed with PEP 484).
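
The comment-based form for variable types that PEP 484 proposes in the meantime looks like this:

    from typing import List

    x = []   # type: List[int]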

There is a problem with forward references right now that the PEP solves by using string literals rather than types:

    class C:
        def foo(self) -> 'List[C]':
            ...

Łukasz Langa is working on a way to get around that problem using a __future__ import:

    from __future__ import annotations

That would turn all annotations into string values as they are parsed, which would neatly avoid the problem that the CPython parser can't handle the forward references, while the static analyzers can.

At that point, Mark Shannon from Semmle, which is a company that makes static analyzers for Python and other languages, stepped up to talk about the proposal. He had a number of questions and concerns about the PEP, though syntax was not among them. Shannon said that he didn't care what the syntax was, his worries were about the semantics of the annotations.

Shannon is concerned about the lack of a distinction between classes and types. Also, the scope of type declarations is not well-defined. There is not much support for duck typing, either. Van Rossum admitted that duck typing is not supported, mostly because it doesn't fit well with static type analysis. The intended scope of type declarations is clear in Van Rossum's mind, but it may not be in the PEP, he said.

Shannon said that it was important to stop thinking about programs as code to run. Instead, for static analysis purposes, they should be looked at as a bit of text to analyze. He also suggested that any tools have two modes: "linting" mode to report when the type being used is not the same as what is required and "strict" mode that reports when the tool is unable to prove that the proper type is being used.

Van Rossum invited Shannon to co-author the PEP with him if he was willing to commit to the 3.5 time frame. Shannon said he was willing to work on it under that constraint.

The hope is that there will be new syntax for variable types in the 3.6 time frame, Van Rossum said. Jython developer Jim Baker was in favor of that. It would allow access to the variable annotations from the standard abstract syntax tree (ast) module.

Larry Hastings wondered why the PEP was trying to avoid using Argument Clinic. It is, he said, the perfect place to put this kind of information. Van Rossum said that there must have been some kind of misunderstanding at one point, so he apologized and agreed that Argument Clinic should be used.

The basic idea behind PEP 484 is to create a common notation, Van Rossum said. He was mostly hoping that the assembled developers would not be too unhappy with that notation, which seemed to be true. Thomas Wouters noted that Van Rossum had not mentioned exceptions, which Van Rossum acknowledged. He has heard about some bad experiences with Java exception checking, so he avoided dealing with that for now. Langa, who is another co-author of the PEP, agreed that exceptions are "out of scope for now".

A PyCharm developer spoke up to note that the project has been doing type inference on Python programs for four years or so. The type system in the PEP is similar to what PyCharm uses, so "we feel it fits the development needs well". PyCharm can infer types for 50-60% of variables in user code, but can't get further than that without getting type annotations for function parameters.

Steve Dower said that the PEP should work well with Visual Studio, though there were still some issues to think about. It currently works by inferring types from the docstrings but could take advantage of the annotations. Other projects and companies also seemed happy with the changes.

Langa noted that at Facebook, at least, having optional typing available (as the Hack language does) eventually led to a cultural shift at the company. At some point, not having the annotations became a red flag during code review, so the company's code is moving toward type annotations everywhere.
