In March, we reported on the contentious argument surrounding the Encrypted Media Extensions (EME) framework developed by a working group at the World Wide Web Consortium (W3C). At the time, there were several active protest efforts underway to dissuade the W3C from renewing the charter for the working group in question, since that renewal was slated to come up for a vote soon. Since then, although public activism has quieted down significantly, there have been several important developments.

To recap, the EME framework defines a set of APIs for Content Decryption Modules (CDMs) that implement some form of authentication scheme used to enable or disable playback of <audio> or <video> elements. While there is a simple, plain-text CDM defined in the specification (and even though open-source CDMs have been developed), the ultimate goal of EME is to allow media-delivery companies like Netflix or Hulu to deploy proprietary, binary-only CDMs that implement a DRM scheme.

The W3C's oft-stated position is that defining the EME framework in the context of other W3C web standards is better than staying out of the discussion, which would merely encourage media companies to implement their DRM playback modules through other means—such as in Flash or Silverlight. The limited API of a CDM, according to this line of reasoning, is less harmful than the security-hole–riddled Flash virtual machine and, furthermore, by engaging media companies directly the W3C can have a positive influence.

A number of outside groups take strong exception to W3C's position and object fundamentally to the inclusion of any DRM-supporting technology in a web standard. First and foremost among the critics is the Electronic Frontier Foundation (EFF), which led a campaign in January to persuade the W3C to adopt a "nonaggression covenant" on Digital Millennium Copyright Act (DMCA) litigation before it renewed the charter of the HTML Media Extensions Working Group, the group developing EME.

Charters

Like all W3C working groups, the Media Extensions group operates under a time-limited charter. In March, a W3C members' meeting was held, during which the group charter was scheduled to come up for renewal. The EFF pushed to have that renewal tied to adoption of the DMCA nonaggression covenant. But, despite several public declarations in support of the EFF's proposal (and some well-publicized protests by other DRM opponents), the Media Extensions group's charter was extended through September 2016.

The W3C blog records that the covenant was discussed at the meeting. The exact events that took place leading up to the renewal, however, are not public. In a June 6 blog post, EFF's Cory Doctorow wrote:

Enough W3C members endorsed the proposed change that the charter could not be renewed. After 90 days' worth of discussion, the working group had made significant progress, but had not reached consensus. The W3C executive ended this process and renewed the working group's charter until September.

Similar wording is found in an April EFF blog post, attributing the renewal to "the executive of the W3C." In both instances, the phrasing may suggest that there was considerable internal debate in the lead-up to the meeting and that the final call was made by W3C leadership. But, it seems, the ultimate decision-making mechanism (such as who at W3C made the final decision and on what date) is confidential; when reached for comment, Doctorow said he could not disclose the process.

The W3C announcement did not address the details either, nor did the renewal notice sent to the working group's public mailing list. That notice pointed readers to another, W3C-member-only mailing list. Nevertheless, the blog post announcing the charter extension did acknowledge that there are security implications to EME. In particular, the concern for security researchers is that disclosing vulnerabilities in EME or in a particular CDM would result in prosecution under the DMCA. Although that argument found traction at the W3C, the blog post directed interested parties to a new, still-in-development working group meant to host discussions over W3C policy, rather than encouraging such debate within the Media Extensions group.

Round two

For its part, the EFF has moved on to target the working group's new charter-renewal date in September. As Doctorow explained in an email to the Media Extensions mailing list, the new plan is to make the nonaggression covenant an "exit condition" of the working group—meaning that "EME would not become a W3C recommendation until the group agreed to some covenant." The new proposal does not mandate that the covenant adopted be the one written by the EFF, just that it address the same issues.

In addition to protecting security researchers from DMCA prosecution for disclosing flaws in EME, Doctorow said, the nonaggression covenant would protect interoperability between implementations.

The reason for including interop in the covenant is that, unlike other W3C recommendations, EME will limit interop to browsers that have explicit deals with publishers -- that is, the publishers will have to bless an element of the user-agent (the CDM) for the user-agent to render its content. This is a new step for the W3C, and for the browser ecology. It's true that the Web allows servers to distinguish among clients based on things like auth tokens (my bank's web-server won't let your browser see my account details), and of course browsers have always been able to choose not to implement the decoding or display of certain content. But this creates a new situation where only certain implementations of otherwise identical user-agents are acceptable; that choice is determined by publishers and enforced by law.

In other words, there are two interoperability concerns. First, a fully EME-compliant browser may not be able to play any EME-protected content, because playback is actually dependent on the publisher's choice of CDM and, importantly, the publisher can choose to block any browser for any reason.

If a particular browser's EME implementation is discovered to perform an operation not authorized by the publisher (such as allowing users to pause playback, store content for later viewing, or even provide accessibility features), the publisher can block the browser in question via the CDM. Second, and more serious: because the CDM is an "anti-circumvention" tool, the publisher can also sue the browser vendor under the DMCA.

Doctorow subsequently proposed adding a discussion of the DMCA nonaggression covenant to the working group's timeline. However, the working group's chairman, Microsoft's Paul Cotton, refused the request outright:

Discussing such a proposed covenant is NOT in the scope of the current HTML Media Extensions WG charter. If the W3C Director has wanted the HME WG to take on this task when the WG was re-chartered on March 31 I am sure he would have added this dependency to our charter.

Perhaps that reaction is to be expected; a quick survey of the working group's mailing list archive reveals that few if any questions of policy elicit any sort of response from list subscribers, particularly in recent months. The group members, it seems, prefer to stick to issues of clarifying the wording of the specification and to ignore everything else.

To be fair, the group's purpose is to write the specification, and there appears to be pressure from other working groups at the W3C to wrap the process up, if for no other reason than that finishing the EME specification would finally complete the work of the original HTML Working Group, of which the current Media Extensions group is the last remaining vestige. Still, one could be forgiven for finding the general unresponsiveness of list subscribers to be a source of serious frustration. If, as it seems, the group members themselves cannot be persuaded to entertain calls to adopt a nonaggression covenant, the EFF has another tactic available: appealing to the W3C Advisory Committee instead.

The Advisory Committee is part of the W3C's permanent governance structure; it consists of one representative from each W3C member organization. Doctorow is the EFF representative, and he posted an open letter to other representatives, asking them to commit to supporting the EFF's move to make the DMCA nonaggression covenant an exit condition for the Media Extensions working group. It is unlikely, the letter states, that the group will have completed the EME specification by September, so another charter-renewal decision will have to be made.

It is difficult to handicap the EFF's odds of success. The previous attempt to force adoption of a nonaggression pact gained a fair amount of support, but may have simply been overruled by W3C leadership. Keeping the pressure on will in all likelihood allow the EFF to pick up additional allies, but whether or not it can force the W3C's executive leadership team to reconsider may be a different question entirely.

Comments (3 posted)

At the 2016 Python Language Summit, Petr Viktorin, who is the team lead for the Python maintenance group at Red Hat, described the progress that Fedora has made in switching to Python 3 by default. He also presented some work that has been done to split up the standard library to try to reduce Python's footprint for cloud deployments.

Viktorin pointed to a site that is tracking Fedora's Python 3 porting efforts. In particular, he showed the history graph that displays the progress since October 2015. Some 1300 packages are now either able to run on both Python 2 and 3 or just on 3, though there are still 1700 or so to go.

There is a group of nearly 200 packages that are "mispackaged" (shown as blue in the graph). Those are packages that have been ported to Python 3 upstream, but are not packaged for Fedora. That is "the biggest problem we have right now", he said. The team is trying to address that using an automated system to grab the package from PyPI, turn it into an RPM, and then test it. Every other package that depends on it would then also be rebuilt and tested using the updated package.
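That kind of pipeline might be sketched as follows. Note that while pyp2rpm and mock are real Fedora tools, the exact command sequence, the run-tests helper, and the overall structure here are illustrative guesses, not the team's actual tooling:

```python
import subprocess

def build_commands(package, dependents):
    """Hypothetical command sequence for one mispackaged PyPI package:
    fetch it, convert it to an RPM, test it, then rebuild and test every
    Fedora package that depends on it. 'run-tests' is a stand-in name."""
    cmds = [
        ["pip", "download", "--no-deps", package],    # grab it from PyPI
        ["pyp2rpm", package],                         # generate an RPM spec
        ["mock", "--rebuild", f"{package}.src.rpm"],  # build the RPM
        ["run-tests", package],                       # test the new package
    ]
    for dep in dependents:                            # rebuild reverse deps
        cmds.append(["mock", "--rebuild", f"{dep}.src.rpm"])
        cmds.append(["run-tests", dep])
    return cmds

def run_pipeline(package, dependents, dry_run=True):
    """Execute (or, by default, just return) the command sequence."""
    cmds = build_commands(package, dependents)
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds
```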

The goal is to reach 50% of packages for Fedora 25, which is planned for release in November. There are various pieces that are still on Python 2, including some desktop toolkits. GTK+ 2 support will not be ported to Python 3, which affects GIMP, Inkscape, and Sugar, all of which have Python-based plugins or components. For the enterprise, Samba is one of the biggest projects that remains based on Python 2. Viktorin said he would "probably end up porting Samba", which was met with applause.

Both the Mercurial and Bazaar revision-control systems remain on Python 2, as does a lot of Fedora infrastructure. Kushal Das spoke up to point out that the fedmsg message bus is Python 2, so "if you want on the bus, you have to be running Python 2". OpenStack is making progress toward Python 3 due to the efforts of Victor Stinner, while Twisted has likewise been making progress, thanks to the work done by Amber Brown.

The Red Hat team is small, Viktorin said, so the focus is on helping others to do the porting to Python 3. To that end, it has been working on some porting guides. There is the py3c library that will help porting C extensions to Python 3 if the C foreign function interface (CFFI) cannot be used, though he suggested that C extensions should be using CFFI. Py3c includes a porting guide. There is also an RPM porting guide available.

He then shifted gears to discuss the Python standard library. There is "lots of stuff" that isn't being used in the library and the Fedora cloud developers are saying that Python is too big to include in the cloud images. So Fedora is splitting up the library in support of its system-python effort.

The idea is that for essential system tools and cloud deployments there will be a Python available with a subset of the full standard library. There are some precedents, including Debian's python-minimal and Fedora's unbundling of some standard library packages, such as Tkinter, IDLE, and the test package (which is the largest package in the standard library by far).

Someone in the audience wondered about simply compressing the modules using ZIP compression. Viktorin said that was considered, as was simply shipping the bytecode cache files (i.e. .pyc files), but doing so doesn't save much space.
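The ZIP approach relies on Python's longstanding ability to import from archives; a minimal demonstration of the mechanism, using a throwaway module name:

```python
import os
import sys
import tempfile
import zipfile

# Build a ZIP archive containing one trivial module, then import from it.
# Python's zipimport machinery transparently handles any archive on sys.path.
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "stdlib_subset.zip")
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("greeting.py", "def hello():\n    return 'hello from a zip'\n")

sys.path.insert(0, archive)   # the archive now acts like a directory of modules
import greeting

print(greeting.hello())       # -> hello from a zip
```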

The selection of standard library modules in system-python will likely change over time, Viktorin said. There may be value in finding a way for packages to declare what part of the standard library they need to use. It could be useful for MicroPython, he said; an attendee noted that it might be useful for Windows as well.

Comments (3 posted)

Nathaniel Smith envisions a future where just-in-time (JIT) compiler techniques will be commonly used in Python, especially for scientific computing. He presented his ideas on where things are headed at the 2016 Python Language Summit. He currently works at the University of California, Berkeley on NumPy and other scientific Python projects. Part of what he has been doing is "working on the big picture of what JITs will mean for scientific computing".

The adoption of Python 3 in scientific computing has been quite slow since 3.0 was released in 2008. But it appears to be reaching an inflection point in 2016; everyone is teaching Python 3, he said, and one-third of all NumPy downloads are for Python 3. He is hearing "when do we drop support for Python 2?" from projects these days.

That is typical of various phenomena, where there is a long period with lots of work going on, but seemingly no change. At some point, the curve of adoption (for example) goes nearly vertical; it goes from 20% to 80% quickly. "Transitions are sneaky like that", Smith said, and you see the same curve in unrelated areas, like epidemiology, where virus propagation has a similar trend. The percentage of web browser users with JavaScript JITs over time exhibits a similar curve as well.

Smith has a hypothesis that in the next two to four years, there will be a significant transition to JIT-based Python—not IronPython, Jython, or some other alternative implementation, but for CPython itself. PyPy is ten years old at this point, but to a first approximation, no one is using it. PyPy has always been resource-limited, but there is growing interest in the industry for CPython JIT technology. There has been a "huge uptick" in companies investing in Python JITs and, at the same time, JIT technology is becoming a commodity: LLVM or libraries from Microsoft and IBM can be used to ease the building of JITs.

One of the major blockers is being resolved right now, he said. He called it "the PyPy problem": small programs don't need PyPy, but large programs can't use it because they need to access C-based extensions (e.g. NumPy). PyPy has learned "by bitter experience" that support for the C extension libraries is required.

But "whole-language JITs", which support the C extension API as well as pure Python, have begun to arrive. That means that Pyjion can pass all of the NumPy tests and Pyston is nearly there (99.27% in early May), he said. PyPy is "holding [its] nose" and implementing the technique, which has allowed it to go from 92.4% of NumPy tests passing (using the standard NumPy) in April to 96.2% in May. Others are learning from PyPy, so the numbers are changing rapidly; by the end of May, Pyston was passing all of the tests, he said.

So a few months from now, we will go from zero "drop-in compatible JITs" for Python to three. They may not be ready for deployment in production quite yet, but they are getting there.

That transition will have consequences and it is worth thinking about what is needed to get ready for them. It will lead to changes in the Python ecosystem. He is organizing a Python compilers workshop in conjunction with SciPy, which will be held in Austin, Texas in mid-July. Some of the consequences will be discussed there.

The first consequence that Smith described is that, for libraries like NumPy, there is a "catch-22". If it needs to be fast for CPython, it has to be written in C, but if it needs to be fast for a JIT, you cannot use C. He showed a simple mysum() function that totaled up the elements in an iterable. If it is passed a Python object like list(range(N)), the JIT knows what it is and can do lots of optimizations. But if it is passed a NumPy array, which is "opaque C stuff", the JIT doesn't understand it, so it will have trouble even achieving the performance of a non-NumPy version on a JIT-less CPython.
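A function along the lines of the one Smith showed (the exact code from the talk is not reproduced here) might look like:

```python
def mysum(iterable):
    """Total up the elements of an iterable, one Python-level add at a time."""
    total = 0
    for x in iterable:
        total += x
    return total

# With a plain list, a JIT sees ordinary int objects flowing through the
# loop and can specialize it aggressively:
print(mysum(list(range(100))))   # -> 4950

# With a NumPy array (numpy.arange(100), say), the same loop pulls each
# element out through NumPy's C internals, which the JIT cannot see into,
# so it struggles even to match plain CPython running a non-NumPy version.
```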

One way to handle that would be for the JITs to gain knowledge of the NumPy internals. "As a NumPy developer, you can imagine how I feel about this", he said. But there are lots of projects that already have that knowledge (e.g. Numba, PyPy, Cython, and more; he predicts Pyston will get it "any day now"). Those projects don't like reaching into NumPy either, but have to for performance reasons.

His dream is to have one codebase that can work in any of these environments. It could be based on Cython, since "we know it works". The code could be converted to C for CPython or used directly by PyPy, Numba, and others.

"JIT engines are viciously complicated beasts", Smith said. Another consequence of the shift to JIT-based Python will be that the development and maintenance of JIT implementations will require focused and sustained effort that only companies can provide—at least currently. There are two paths forward. One is that CPython will still be driven by a diverse set of volunteers and the JITs will be driven mostly or completely by dedicated corporate teams.

There are some reasons why that path might not be the right one, though. "Companies are great", Smith said, but only represent one slice of the community. For example, he works on Python packaging for scientific-computing packages because it is hard for companies to justify doing that kind of work upstream when they have a business model based on the pain of that packaging.

There is an alternative, emerging model that would add paid contributors that work for the community. A small number of them could make a big difference as they could keep the big picture in mind and cover gaps that the companies are not filling. He pointed to the $6 million in funding for the Jupyter project as an example. Jupyter (formerly IPython) is "an overgrown REPL", but it was able to attract that kind of funding; a Python-JIT project could too.

"If that's what we want, we need to start planning now", Smith said. The Python Software Foundation (PSF) is not set up to handle that kind of mission. "Building that kind of institutional capacity takes time", he said, so work on that should start soon.

Comments (3 posted)

Dino Viehland and Brett Cannon of the Microsoft Azure data science tools group presented on Pyjion, a just-in-time (JIT) compiler for Python, at the 2016 Python Language Summit. They also discussed a JIT API they would like to be added to Python so that various JIT engines can be dropped into Python for testing and evaluation.

Cannon started by noting that Pyjion is the only Python JIT that targets Python 3 only, with Viehland adding that it is based on 3.5.1. There are three goals to the Pyjion project, which Cannon and Viehland have been working on in their spare time at Microsoft. The first is to create an API for CPython that will allow plugging in a JIT for the language.

Cannon said that a proposal for a frame evaluation API, which would allow a JIT engine to be called when a frame object is being executed, was posted to python-ideas in mid-May. A PEP was posted to the python-dev mailing list shortly after the summit. The idea behind it is "pretty simple", Viehland said.

A second goal was to produce a proof-of-concept JIT to help drive the API design, Cannon said. They used the CoreCLR JIT to start with, but the back-end might change at some point, Viehland said.

The third goal was to create a C++ framework for CPython JITs to build from. CoreCLR is written in C++; the framework may be useful to those trying to use other C++-based JITs. For example, the framework does some translation of bytecode to machine code and some type inference, Viehland said.

The reason they are trying this is fairly obvious—"faster is nicer"—Cannon said. Pyjion does its JIT at the code-object level. The Python bytecode is translated to the equivalent Microsoft intermediate language (MSIL) code. An abstract interpreter is used to gather details on the code, such as inferring types and recognizing when float and integer values can be handled without boxing.

A key piece of the puzzle is that Pyjion uses the CPython API to maintain compatibility, Cannon said. That means that extension libraries written in C (e.g. NumPy) will still run with Pyjion. In fact, the entire Python test suite is passing, with the exception of some variable-tracing tests.

All of that was done with just two changes to the CPython API. A function pointer was added to the interpreter state to call out whenever a frame gets evaluated (InterpreterState->eval_frame). In addition, some "scratch space" was added to Python code objects (PyCodeObject->co_extra). The scratch space is simply a Python object, so the memory management of it is straightforward (it goes away when the code object does).

The eval_frame hook is useful for things like debugging and tracing as well, Cannon said. It allows an injection point that did not exist before, which surprisingly opens a lot of doors. It is also simple to use.

The scratch space is used by Pyjion to track various attributes of the code: how many times it has been executed or whether JIT compilation has already been tried and failed, for example. It also contains a pointer to the JIT-compiled code and some other housekeeping information.

There were some "bumps in the road", especially with regard to the stack. Python has two stacks (one for execution and one for exception handling), while CoreCLR only has one. A bigger problem is that there are some bytecodes in CPython that leave items on the stack when the frame exits, which is forbidden in CoreCLR.

Nick Coghlan noted that 3.6 is the right time to fix any of these bytecode issues since the bytecode format will be changing. "People will hate us already, so do it now". Alex Gaynor said that PyPy has already fixed the problems with extra data left on the stack; CPython could just copy those changes.

Cannon then presented some preliminary benchmarks compared to stock Python 3.5.1. Several showed some improvement using Pyjion, though the string-heavy tests were much worse; there has been no effort to optimize those yet, he said. Out of the 41 benchmarks they ran, 14 are slower with Pyjion, 12 are the same, and 15 are faster. As Viehland noted, that was all accomplished as a part-time effort by the two of them.

There are plenty of opportunities for further optimization, Cannon said. The main thing they were striving for was compatibility with the C extensions, which they were able to achieve. Viehland added that inlining is not yet supported, which would provide a huge boost; there are also more types that could be optimized. Cannon said that they "want a JIT space race in Python", which is possible with only small additions to the API.

One attendee asked about supported platforms. Cannon said he had not built it for Linux, simply because he hadn't ported the build scripts yet. Viehland said that he expects that it will run well on Linux since "the whole point is to get ASP.NET on Linux", but they haven't done it yet.

Larry Hastings pointed out that code objects have always been immutable in Python; the scratch space changes that, though he is not sure that it matters. Cannon said that the scratch space is simply that, so it could be thrown away without causing any problems. If a code object were serialized, the scratch space could simply be ignored and the code object would still run.

The API does not provide support for tracing JITs such as PyPy, one attendee said. Cannon agreed and said that the API would have to change to support all possible JITs. Viehland said that it may make sense to add more to the API over time. The current proposal "just gets our foot in the door".

Beyond that, Pyjion is x86_64-only at this point; there is no support for ARM processors. Viehland said there has been thought about moving to the ChakraCore JIT, which does support ARM.

Comments (none posted)

Python users often complain that the language is slow. Kevin Modzelewski presented some of his findings on Python's slowness at the 2016 Python Language Summit. He works at Dropbox on the Pyston just-in-time (JIT) compiled version of Python; that project has learned some interesting things along the way about what causes Python to be slow.

The question really is why Python is slower than JavaScript, Modzelewski said; it's "obvious why it is slower than C". The common wisdom is that interpreters are slow, but the interpreter overhead for Python isn't that large. Users can get rid of that overhead using different techniques, but still complain that Python is slow.

For example, if you take a Python program and compile it with Cython without any type annotations, you will see around a 10-20% performance increase. You can also try micro-benchmarks to test the interpreter overhead, but those will also suggest that the overhead is not that high.
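Such a micro-benchmark is straightforward to run with the standard timeit module; a minimal sketch (the specific loop and repetition counts here are arbitrary choices, not Modzelewski's):

```python
import timeit

# Time a tight loop that is almost pure bytecode dispatch: no attribute
# lookups or function calls, just LOAD/STORE/JUMP opcodes in the interpreter.
loop = """
total = 0
for i in range(1000):
    total += i
"""
seconds = timeit.timeit(loop, number=1000)
print(f"{seconds:.4f}s for 1000 runs of a 1000-iteration loop")
```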

He asked: "What is it then?" Python has lots of features and some of them are quite slow. The C runtime is "very dynamic", which makes it slow. There are various "speed bumps" that are useful for users, but cost a lot for a JIT compiler. For example, tracebacks that include frame information for every exception are expensive. It is an attractive part of Python, but it has to be turned on all of the time. It might be a decent tradeoff to allow it to be disabled some of the time. "Everyone here can think of more" of these speed bumps, he said.
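The traceback cost comes from the fact that every raised exception carries a traceback object with live frame objects attached, which can be seen directly from Python:

```python
import sys
import traceback

def inner():
    raise ValueError("boom")

try:
    inner()
except ValueError:
    # Every raised exception carries a traceback object chaining frame
    # information for the whole call stack; building and keeping this
    # around is part of the per-raise cost described in the talk.
    frames = traceback.extract_tb(sys.exc_info()[2])

print(frames[-1].name)   # -> inner
```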

Another example he gave demonstrates the slowness of the C runtime:

    import itertools
    sum(itertools.repeat(1.0, 100000000))

That will calculate the sum of 100 million 1.0s, but it is six times slower than the equivalent JavaScript loop. Float addition is fast, as is sum() itself, yet the result is not.

Larry Hastings asked what it was that was slowing everything down. Modzelewski replied that it is the boxing of the numbers, which requires allocations for creating objects. Though an audience member did point out with a chuckle that you can increase the number of iterations and Python will still give the right answer, while JavaScript will not.
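The quip about correctness refers to numeric precision: JavaScript numbers are IEEE-754 doubles, while Python's arbitrary-precision ints stay exact at the price of boxing every value. This is easy to check:

```python
# A 64-bit double represents integers exactly only up to 2**53; past that
# point, adding 1 to a running total can be silently lost, which is what
# happens to the equivalent JavaScript loop at a high enough iteration count:
n = 2 ** 53
assert float(n) + 1.0 == float(n)   # the +1 vanishes in double precision

# Python's int is arbitrary-precision, so the same arithmetic stays exact,
# at the cost of allocating a boxed object for every value:
assert n + 1 != n
print(n + 1)   # -> 9007199254740993
```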

Modzelewski said that the Python core developers have spent a lot of time on increasing the speed of the interpreter and on optimizing bytecode dispatch in particular. Either those problems have been fixed or those areas weren't really the problem, because enough time has been spent to have fixed problems like that by now. He suggested that efforts to improve Python's performance be shifted to other areas.

Raymond Hettinger wondered about the tradeoffs involved. There is a lot of checking for things like signals and recursion depth in the C runtime that have a fairly high cost. Would it make sense to allow those to be relaxed or eliminated under some circumstances? Modzelewski agreed that it might make sense to try that, but since Pyston is an alternative Python, it can't make those kinds of changes unilaterally. Hettinger was also concerned that there are proposals for features that could make the C runtime costs higher and wanted to ensure that the impact of those was discussed before they become part of the language.

Comments (none posted)

Kushal Das is concerned about the number of patches in the CPython review queue. He has done some work to automate the testing of those patches, which he presented at the 2016 Python Language Summit.

There are too many patches and too few contributors to review them, he said. His first patch took a year and a half to get reviewed and merged—then it broke the next day. So he started thinking about how the patches could be tested automatically, which would help reviewers by giving them some assurance that the patches will build and run.

He started looking for systems to run the tests on. The Python Software Foundation infrastructure team was not particularly thrilled with the idea of running random code from the internet. Beyond that, building from source and running all the tests took longer than fifteen minutes, so the Travis CI system he was testing on would time out and kill the process.

He started asking around and soon heard about ci.centos.org, which is a Jenkins-based system with fairly beefy bare-metal servers. It takes less than five minutes to build and test on those systems. The CentOS project is "really happy" to have people using its continuous integration servers, Das said.

He has written an IRC bot that listens for announcements of bug-fix patches that are posted (which come from a different bot). It then builds and tests the main branch with the patch applied and can post a comment in the bug report. When the GitHub migration is complete, he will update his code to use that bug tracker.

Barry Warsaw noted that patches often come in without tests or without correct tests. He suggested adding something like diff-cover into the mix to determine whether the tests are covering the changed lines, which Das said could be done. It would be nice to be able to test Windows and OS X as well, an attendee said, but even without that, an automated system like this is an improvement over what is done now.
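Warsaw's suggestion amounts to intersecting the lines a patch changes with the lines the test run actually covered. A toy version of that check (a deliberately simplified diff parser, with coverage data supplied as a plain set rather than read from a real coverage tool) might look like:

```python
import re

def changed_lines(diff_text):
    """Collect (file, line-number) pairs added by a unified diff.
    Simplified toy parser: assumes well-formed hunks and new-file
    paths prefixed with 'b/', as git produces."""
    changed = set()
    current_file = None
    lineno = 0
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            current_file = line[4:]
            if current_file.startswith("b/"):
                current_file = current_file[2:]
        elif line.startswith("@@"):
            # Hunk header like "@@ -1,3 +1,4 @@": take the new-file start line.
            lineno = int(re.search(r"\+(\d+)", line).group(1))
        elif line.startswith("+"):
            changed.add((current_file, lineno))
            lineno += 1
        elif not line.startswith("-"):
            lineno += 1   # context line present in the new file
    return changed

def uncovered_changes(diff_text, covered):
    """Changed lines that the test run's coverage data did not touch."""
    return changed_lines(diff_text) - covered
```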

Comments (none posted)

As the final presentation of the 2016 Python Language Summit—though it was followed by a few lightning talks that we are not covering—Christian Heimes led a discussion on the Python security response team. There have been some problems along the way that generally boil down to a need for more people working on the team.

Some of the problems that have occurred include bug reports sent to the list that could not be reproduced, and distributions not updating their Python packages because it was not clear to them that an upstream release contained a security fix. Heimes suggested that security fixes be clearly marked in the "News" file that accompanies releases, though, as one attendee pointed out, that still leaves the problem of unrecognized security bugs.

There have been some bug reporters that keep mailing the team about unfixed bugs. The problem is that he gets busy sometimes, so there may be a need for more people on the team, Heimes said. Nick Coghlan said that being on the team should be part of someone's job, otherwise that work will just fall below the tasks on their list that are part of their job.

One issue that needs to be addressed as part of the migration to GitHub is ensuring there is a way to create embargoed bug reports in the new system, as one attendee noted. Right now, the Roundup-based bug tracker does have that capability, which will be needed for security bug reporting.

Guido van Rossum said that there have been problems with being responsive to external bug reporters. They sometimes get to the point of specifying dates when they will release information about the bug if they have not heard back. He said that he gets frustrated reading those emails because he doesn't have any information about the bug or the status of a fix that he could pass on. Ned Deily said that it isn't really clear who has the responsibility to handle the reports.

Coghlan suggested that creating a report that showed the time gap between the security bug reports and fixes would help. Customers would then see the problem and push to improve the response time, which could result in someone being tasked and paid to do so. Russell Keith-Magee noted that the Django Software Foundation is now paying someone to handle security bug reports, which has helped quite a bit.

Comments (none posted)