Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.


A "simple" utility to make a system beep is hardly the first place one would check for security flaws, but the strange case of the "Holey Beep" should perhaps lead to some rethinking. A Debian advisory for the beep utility, which was followed by another for Debian LTS, led to a seemingly satirical site publicizing the bug (and giving it the "Holey Beep" name). But that site also exploits a new flaw in the GNU patch program—and the increased scrutiny on beep has led to more problems being found.

The beep program exists to help users who need more than the simple BEL character to make a beep sound in the terminal. As noted in that BEL page, though: "On modern systems this may not make a noise; it may instead make a visual indication such as flashing the screen, or do nothing at all." Beyond that, beep is meant to give more choices than just a simple BEL, with options for frequency, duration, and more. It seems unlikely that it was also meant to allow beepers to elevate their privileges, but since it needs access to some privileged functionality, installing it setuid-root is not unheard of. Most systems today do not even have the PC speaker device in question, but it requires privileges to send the ioctl() commands needed to generate tones from it. The "Ioctl Wackiness" section of the man page foreshadows some of what has occurred:

By default beep is not installed with the suid bit set, because that would just be zany. On the other hand, if you do make it suid root, all your problems with beep bailing on ioctl calls will magically vanish, which is pleasant, and the only reason not to is that any suid program is a potential security hole. Conveniently, beep is very short, so auditing it is pretty straightforward.

The Debian advisories were released on April 2 and 3; the Holey Beep appeared soon after. The advisories and web page do not give much in the way of detail about the flaw, though the site does link to a patch for the problem. It is described as a privilege escalation flaw; while beep is not installed by default on Debian systems, it is installed as setuid-root if the user (or some dependent package) requests it. So a fairly straightforward race condition can be exploited to gain root privileges.

As described in a blog post by "pirhack", the race condition can be used to write an attacker-controlled value to any file in the filesystem. It was "a real pain in the ass" to figure out how to exploit the race, but it turns out that part of the struct input_event used by the program is not initialized and will contain the value of one of the command-line parameters from the stack. That event structure is then written to a device specified on the command line. By using symbolic links and signals at just the right time, four attacker-controlled bytes can be written to any file. Pirhack's exploit example stores "/*/x" into /etc/bash/bashrc; creating a /tmp/x with the payload of interest will cause any new login that uses Bash to run it via the exploit.

All of that is bad news, of course, but it is fairly easily fixed. A patch was posted as part of a bug report at the (apparently inactive) GitHub repository for beep. It simply closes the race window and fully initializes the event structure for good measure. There is a similar patch linked from the Holey Beep page, but it is not exactly the same, as Hanno Böck reported to the oss-security mailing list:

However it turned out that on that joke holey beep webpage there's a patch with a hidden easter egg that's actually a vulnerability in GNU patch. GNU patch supports a legacy "ed" format for patches and that allows executing external commands.

The actual external command that gets run is mostly harmless, though it does highlight a clear flaw in patch:

!id>~/pwn.lol;beep

That results in a file called pwn.lol in the home directory of whoever ran patch (containing the output of id) and, naturally, a beep. The bug in patch was reported upstream. It would seem that part of the goal of the 0day.marketing folks who put together the Holey Beep site (and were also responsible for the Dirty COW web site) was to expose this bug in patch to a wider audience.
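Anyone wary of feeding an untrusted patch to a vulnerable patch binary could screen it first. What follows is only a rough sketch of such a check, not anything taken from the patch fix itself: it flags lines that look like ed-style shell escapes. The function name and the sample text are made up for illustration (only the command line is the one quoted above), and the heuristic deliberately ignores lines starting with "! " because context diffs use that prefix for changed lines, so it should not be mistaken for a complete defense.

```python
def find_ed_shell_escapes(patch_text):
    """Return (line_number, line) pairs that look like ed-style shell escapes."""
    suspicious = []
    for number, line in enumerate(patch_text.splitlines(), start=1):
        # In an ed script, a line of the form "!command" runs "command" in a
        # shell. Context diffs mark changed lines with "! " (bang, then a
        # space), so those are skipped here; an attacker could still hide
        # behind that, which is why this is only a heuristic.
        if line.startswith("!") and not line.startswith("! "):
            suspicious.append((number, line))
    return suspicious


# Hypothetical ed-style patch; only the last line comes from the article.
sample = "1a\nhello\n.\n!id>~/pwn.lol;beep\n"

for number, line in find_ed_shell_escapes(sample):
    print(f"line {number}: possible shell escape: {line}")
```

The more robust answer, of course, is a patch program that does not execute commands from its input at all.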

The Holey Beep site also lampoons a few other practices that can be found in security vulnerability (and other) sites these days, including explicit instructions that should never under any circumstances be followed. For example, the page recommends the following command (which is not unlike the commands recommended by some software packages in order to install them) to see if a system is vulnerable to Holey Beep:

$ curl https://holeybeep.ninja/am_i_vulnerable.sh | sudo bash

At the moment, that script is benign (modprobe the speaker module and then, of course, run beep), but that could change at any time. One can find "advice" of this sort in various places, so it is worth calling attention to it. Sadly, the sarcasm of that page may fly right over the heads of those most in need of its lessons.

But the web page had other effects too. What might have slid by as a minor fix to a fairly obscure utility was instead examined closely by many more in the security-research world. From that, it was found that there were even more problems: the fix for Holey Beep is incomplete, there are integer overflows in beep's argument handling, and beep leaks information about the existence of hidden files, which could cause other problems for special files that have side effects when they are opened. All of that led Böck to wonder about the future of beep:

I question whether beep should be saved. It would require someone carefully reviewing the code and effectively become the new upstream. And all that for a tool talking to the PC speaker, which doesn't exist in most modern systems anyway. Instead distros should consider not installing it as suid or just killing the package altogether. I heard some distros (suse) replace beep with a simple "printf '\a'" which seems also a safe solution. (although it obviously kills all frequency/length/etc features of original "beep").

The size of beep, the seeming simplicity of its job, and the number of problems found in it make one wonder how many other vulnerabilities (of various severity levels) are out there lurking in utilities that are rarely, if ever, examined. It is also clear that the 0day.marketing folks have been sitting on a flaw in patch for some amount of time; how many more of those are out there, known but unannounced? Sometimes it's rather worrisome that the security of our systems is such a beeping mess. Some kind of systematic review seems called for, but doesn't seem that beeping likely.


The Python Package Index (PyPI) is the principal repository of libraries for the Python programming language, serving more than 170 million downloads each week. Fifteen years after PyPI launched, a new edition is in beta at pypi.org, with features like better search, a refreshed layout, and Markdown README files (and with some old features removed, like viewing GPG package signatures). Starting April 16, users visiting the site or running pip install will be seamlessly redirected to the new site. Two weeks after that, the legacy site is expected to be shut down and the team will turn toward new features; in the meantime, it is worth a look at what the new PyPI brings to the table.

Growing needs and a restart

In the early 2000s, several Python developers wrote and ran their own tools cataloging and linking to available Python packages. In 2002, Richard Jones successfully proposed PEP 301 to create an official index meant to run on a single server and linking to Python packages hosted elsewhere. Jones, and Martin von Löwis who joined him as a core maintainer soon after, started, administered, and improved the site — before the advent of Django, Flask, Pyramid, and other Python web frameworks.

Jones, von Löwis, and (starting in the 2010s) Donald Stufft were volunteers; as with Wikipedia, "the Cheese Shop" (as it was named by Barry Warsaw) became popular before it got consistent upkeep from paid staffers, and demands on PyPI's infrastructure grew steadily. For packagers' convenience and to improve the experience of end users, the maintainers started allowing packagers to upload files onto a central server via ssh; the PyPI application assumed those files lived on its local filesystem.

For security, performance, and user experience reasons, PyPI stopped indexing projects with files that were hosted externally in 2015. As PEP 470 ("Removing External Hosting Support on PyPI") stated, often "end users want to use PyPI as a repository but the author wants to use PyPI solely as an index". Meanwhile, PyPI's file-hosting needs grew to over a terabyte. Volunteer developers and system administrators battled outages, malicious packages, and spam attacks, while the age of the code base and its structure made it hard to maintain — and Sisyphean to develop new features. Generous infrastructure donations helped; for instance, Fastly's donation of Content Delivery Network (CDN) service in 2013 improved performance substantially.

A slow takeover of functionality

Stufft worked on packaging and distribution projects for several years. He did this mostly as a volunteer, though he now works for Amazon Web Services and spends two paid days per week on PyPI, pip, and related tools. He started a replacement PyPI effort, Crate, in 2011. A few years later, he changed tack and began work on what turned into Warehouse, which proved to be a solid foundation for PyPI 2.0. Warehouse is a web application using the Pyramid web framework, with 100% test coverage of its Python code, and a Docker-based development environment to make it easier to hack on locally.

Volunteer contributors, such as developer Dustin Ingram, joined the project. Designer and front-end developer Nicole Harris volunteered, assessing legacy PyPI, articulating design goals, and starting an overhaul of the user interface. Ernest W. Durbin III worked steadily in development and operations as a volunteer, improving the infrastructure behind Warehouse's pre-production installations, first at preview-pypi.python.org, then warehouse.python.org, then pypi.io, and, since late 2013, pypi.org.

Given its years of live testing, calling pypi.org a "beta" belies its longevity and durability. (Stufft's original migration plan predicted Warehouse would gradually come to "own" various database tables "as time goes on" but didn't predict it would take quite this long.) Warehouse always had read access to the canonical PyPI database; this was easier than creating a mirror database, and enforced discipline for Warehouse developers. Legacy PyPI allowed packagers to upload releases via command-line tools like setuptools or through an in-browser interface. However, its uploading routines increasingly failed to fully record new releases (causing HTTP 500 internal server errors), which led to a ~10% error rate by June 2016. At that point, Stufft advised Python packagers that it was a better experience to upload releases to the canonical PyPI database via Warehouse, using the command line tool Twine, than via pypi.python.org. Starting in July 2017, PyPI went so far as to disable uploading via the old site.

Throughout this period, Warehouse was labeled "pre-production" to acknowledge its missing features, layout changes, and occasional outages. Uploading (an API interaction) worked well, but the browser user interface still lacked significant features. Most notably, important features, such as email management, and significant project owner/maintainer administrative functionality, such as release deletion, were only available using the legacy site.

Fresh code and momentum

In early 2016, maintainers of Python packaging and distribution tools were eager to see Warehouse development speed up so that it could replace legacy PyPI. I started speaking with the Python Software Foundation (PSF) Packaging Working Group to discuss applying for Mozilla Open Source Support (MOSS) funding; an award proposal was submitted in 2017 that Mozilla accepted. MOSS-funded work started in December 2017; I serve as Warehouse's project manager. Harris, Durbin, Ingram, Laura Hampton, and I have improved PyPI's code base and infrastructure toward the goal of redirecting traffic to the new site and shutting down the old one.

The group has also nurtured new contributors. Jones and Stufft found that legacy PyPI could not attract a group of volunteer contributors to reduce the workload on the core maintainers, mainly because newcomers found it nearly impossible to understand, or even locally deploy, the code base. Warehouse's frameworks, docs, developer environment setup, and configuration are superior, making onboarding new developers and deploying their work far easier. Just between February 20 and March 20 this year, Warehouse merged 127 pull requests by 20 distinct authors; it continues to attract new contributors, some of whom are entirely new to open source.

Changes, new features, and deprecations

The most obvious improvement in Warehouse is the browser interface. The new site looks, as longtime Python users finding the site often notice, like a site from the current decade. The colors have changed, it's mobile-responsive, and the layout reflects what Harris has learned from user testing. The new front-end is more accessible to people with visual and motor-control disabilities (with more work to come). In the legacy code base, it was difficult to change the interface because content and presentation were mixed together. In contrast, Warehouse uses model/view/controller conventions, and uses front-end frameworks and tools: Jinja2 for templates, Sass with SCSS to handle CSS, Stimulus for JavaScript, and gulp to process and prepare front-end files for serving.

Beyond just the new interface is new functionality. Warehouse provides a chronological release history for each project (example), an easy-to-read project activity journal for project maintainers (see screen shot below), user-visible Gravatars and email addresses for maintainers on project pages, and support for multiple project URLs (e.g., for a homepage and a repo) on a project's PyPI page. Previously, to put a project description on PyPI, maintainers had to submit documents formatted in reStructuredText. Warehouse supports Markdown README files, thanks to improved metadata handling that required improvements to many parts of the Python packaging toolchain.

The original PyPI drew upon SourceForge and Freshmeat.net software categories to create a list of standard "Trove classifiers". Packagers label their releases with these classifiers to describe their target platforms, Python versions, intended audience, and frameworks, and to suggest the project's maturity status. Warehouse uses ElasticSearch for faceted search. This lets users perform intersection searches and filter the project list by multiple classifiers, making packagers' classifier choices more useful (see screen shot below). Project maintainers also no longer need to register a project with a separate command before initially uploading it to PyPI.
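The idea of an intersection search over classifiers can be shown with a toy filter. This is a sketch of the concept only and says nothing about how Warehouse or ElasticSearch implement it; the project names and the filter_by_classifiers() helper are invented for the example, while the classifier strings follow the usual Trove format.

```python
# Hypothetical projects mapped to the Trove classifiers their packagers chose.
projects = {
    "alpha": {"Programming Language :: Python :: 3", "Framework :: Pyramid"},
    "beta": {"Programming Language :: Python :: 3", "Development Status :: 4 - Beta"},
    "gamma": {"Programming Language :: Python :: 2", "Framework :: Pyramid"},
}


def filter_by_classifiers(projects, wanted):
    """Return the sorted names of projects carrying every wanted classifier."""
    wanted = set(wanted)
    # "wanted <= classifiers" is a subset test: all facets must match.
    return sorted(name for name, classifiers in projects.items()
                  if wanted <= classifiers)


print(filter_by_classifiers(
    projects,
    ["Programming Language :: Python :: 3", "Framework :: Pyramid"],
))
```

Each additional classifier narrows the result, which is why the search is only as good as the classifiers that packagers attach to their releases.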

Overall, Warehouse has newer back-end infrastructure than legacy PyPI did, supporting new features and a more scalable site. Instead of assuming that it lives on a single server, Warehouse assumes that its PostgreSQL database, file storage, search, queueing (Redis), and other parts may live in different containers or on different machines. Durbin added configuration management and instrumented Warehouse to gather statistics to view using Datadog. In the course of his infrastructure work, he built cabotage, a new deployment infrastructure tool that securely manages secrets with end-to-end TLS and lets PyPI maintainers automate managing software and configuration changes.

In the interests of more sustainable long-term policies and to fight spam, PyPI has removed or deprecated several features already. For instance, one of the steps taken to handle a spam attack earlier this year is to require that users verify an email address in order to upload releases. And uploading new releases via the web interface instead of the API is no longer allowed, which simplifies PyPI's job.

In general, very little about a PyPI project can now be altered via the browser. Project maintainers used to be able to update release descriptions in the browser, but to update release metadata, maintainers now need to upload a new release to respect release metadata immutability. PyPI no longer allows HTTP access to APIs; it's now HTTPS-only. Also, in advance of PyPI's CDN (Fastly) turning off support for TLS 1.0 and 1.1 on June 30, Warehouse supports only TLS versions 1.2 and above.
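Users wondering whether their own Python can cope with a TLS 1.2 floor can ask the standard ssl module. The snippet below is a minimal sketch, assuming Python 3.7 or later (where ssl.TLSVersion exists); the make_tls12_context() helper is a name chosen for this example and is not part of any PyPI tooling.

```python
import ssl


def make_tls12_context():
    """Build a client context that refuses anything older than TLS 1.2."""
    context = ssl.create_default_context()
    # Mirror the server-side policy described above on the client side.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_tls12_context()
print("Linked against:", ssl.OPENSSL_VERSION)
print("Minimum protocol version:", context.minimum_version.name)
```

If the OpenSSL version reported here is too old to speak TLS 1.2, connections to the new site can be expected to fail.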

Download counts are no longer visible in PyPI's API; instead, PyPI advises curious statisticians to use the data set it uploads to Google BigQuery. As the open-source service Libraries.io improves its PyPI dependency analysis and metrics coverage, PyPI is increasingly directing users there, instead of providing such services itself. Similarly, getting out of the documentation-hosting game and deferring to Read the Docs, PyPI no longer allows package maintainers to upload docs to pythonhosted.org. In addition, as legacy PyPI shuts down, users will also lose the ability to log in with OpenID and Google auth.

Warehouse's signature handling demonstrates a shift in Python's thinking regarding key management and package signatures. Ideally, package users, software distributors, and package distribution tools would regularly use signatures to verify Python package integrity. For the most part, however, they don't, and there are major infrastructural barriers to them effectively doing so. Therefore, GPG/PGP signatures for packages are no longer visible in PyPI's web interface. Project maintainers can still attach signatures to their release uploads, and those signatures still appear in the Simple Project API as described in PEP 503. Stufft has made no secret of his opinion that "package signing is not the Holy Grail"; current discussion among packaging-tools developers leans toward removing signing features from another part of the Python packaging ecology (the wheel library) and working toward implementing The Update Framework instead. Relatedly, Warehouse, unlike legacy PyPI, does not provide an interface for users to manage GPG or SSH public keys.
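For those who still want to find signatures, PEP 503 specifies both how project names are normalized in the Simple API and that a signature lives alongside its file with ".asc" appended. The sketch below applies those two rules; the helper names and the example file URL are hypothetical, and nothing here fetches or verifies anything.

```python
import re
from urllib.parse import urldefrag

SIMPLE_INDEX = "https://pypi.org/simple/"


def normalize(name):
    """Normalize a project name as PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_page_url(name):
    """URL of the project's page in the Simple API."""
    return SIMPLE_INDEX + normalize(name) + "/"


def signature_url(file_url):
    """Where a detached signature for file_url would live, if one exists."""
    # File links usually carry a "#sha256=..." fragment; drop it first.
    return urldefrag(file_url).url + ".asc"


print(project_page_url("Zope.Interface"))
# Hypothetical file link, as it might appear on a project page.
print(signature_url("https://files.example.invalid/demo-1.0.tar.gz#sha256=abc"))
```

Whether a signature actually exists at that location depends on whether the maintainer uploaded one.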

Thanks to redirects, most sites, services, and tools will probably be able to seamlessly switch to the new site when it launches on April 16. Migration guides for Python users, project maintainers, and API users are available. Currently the main snags seem to be the TLS 1.0/1.1 deprecation affecting users with old versions of OpenSSL (including users on some versions of Mac OS X) and the redirects affecting companies whose private internal package indexes include packaging clients that cannot follow an HTTP 302 redirect from pypi.python.org to pypi.org.

Future features

Shutting down legacy PyPI frees Warehouse to make major database schema changes that would have broken features in legacy PyPI and frees maintainers to concentrate on new improvements. As the MOSS award runs out, PSF's Packaging Working Group is seeking further funding to continue Warehouse work, particularly to audit and improve accessibility and application security. Volunteer Luke Sneeringer and others are discussing better authentication for release uploaders, including a bearer token authentication scheme involving Macaroons, and two-factor authentication. While Stufft is deferring to Ingram, Harris, and Durbin for day-to-day Warehouse leadership, he aims to eventually deprecate its XML-RPC API and architect new APIs, probably along RESTful lines. Warehouse developers will discuss and work on some of these tasks during sprints at PyCon in Cleveland, Ohio this May and at EuroPython in Edinburgh, Scotland in July.

Beyond security, accessibility, and APIs, Warehouse contributors are interested in performing further systematic user testing and adding user-friendly features like group/organization support for related projects and, potentially, language localization. Warehouse will also need to make it easier to change project ownership: with the acceptance of PEP 541, a long-awaited policy on "Package Index Name Retention," PyPI administrators have a policy framework to address requests to take over maintainership and ownership of abandoned project names. PyPI administrators are finalizing the implementation details now, which will enable administrators to start resolving hundreds of backlogged requests. Rather than treat user support requests like bug reports, Warehouse developers plan to create or integrate with a proper user support ticket system.

The pace of further improvements will depend on whether Python packaging and distribution tools receive further financial support and on how volunteers' enthusiasm and investment grow or shift once the deadline urgency has passed. There is plenty to do even after the switch. The ongoing story of Python packaging will continue to evolve, and Warehouse, or something that eventually replaces it, will have to adapt.

[I would like to thank Ernest W. Durbin III, Nicole Harris, Dustin Ingram, and Donald Stufft for reviewing this article.]


As of this writing, 5,392 non-merge changesets have been pulled into the mainline repository for the 4.17 release. The 4.17 merge window is thus off to a good start, but it is far from complete. The changes pulled thus far cover a wide part of the core kernel as well as the networking, driver, and filesystem subsystems.

Some of the more significant changes merged so far are:

Core kernel

The ever-expanding perf_event_open() system call has gained the ability to place a kprobe or uprobe and create an event associated with it; the probe continues to exist for the life of the resulting file descriptor. Probes created this way will not be visible in the tracefs virtual filesystem. This interface was created as a way of ensuring that probes will be cleaned up when the process that created them exits.

The scheduler's load estimation code has been improved, especially for mobile and embedded workloads.

The new BPF_RAW_TRACEPOINT command for the bpf() system call attaches a BPF program to a tracepoint but performs no processing of the tracepoint arguments before calling that program. That enables tracing with minimal overhead, but requires more awareness when writing BPF programs. See this commit for a terse overview of the feature.

Architecture-specific

Support for the Andes Technologies NDS32 architecture has been added.

As described in this article, support for the blackfin, cris, frv, m32r, metag, mn10300, score, and tile architectures has been removed; it seems that none of them had any remaining users. The merge commit for this removal shrinks the kernel by almost 470,000 lines of code.

The SPARC "application data integrity" feature is now supported; it allows the application of tags to virtual-memory addresses. The tag is a four-bit value that is placed in the upper bits of the address; any reference lacking the proper tag will generate a trap. See this commit for details.
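The arithmetic behind tagged addresses is easy to model. The following is purely an illustration of the bit manipulation; it assumes a 64-bit address with the tag in bits 63 through 60 (the text above says only "upper bits"), and the function names are invented here. On real hardware the comparison is done by the processor, which traps on a mismatch.

```python
TAG_SHIFT = 60                 # assumption: tag sits in bits 63..60
TAG_MASK = 0xF << TAG_SHIFT    # four-bit tag


def tag_address(address, tag):
    """Return address with its upper four bits replaced by tag."""
    if not 0 <= tag <= 0xF:
        raise ValueError("tag must fit in four bits")
    return (address & ~TAG_MASK) | (tag << TAG_SHIFT)


def extract_tag(address):
    """Return the four-bit tag carried in the address."""
    return (address & TAG_MASK) >> TAG_SHIFT


def access_allowed(address, memory_tag):
    """Model the check: the pointer's tag must match the memory's tag."""
    return extract_tag(address) == memory_tag


pointer = tag_address(0x7FFF12345000, 0xA)
print(hex(pointer))
print(access_allowed(pointer, 0xA), access_allowed(pointer, 0x3))
```

A stale or corrupted pointer carrying the wrong tag would fail the check, which is what makes the feature useful for catching memory errors.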

Filesystems

The XFS filesystem now supports the lazytime mount option.

The Btrfs filesystem has long supported a set of ioctl() operations to control transactions. It has been concluded that these operations are unused, so they have been removed for 4.17.

The CIFS filesystem now supports SMB 3.1.1 pre-authentication integrity. SMB 3.1.1 is also no longer marked "experimental".

The ext4 filesystem has been made more robust against maliciously crafted filesystem images. Maintainer Ted Ts'o warns "I still don't recommend that container folks hold any delusions that mounting arbitrary images that can be crafted by malicious attackers should be considered sane thing to do, though!"

Networking

The reliable datagram socket protocol now supports zero-copy operation.

It is now possible to apply BPF scripts to filter traffic sent by the sendmsg() and sendfile() system calls. See this commit for details.

A receive-side implementation of the TLS protocol has been added. This adds to the existing transmit implementation to give full in-kernel TLS protocol support.

A set of control-group-specific BPF hooks has been added to the bind() and connect() system calls; attached programs can modify how those calls work. Some information can be found in this commit.

Hardware support

Graphics: ARM Versatile panels, Allwinner DesignWare HDMI phys, AMD Vega12 GPUs, and Raydium RM68200 720x1280 DSI panels.

Media: NXP TDA1997x HDMI receivers, Omnivision ov2685 and ov5695 sensors, Renesas capture engine units, Sony CXD2880 DVB-T2/T tuner/demodulators, and SoundGraph iMON receivers.

Miscellaneous: Marvell 88PG86X voltage regulators, Aspeed KCS IPMI interfaces, and Amiga Gayle PATA controllers.

Networking: NXP MCR20AVHM transceivers, Microchip LAN743x gigabit Ethernet interfaces, and Intel Ethernet connection E800 series interfaces.

Pin control: Qualcomm SDM845 pin controllers, NXP IMX6SLL pin controllers, MediaTek MT2712 pin controllers, and Allwinner H6 pin controllers.

Miscellaneous

The perf script command now supports Python 3 scripts.

The Linux kernel memory model is now a part of the kernel proper. It includes a formal description of how memory coherency works in the kernel, along with an extensive set of tests to prove adherence to the model. See this commit for an overview of what has been merged.

Internal kernel changes

Building the kernel for x86 now requires a compiler with asm goto support. In practice that means that GCC 4.5 or later is now needed. It also rules out the use of Clang on x86 until that compiler gains asm goto support.

wait_on_atomic_t() has been replaced by wait_var_event(), a more general interface; see this article for details.

A massive kernel-wide cleanup has removed almost all direct invocations of system-call implementations from within the kernel. This is useful for a number of reasons: it adds flexibility to the system-call interface and makes it easier to remove set_fs() calls, among other things. A significant change to the x86 system-call mechanism is in the works for the near future.

If this turns out to be a normal two-week merge window, it can be expected to remain open until April 15, with the final 4.17 release happening in early June. The remainder of the changes merged for this release will be summarized once the merge window closes.


The Linux network stack does not lack for features; it also performs well enough for most uses. At the highest network speeds, though, any overhead at all is too much; that has driven the most demanding users toward specialized, user-space networking implementations that can outperform the kernel for highly constrained tasks. The express data path (XDP) development effort is an attempt to win those users back, with some apparent success so far. With the posting of the AF_XDP patch set by Björn Töpel, another piece of the XDP puzzle is coming into focus.

The core idea behind the XDP initiative is to get the network stack out of the way as much as possible. While the network stack is highly flexible, XDP is built around a bare-bones packet transport that is as fast as it can be. When a decision needs to be made or a packet must be modified, XDP will provide a hook for a user-supplied BPF program to do the work. The result combines minimal overhead with a great deal of flexibility, at the cost of a little "some assembly required" label on the relevant man pages. For users who count every nanosecond of packet-processing overhead (to the point that the 4.17 kernel will include some painstaking enhancements to the BPF JIT compiler that reduce the size of the generated code by 5%), figuring out how to put the pieces together is worth the trouble.

The earliest XDP work enabled the loading of a BPF program into the network interface device driver, with the initial use case being a program that dropped packets as quickly as possible. That may not be the most exciting application, but it is a useful feature for a site that is concerned about fending off distributed denial-of-service attacks. Since then, XDP has gained the ability to perform simple routing (retransmitting a packet out the same interface it arrived on) and, for some hardware, to offload the BPF program into the interface itself.

There are limits, though, to what can be done in the context of a network-interface driver; for such cases, AF_XDP is intended to connect the XDP path through to user space. It can be thought of as being similar to the AF_PACKET address family, in that it transports packets to and from an application with a minimum of processing, but this interface is clearly intended for applications that prioritize packet-processing performance above convenience. So, once again, some assembly is required in order to actually use it.

That assembly starts by calling socket() in the usual way with the AF_XDP address family; that yields a socket file descriptor that can (eventually) be used to move packets. First, however, it is necessary to create an array in user-space memory called a "UMEM". It is a chunk of contiguous memory, divided into equal-sized "frames" (the actual size is specified by the caller), each of which can hold a single packet. By itself, the UMEM looks rather boring: it is just a row of equal-sized frames.

After the memory has been allocated by the application, this array is registered with the socket using the XDP_UMEM_REG command of the setsockopt() system call.

Each frame in the array has an integer index called a "descriptor". To use those descriptors, the application creates a circular buffer called the "fill queue", using the XDP_UMEM_FILL_QUEUE setsockopt() call. This queue can then be mapped into user-space memory using mmap(). The application can request that the kernel place an incoming packet into a specific frame in the UMEM array by adding that frame's descriptor to the fill queue.

Once a descriptor goes into the fill queue, the kernel owns it (and the associated UMEM frame). Getting that descriptor back (with a new packet in the associated frame) requires creating yet another queue (the "receive queue"), with the XDP_RX_QUEUE setsockopt() operation. It, too, is a circular buffer that must be mapped into user space; once a frame has been filled with a packet, its descriptor will be moved to the receive queue. A call to poll() can be used to wait for packets to arrive in the receive queue.

A similar story exists on the transmit side. The application creates a transmit queue with XDP_TX_QUEUE and maps it; a packet is transmitted by placing its descriptor into that queue. A call to sendmsg() informs the kernel that one or more descriptors are ready for transmission. The completion queue (created with XDP_UMEM_COMPLETION_QUEUE) receives descriptors from the kernel after the packets they contain have been transmitted. The full picture thus consists of the UMEM array and four queues: fill, receive, transmit, and completion.

This whole data structure is designed to enable zero-copy movement of packet data between user space and the kernel, though the current patches do not yet implement that. It also allows received packets to be retransmitted without copying them, since any descriptor can be used for either transmission or reception.
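A small simulation shows how a single frame can travel through all four queues and be retransmitted without the application copying it. Everything here is a sketch under stated assumptions: the queues are plain deques rather than mapped rings, kernel_rx() and kernel_tx() are hypothetical stand-ins for kernel behavior, the wire list stands in for the network, and pairing each descriptor with a length is an addition made so the example can send the right number of bytes.

```python
from collections import deque

FRAME_SIZE = 2048                    # assumed frame size
umem = bytearray(FRAME_SIZE * 4)
fill_q, rx_q, tx_q, completion_q = deque(), deque(), deque(), deque()
wire = []                            # hypothetical stand-in for the network


def kernel_rx(packet):
    """Stand-in for reception: fill queue -> UMEM frame -> receive queue."""
    desc = fill_q.popleft()
    start = desc * FRAME_SIZE
    umem[start:start + len(packet)] = packet
    rx_q.append((desc, len(packet)))


def kernel_tx():
    """Stand-in for transmission: transmit queue -> wire -> completion queue."""
    while tx_q:
        desc, length = tx_q.popleft()
        start = desc * FRAME_SIZE
        wire.append(bytes(umem[start:start + length]))   # the "NIC" sends it
        completion_q.append(desc)


fill_q.extend(range(4))              # give all four frames to the "kernel"
kernel_rx(b"ping")

# Forward the packet: the same descriptor moves from receive to transmit,
# and the application never touches the packet bytes.
desc, length = rx_q.popleft()
tx_q.append((desc, length))
kernel_tx()                          # plays the role of the sendmsg() kick

# Once transmission completes, the frame can be offered for reception again.
fill_q.append(completion_q.popleft())
```

Ownership of a frame is always implied by which queue its descriptor sits in, which is why the same descriptor can serve reception and transmission in turn.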

The UMEM array can be shared between multiple processes. If a process wants to create an AF_XDP socket attached to an existing UMEM, it simply passes its socket file descriptor and the file descriptor associated with the socket owning the UMEM to bind(); the second file descriptor is passed in the sockaddr_xdp structure. There is only one fill queue and one completion queue associated with the UMEM regardless of how many processes are using it, but each process must maintain its own transmit and receive queues. In other words, in a multi-process configuration, it is expected that one process (or thread) will be dedicated to the management of the UMEM frames, while each of the others takes on one aspect of the packet-handling task.

There is one other little twist here, relating to how the kernel chooses a receive queue for any given incoming packet. There are two pieces to that puzzle, the first of which is yet another new BPF map type called BPF_MAP_TYPE_XSKMAP. This map is a simple array, each entry of which can contain a file descriptor corresponding to an AF_XDP socket. A process that is attached to the UMEM can call bpf() to store its file descriptor in the map; what is actually stored is an internal kernel pointer, of course, but applications won't see that. The other piece is a BPF program loaded into the driver whose job is to classify incoming packets and direct them to one of the entries in the map; that will cause the packets to show up in the receive queue corresponding to the AF_XDP socket in the chosen map entry.

Without the map and BPF program, an AF_XDP socket is unable to receive packets. You were warned that some assembly was required.
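The interplay between the map and the classifying program can be modeled as follows. This is a user-space illustration, not BPF: the Socket class, the classify() rule (first byte modulo the map size), and deliver() are all made up for the example, and whole packets are queued where the real mechanism would queue descriptors.

```python
from collections import deque


class Socket:
    """Hypothetical stand-in for an AF_XDP socket and its receive queue."""

    def __init__(self, name):
        self.name = name
        self.rx_q = deque()


XSKMAP_SIZE = 4
# Models BPF_MAP_TYPE_XSKMAP: a simple array whose entries refer to sockets.
xskmap = [None] * XSKMAP_SIZE


def classify(packet):
    """Stand-in for the BPF program: pick a map index for this packet.
    The rule used here is invented purely for illustration."""
    return packet[0] % XSKMAP_SIZE


def deliver(packet):
    """Queue the packet on the socket found in the chosen map entry."""
    sock = xskmap[classify(packet)]
    if sock is None:
        return False                 # empty entry: no AF_XDP socket gets it
    sock.rx_q.append(packet)
    return True


a, b = Socket("a"), Socket("b")
xskmap[0] = a                        # plays the role of the bpf() map update
xskmap[1] = b

to_a = deliver(b"\x04abc")           # index 0, socket a
to_b = deliver(b"\x05xyz")           # index 1, socket b
nowhere = deliver(b"\x02???")        # index 2 is empty
```

With an empty map, deliver() never succeeds, which mirrors the warning above that a socket with no map entry and no classifying program receives nothing.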

The final piece is a bind() call to attach the socket to a specific network interface and, probably, a specific hardware queue within that interface. The interface itself can then be configured to direct packets to that queue if they should be handled by the program behind the AF_XDP socket.

The intended final result is a structure that can enable user-space code to perform highly efficient packet management, with as much hardware support as possible and with a minimum of overhead in the kernel. There are some other pieces that are needed to get there, though. The zero-copy code is clearly one of them; copying packet data between the kernel and user space is fatal in any sort of high-performance scenario. Another one is the XDP redirect patch set being developed by Jesper Dangaard Brouer; that functionality is what will allow an XDP program to direct packets toward specific AF_XDP sockets. Driver support is also required; that is headed toward mainline for a couple of Intel network interfaces now.

If it all works as planned, it should become possible to process packets at a much higher rate than has been possible with the mainline network stack so far. This functionality is not something that many developers will feel driven to use, but it is intended to be appealing to those who have resorted to user-space stacks in the past. It is a sign of an interesting direction that kernel development has taken: new functionality is highly flexible, but using it requires programming for the BPF virtual machine.


As the 4.17 merge window opened, it seemed possible that the kernel lockdown patch set could be merged at last. That was before the linux-kernel mailing list got its hands on the issue. What resulted was not one of the kernel community's finest moments. But it did result in a couple of evident conclusions: kernel lockdown will almost certainly not be merged for 4.17, but something that looks very much like it is highly likely to be accepted in a subsequent merge window.

As a reminder: the purpose of the lockdown patches is to enforce a distinction between running as root and the ability to run code in kernel mode. Proponents of UEFI secure boot maintain that this separation is necessary; otherwise the promise of secure boot (that the system will only run trusted code in kernel mode) cannot be kept. Closing off the paths by which a privileged attacker could run arbitrary code in kernel mode requires disabling a number of features in the kernel; see the above-linked article for the details. Most users will never miss the disabled features, but there are always exceptions.

There are, naturally, a number of disagreements on how the lockdown mode is implemented. The use of a blacklist to disable "dangerous" kernel command-line options seems sure to let some of those options through. It is unlikely that all of the potentially hazardous operations supported by device drivers can ever be found. And so on.

The interesting thing, though, is that almost nobody seems to object to the lockdown concept in general — as long as it can be turned off. Even Linus Torvalds, who argued against the lockdown patches and their developers in typical Torvalds style, sees some potential value in the lockdown concept. There does not appear to be any significant opposition to making it available in the kernel.

The sticking point

The reason that the lockdown patches will not be merged this time around thus doesn't depend on their core purpose. Instead, the whole thing hinges on a single detail: the patch set automatically turns the lockdown mode on if secure boot is detected at startup time. It is the tying together of lockdown and secure boot that brought about a long and unpleasant linux-kernel thread.

Torvalds pointed out that there is a long list of security-related features that can be enabled in current kernels. None of those features depend on whether secure boot is enabled on the system; they are configured in or out on their own merits. The behavior of the kernel should not vary as a result of a BIOS setting, he argued. He also claimed that connecting the two features means that few kernel developers will ever test kernels with lockdown enabled, since few of them enable secure boot on their development systems. No "sane distribution" would ship a kernel with this mode turned on, he said. One little problem with that last claim, as Matthew Garrett pointed out, is that many major distributions have been shipping a version of this patch set for about five years.

On the other side, proponents argue that lockdown without secure boot (or something like it) will instill a false sense of security, since the lockdown can be circumvented by attacking the boot chain. As Garrett put it: "Without some sort of verified boot mechanism, lockdown is just security theater." The same is said to be true of a kernel that supports secure boot without lockdown; that kernel can be compromised after boot to run untrusted code in kernel mode — exactly the scenario that secure boot is meant to prevent. A secure-boot kernel without lockdown is not only false security for its user; it is presented as a threat to others as well:

Because a kernel signed with a generally trusted key that doesn't implement any lockdown functionality is effectively a bootloader that will load unsigned material on most machines on the market, which reduces the security of users running those machines with Secure Boot enabled.

See also this blog post from Garrett describing his view of this discussion in more detail.

Not everybody agrees that lockdown without secure boot is useless; they see it like all of the other hardening technologies that have been put into the kernel. Compromising the boot chain (and forcing a reboot) is not always an easy thing to do, especially for a remote attacker. Secure boot is unlikely to ever protect all of the places where a persistent exploit could be placed anyway — init scripts, for example. Even without secure boot, it is argued, lockdown raises the bar for a potential attacker.

The "bootloader" argument is an interesting one; it says that a kernel without lockdown can be compromised and used to load a new, modified kernel that hides any malware it contains. According to Peter Jones, this is a common model for malware installations. Under this line of reasoning, any kernel that can be corrupted in this way and carries a signature that will enable it to boot on a secure-boot system can be used to attack any system that trusts the signing key. Automatically enabling lockdown with secure boot is a way to avoid creating this kind of attack tool.

If that is the issue, Alan Cox said, then we have already lost:

Vendors of all OS's have released enough buggy but signed kernel images over the past years that rummaging around in the archive will find you a wide choice of signed boot images that'll then let you do wtf you like including chaining some other target.

One other aspect of this issue that came up briefly is the fear that, if Linux looks like a tool that can be used to compromise secure-boot systems running Windows, Microsoft might blacklist the signing key and render Linux unbootable on most x86 hardware. David Howells expressed this worry, for example. Greg Kroah-Hartman said, though, that he has researched this claim numerous times and it has turned out to be an "urban myth".

Resolution?

Toward the end of the discussion, Torvalds (and others) suggested that lockdown should just be enabled unconditionally, especially since distributors have been shipping it for some time. The problem with that, of course, is that lockdown does occasionally break a working system. In such cases, users have been advised by distributors to disable secure boot as the easiest solution. Torvalds rephrased that point:

We'd like to just enable it all the time, but it's known to break some unusual hardware cases that we can't fix in software, and we wanted *some* way to disable it that requires explicit and verified user intervention to do that, and disabling secure boot is the easiest hack we could come up with.

Had things been expressed that way from the beginning, he said, the connection between lockdown and secure boot would have been "much more palatable".

A statement like that strongly suggests that the lockdown feature, even perhaps with a secure-boot connection, should be able to get past Torvalds eventually. Howells has said that he is reworking the patch set to loosen that connection, which may help as well. While another attempt to push this work for 4.17 could happen, it seems more likely that everybody will want to step away from this discussion and address the issue again in 4.18.


Car manufacturers, like most companies, navigate a narrow lane between the benefits of using free and open-source software and the perceived or real importance of hiding their trade secrets. Many are using free software in some of the myriad software components that make up a modern car, and even work in consortia to develop free software. At the recent LibrePlanet conference, free-software advocate Jeremiah Foster covered progress in the automotive sector and made an impassioned case for more free software in their embedded systems. Foster has worked in automotive free software for many years and has played a leading role in the GENIVI Alliance, which is dedicated to incorporating free software into in-vehicle infotainment (IVI) systems. He is currently the community manager for the GENIVI Alliance.

First, Foster talked about the importance of software in modern vehicles. He pointed out that software is increasingly the differentiator used to market cars. Horsepower no longer sells these vehicles, Foster says; features do. He claims that some companies even sell the car at cost (the old "razor/blades" or "printer/ink" business model) and make their money on aftermarket apps and features. Companies are finding it effective to get hardware from other manufacturers while improving the user experience through their software. Some of these features contribute to safety (such as alerts that help you drive within the lane or parallel park), and some may be critical, such as dashboard icons that warn the driver of electrical system problems or low brake fluid.

Second, Foster introduced the special requirements that free software has to face in automobiles. The requirements and challenges facing car manufacturers are daunting, and free software will have to adapt to meet them, he said. Physical safety and software security are obvious priorities. Many automobile components are no longer purely mechanical and are, instead, electronically controlled based on sensor data, so software is part of the vehicle-safety equation.

Next, Foster turned to the benefits free software offers car manufacturers. Cars use specialized microcontrollers for different functions: one type for braking, another for measuring wheel speeds, and so on. The sheer variety of hardware is one selling point for free software, which anyone can port, so it tends to turn up quickly on new systems.

Foster cited two other advantages free software offers to cars: low cost and ease of customization. For instance, the software can be upgraded to conform to the ISO 26262 safety standard (summarized by a National Instruments white paper) and brought to a high Automotive Safety Integrity Level (ASIL). The Linux kernel has not achieved this conformance, but work is being done by Nicholas McGuire at the Open Source Automation Development Lab (OSADL), in conjunction with the Linux Foundation Real Time working group, to certify the Linux kernel and a small C library for safety-critical systems. Certification is difficult because it must be done on hardware and software together, and therefore does not apply to a different version of the kernel on the same hardware, or to the same version of the kernel on different hardware. But, since Linux is free software, the certification process can be undertaken by any organization.

Do car companies understand the value of free software? Increasingly, Foster says, they do. A lot of car development goes on in Germany, where the younger employees of manufacturers are especially attuned to the value of free software. German engineers also tend to respect quality and are willing to critique technical choices (the Volkswagen emissions scandal aside), which are traits that favor free software.

On the other hand, the value of the aftermarket, as described earlier, makes manufacturers hesitate to offer truly open systems. What if users could freely customize their cars? That would eat into the profits that manufacturers could make with their own add-on tools and apps. Manufacturers can cite safety as another concern. Foster hopes that regulation, such as "right to repair" bills, could shake companies free from their attempts to control the aftermarket and thus indirectly remove barriers to the use of free software.

The acceptance of copyleft by manufacturers is qualified and unstable. Many prefer the Mozilla Public License (MPL) 2.0. The mix of software from different companies and sources complicates compliance with the GPL. Companies that putatively release free software tend not to nurture real communities, but instead just "throw it over the wall." Interestingly, auto companies that use GPL software tend to stick to software licensed under the GPLv2 and refuse to move to the GPLv3. Foster cites a couple of reasons for this conservatism. Most irksome to them is the requirement that any update system allow users to install updates that they make themselves or obtain from third parties. This is often known as an "anti-Tivoization" clause and is mentioned in an earlier LWN article about GENIVI.

Next, the GPLv3 requires full installation information, which companies fear may force the release of trade secrets. Foster believes that this requirement would not force car companies to offer as much as they are afraid it does. But they would certainly have to share private keys that allow changes to code. The GPLv3 also prohibits the code's developers from asserting patents in order to restrict others from using the code. Foster said that this requirement is also implied by the GPLv2, but it's not explicit and therefore does not scare the manufacturers.

Car companies' aversion to the GPLv3 has deleterious effects on the software in their cars, according to Foster. Often they just choose code distributed under more permissive licenses (or "push-over licenses," as Richard Stallman called them in a LibrePlanet keynote). They may also refuse to upgrade their GPLv2-licensed code because the upgrade falls under the GPLv3. Thus, they will have to tolerate the bugs and security flaws that remain in the old code. Foster says that car manufacturers were among the downstream users whose behavior led the Yocto project to provide limited support for old GPLv2 versions.

To make GPL software more attractive to automobile companies, Foster suggested that the developer could sell exceptions, as done by various companies along the way. In other words, the developers could strip the anti-Tivoization clause and installation requirements when licensing the code to the manufacturer. Foster has written an article on this topic.

Foster had other observations about the effect of using free software. He urged regulators to read code. The US Federal Aviation Administration (FAA) has hundreds of staff who can check the source code of airplanes, and Foster attributes the remarkable safety of air travel partly to this.

In general, I got a fairly positive sense from Foster's talk of the progress that free software is making in the automotive industry. Whether manufacturers really adopt an open approach to development or undermine it with technical or legal sophistry remains an open question, however.
