This edition contains the following feature content:

This week's edition also includes these inner pages:

Brief items: Brief news items from throughout the community.

Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

At PyCon 2017, Kavya Joshi looked at some of the differences between the Python reference implementation (known as "CPython") and that of MicroPython. In particular, she described the differences in memory use and handling between the two. Those differences are part of what allows MicroPython to run on the severely memory-constrained microcontrollers it targets—an environment that could never support CPython.

CPython is the standard and default implementation that everyone loves and uses, she said. But it has a bad reputation when it comes to memory use. That has led to some alternate implementations, including MicroPython.

As the "leanest and meanest" of the alternative implementations, MicroPython targets microcontrollers, so that Python can be used to control hardware. MicroPython can run in 16KB of RAM using 256KB of ROM; it implements most of Python 3.4. Some things have been left out of the language and standard library, since they do not make sense for microcontrollers, such as metaclasses and multiprocessing.

CPython and MicroPython are fairly similar under the hood, Joshi said. They are both bytecode interpreters for stack-based virtual machines. They are also both C programs. Yet they are on the opposite ends of the memory-use spectrum.

As a simple experiment, and not any kind of formal benchmark, she tested Python 3.6 and MicroPython 1.8 on a 64-bit Ubuntu 16.10 system with 4GB of RAM. Her test was to create 200,000 objects of various types (integers, strings, lists) and to prevent them from being garbage collected. She then measured the heap use of the programs from within Python itself.
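The flavor of that experiment can be sketched from within CPython itself. This is an illustrative approximation, not Joshi's actual benchmark: it keeps references to 200,000 objects so they cannot be garbage-collected, then sums their per-object sizes with sys.getsizeof() rather than measuring the heap directly.

```python
import sys

N = 200_000

ints = [10**10 + i for i in range(N)]   # consecutive integers from 10**10
strs = [str(i) for i in range(N)]       # short strings

int_bytes = sum(sys.getsizeof(x) for x in ints)
str_bytes = sum(sys.getsizeof(s) for s in strs)

print(f"integers: ~{int_bytes / 2**20:.1f} MiB")
print(f"strings:  ~{str_bytes / 2**20:.1f} MiB")
```

On a 64-bit CPython build, the integer total comes out close to the 6MB figure Joshi reported.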

CPython and MicroPython both use custom allocators to manage their heaps. But CPython will grow its heap on demand, while MicroPython has a fixed-sized heap. Since Joshi is measuring heap use, that fixed size is the upper boundary for the measurements. But, as we will see, MicroPython does not use the heap in the same way that CPython does, which will also affect the measurements.

She showed graphs of the heap usage, starting with the integer test (Joshi's slides are available at Speaker Deck). The test created 200,000 integer objects for consecutive integers starting at 10**10. The CPython graph shows a linear increase in heap use, while the MicroPython graph is completely flat. The graphs look the same for string objects. The CPython heap use is more than that of MicroPython, but the actual numbers do not matter all that much, she said, as it is the shape of the graphs that she is interested in.

Lists show something a bit different, however. The test creates a list that it keeps appending small integers to; the graphs show the heap use as the list gets bigger. Both graphs show a step function for heap usage, though the MicroPython steps seem to double in depth and height, while the increase in step size for CPython is more gradual. One of the questions she would like to answer is why the two interpreters "have such vastly different memory footprints and memory profiles".

Objects and memory

To explain what is going on, she needed to show how objects are implemented internally for both interpreters. Since CPython uses more memory in these tests, there must be an underlying reason. Either CPython objects are larger or CPython allocates more of them—or both.

CPython allocates all of its objects on the heap, so the statement "x = 1" will create an object for "1" on the heap. Each object has two parts, the header (PyObject_HEAD) and some variable object-specific fields. The header is the overhead for the object and consists of two eight-byte fields for a reference count and a pointer to the type of object it represents. That means 16 bytes is the lower bound on the size of a CPython object.

For an integer, the object-specific field consists of an eight-byte length indicating how many four-byte values will follow (remember that Python can represent arbitrary-sized integers). So to store an integer object that has a value of 2**30-1 or less (two bits of each value are used for other purposes), it takes 28 bytes, 24 of which are overhead. For the values in the test (10**10 and up), it took 32 bytes per object, so 200,000 integer objects consumed 6MB of space.
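Those sizes can be checked from the interpreter; sys.getsizeof() reports the C-level size of an object on a 64-bit CPython build:

```python
import sys

print(sys.getsizeof(1))        # 28: 16-byte PyObject_HEAD, 8-byte length, one 4-byte digit
print(sys.getsizeof(10**10))   # 32: this value needs a second digit
print(sys.getsizeof(2**1000))  # grows further with the number of digits
```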

In MicroPython, though, each "object" is only eight bytes in size. Those eight bytes could be used as a pointer to another data structure or used more directly. "Pointer tagging" allows for multiple uses of eight-byte objects. It is used to encode some extra information in the objects; since all addresses are aligned to eight-byte boundaries, the three low-order bits are available to store tag values. Not all of the tag bits are used that way, though; if the low-order bit is one, that indicates a "small" integer, which is stored in the other 63 bits. A binary 10 in the low-order two bits means there is a 62-bit index to an interned string in the rest of the value. A binary 00 in those bits indicates the other bits are a pointer to a concrete object (i.e. anything that is not a small integer or interned string).
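The tagging scheme can be illustrated with a toy model in Python. The constant and function names here are invented for illustration; the real encoding lives in MicroPython's C source.

```python
TAG_SMALL_INT = 0b1    # low bit 1: a 63-bit small integer
TAG_QSTR      = 0b10   # low bits 10: a 62-bit interned-string index
TAG_PTR       = 0b00   # low bits 00: a pointer to a concrete object

def encode_small_int(n):
    return (n << 1) | TAG_SMALL_INT

def decode(word):
    if word & 0b1:
        return ("small_int", word >> 1)
    if word & 0b11 == TAG_QSTR:
        return ("qstr", word >> 2)
    return ("pointer", word)   # 8-byte-aligned address, tag bits 00

print(decode(encode_small_int(10**10)))   # ('small_int', 10000000000)
print(decode(0xDEADBEE8))                 # an aligned address decodes as a pointer
```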

Since the values stored in the test will fit as small integers, only eight bytes are used per object. Those are stored on the stack, rather than the heap, which explains the flat heap usage graph line for MicroPython. Even if you measure the stack usage, which is a little more complicated to do, Joshi said, MicroPython is using way less memory than CPython for these objects.

For the string experiment, which created 200,000 strings of ten characters or less, the explanation is similar. CPython has 48 bytes of overhead for an ASCII string (PyASCIIObject), so a one-character, two-byte string (e.g. "a" plus a null terminator) takes up 50 bytes; a ten-character string would take up 59 bytes. MicroPython stores small strings (254 characters or less) as arrays with three bytes of overhead. So the one-character string takes five bytes, which is 1/10 of the storage that CPython needs.
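The CPython side of that comparison is easy to verify on a 64-bit build, where sys.getsizeof() reports the fixed ASCII-string overhead plus one byte per character and the NUL terminator:

```python
import sys

print(sys.getsizeof("a"))           # 50: 48-byte header plus "a" and the terminator
print(sys.getsizeof("abcdefghij"))  # 59: one more byte per additional character
```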

That still doesn't explain the flat line in the graph, however. Strings in MicroPython are stored in a pre-allocated array, she said. There are no new heap allocations when a string object is created. Effectively, the allocation has just been moved in time so it doesn't appear as an increase in heap size in the graph.

Mutable objects

Strings and integers are both immutable objects, but the third experiment used lists, which are mutable objects. In CPython, that adds overhead in the form of a PyGC_HEAD structure. That structure is used for garbage-collection tracking and is 24 bytes in size. In all, a PyListObject is 64 bytes in size; but you also have to add pointers for each of the objects in the list, which are stored as an array, and the storage for each object.

The steps in the graph for the list experiment indicate dynamic resizing of the array of list elements. If that resizing was done for each append, the graph would be linearly increasing as with integers and strings, but CPython overallocates the array in anticipation of further append operations. That has the effect of amortizing the cost of the resizing over multiple append operations.
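The overallocation is easy to observe from CPython with a small illustrative loop; the exact resize points vary between CPython versions, but the size only jumps occasionally, and the jumps grow further apart:

```python
import sys

lst = []
size = sys.getsizeof(lst)
for i in range(64):
    lst.append(i)
    new = sys.getsizeof(lst)
    if new != size:   # a jump means the element array was resized
        print(f"len {len(lst):3d}: {size} -> {new} bytes")
        size = new
```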

The garbage collection mechanism for MicroPython is completely different from that of CPython and does not use reference counting. So, MicroPython concrete objects, which are used for lists, do not have a reference count; they still use a type pointer, though, as CPython does. Removing the reference count saves eight bytes over CPython. In addition, there is no additional overhead for garbage collection in MicroPython, which saves 24 bytes. Thus, mutable objects in MicroPython are 32 bytes smaller than their CPython counterparts.

So it turns out that CPython objects are larger than those of MicroPython, generally substantially larger. The differences for other object types, and classes in particular, are not as clear cut when compared to CPython 3.6, Joshi said, as there have been some CPython optimizations that substantially reduce the overhead. In addition, CPython allocates more objects in these tests than MicroPython does. MicroPython stores small integers directly on the stack, rather than allocate them from the heap as CPython does.

Garbage collection

The overhead incurred to track CPython objects for garbage collection is probably making the audience wonder how that works, Joshi said. CPython uses reference counting, which simply tracks references to an object; every time an object is assigned to a variable, placed on a list, or inserted into a dictionary, for example, its count is increased. When a reference goes away, because of a del() operation or a variable going out of scope, for instance, the count is decreased. When the count reaches zero, the object is deallocated.
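Reference counts can be inspected directly with sys.getrefcount(), which reports the count plus one for the temporary reference created by the call itself:

```python
import sys

obj = []
print(sys.getrefcount(obj))   # 2: the 'obj' name plus the argument reference

container = [obj, obj]        # two more references
print(sys.getrefcount(obj))   # 4

del container                 # those references go away again
print(sys.getrefcount(obj))   # 2
```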

But that doesn't work in all cases, which is where the PyGC_HEAD structure for mutable objects comes into play. If objects refer to themselves or indirectly contain a reference to themselves, it results in a reference cycle. She gave an example:

    x = [1, 2, 3]
    x.append(x)

If x subsequently goes out of scope or del(x) is executed, the reference count will not drop to zero, thus the object would never be freed. CPython handles that with a cyclic garbage collector that detects and breaks reference cycles. Since only mutable objects can have these cycles, they are the only objects tracked using the PyGC_HEAD information.
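A cycle like the one above, and its eventual collection, can be demonstrated with the gc module:

```python
import gc

class Node:
    pass

gc.disable()                 # keep the cyclic collector out of the way for the demo
n = Node()
n.self_ref = n               # the object now refers to itself
del n                        # the refcount stays nonzero: the cycle keeps it alive

unreachable = gc.collect()   # the cyclic collector detects and breaks the cycle
print(f"collected {unreachable} unreachable objects")
gc.enable()
```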

MicroPython tracks allocations in the heap using a bitmap. It breaks the heap up into 32-byte allocation units, each of which has its free or in-use state tracked with two bits. There is a mark-and-sweep garbage collector that maintains the bitmap when it periodically runs; it manages all of the objects that are allocated on the heap. In that way, the mark-and-sweep approach is effectively trading execution time for a reduction in the overhead needed for tracking memory.
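That bookkeeping can be sketched with a toy model, written here in Python for illustration; MicroPython's real allocator is C, and the particular state encodings below are assumptions made for the example:

```python
FREE, HEAD, TAIL = 0b00, 0b01, 0b10   # assumed two-bit block states

class BitmapHeap:
    def __init__(self, heap_bytes, block=32):
        self.block = block
        self.bits = [FREE] * (heap_bytes // block)

    def alloc(self, nbytes):
        need = -(-nbytes // self.block)            # blocks needed, rounded up
        for i in range(len(self.bits) - need + 1):
            if all(b == FREE for b in self.bits[i:i + need]):
                self.bits[i] = HEAD                # first block of the object
                for j in range(i + 1, i + need):
                    self.bits[j] = TAIL            # continuation blocks
                return i * self.block              # "address" within the heap
        raise MemoryError("heap exhausted")

    def free(self, addr):
        i = addr // self.block
        self.bits[i] = FREE
        while i + 1 < len(self.bits) and self.bits[i + 1] == TAIL:
            i += 1
            self.bits[i] = FREE

heap = BitmapHeap(16 * 1024)   # a 16KB heap, like a small microcontroller's
a = heap.alloc(100)            # occupies four 32-byte blocks
b = heap.alloc(40)             # occupies the next two blocks
heap.free(a)                   # a sweep would mark those blocks free again
```

The two-bit states make it cheap for a mark-and-sweep pass to walk the heap: the overhead is a fixed bitmap rather than per-object headers.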

Joshi said that she had "barely scratched the surface" of the optimizations that MicroPython has to reduce its memory use. But the code is available; it is clear and easy-to-follow C that anyone who is interested should go and look at.

Differing approaches

She completed the talk with an evaluation of why CPython and MicroPython chose the approaches they did. Typically, there is a trade-off between memory use and performance, but that is not really the case here. MicroPython performs better than CPython for most benchmarks, Joshi said, though it does do worse on some, especially those involving large dictionaries. The trade-off here is functionality; MicroPython does not implement all of the features of CPython. That means, for example, that users do not have access to the entire ecosystem of third-party packages that CPython users enjoy.

Different design decisions were made by the CPython and MicroPython projects in part because of the philosophies behind their development. Back in the 1980s and early 1990s when CPython was being created, its developers wanted "simplicity of implementation and design first", they would "worry about performance later".

That is exactly the opposite of the philosophy behind MicroPython. It is a fairly modern implementation, so it could borrow ideas from Lua, JavaScript, and other languages that have come about in the decades since CPython was created. MicroPython is also solely targeted at these highly constrained microcontrollers. Effectively, they are targeting very different kinds of systems and use cases, which helps explain the choices that have been made.

A YouTube video of Joshi's talk is available.

[I would like to thank the Linux Foundation for travel assistance to Portland for PyCon.]

Comments (11 posted)

The Brave web browser is a project from a new company called Brave Software. It was founded by Brendan Eich, who is the inventor of JavaScript and former developer and CTO at Mozilla; he hopes to dramatically re-invent the advertising model of the web while strengthening user anonymity and security. Brave's value proposition is that instead of being served advertisements from web sites that use the revenue to pay their bills, users can opt to directly pay the content providers of their choosing with cryptocurrency. Also, there is a recognition of the utility of targeted advertising, so users have an option of saving a local, protected profile that can be used anonymously to obtain targeted advertisements instead of having their online behavior tracked and sold by a third party.

Brave is an open-source browser derived from Chromium, and as such it is based on the Blink web engine. Advertisements and user tracking are blocked by default as a built-in feature in Brave, as opposed to other browsers that offer that functionality via plugins. In 2016, Brave announced that, on top of blocking advertisements, it would let users choose to replace them with other advertisements that are sourced from a curated list of partners. Revenue from these ads would go to Brave Software as well as being shared with the publishers and others.

When the idea of advertisement replacement was first floated, the concept rattled the online publishing industry so badly that there was a backlash. That resulted in a cease and desist letter to Brave from seventeen newspaper publishing companies to vehemently protest the advertising scheme. They said that what Brave was doing was illegal and brought forward claims of copyright violation, breach of contract, unfair business practices, and unauthorized access. It is unclear if any of those accusations has any legal merit; in any case, it has not slowed down Brave Software in building the Brave browser and raising funds for its development.

However, it is not publicly known whether Brave Software still intends to roll out its planned ad-replacement mechanism. If so, there remain murky ethical issues (and possibly legal issues) in replacing advertisements from a content provider for the financial benefit of an unrelated party without the consent of the publisher, even if some of the money is given back to that publisher. This is the reason for most of the publisher outrage at the idea.

Solving the intrusive advertisement problem

Brave may seem similar in functionality to other browsers with ad-blocking plugins installed, but the underlying goal of Brave is to subvert the existing advertising and user tracking model. Serving advertisements on the web has grown to become a complex labyrinth of scripts, cookies, and other mechanisms that range from benign to unscrupulous to outright malicious. Advertisements also suck up bandwidth that users may have to pay for; that is a burden for mobile users with limited data plans, for example.

In addition, advertisers aren't just serving advertisements to prospective customers, but also mining the personal data of said customers so that marketing can be tailored and fine-tuned. While there is value to be had for consumers in the form of customized advertisements, there are legitimate concerns about the breaches in user privacy that such tracking entails. Worse still, some advertisements are written with predatory JavaScript that attempts to install malware on users' computers. Brave's raison d'être is to block such intrusiveness and offer an alternative model of payment for users, publishers, and advertisers, while providing a revenue stream for itself as well.

The Brave browser blocks all advertisements by default. This also includes blocking tracking pixels and third-party cookies. However, Brave Software has devised a mechanism to pay content producers to offset the loss of income from blocking advertisements on their site. Using cryptocurrency, users may opt to either pay content producers a monthly fee or enable "Brave Ads", which are ads from Brave Software's partners that are delivered via anonymous protocols so users cannot be tracked.

Revenue from Brave Ads is shared with content publishers (55%), Brave's advertising partners (15%), the users themselves (15%), and a portion goes back to Brave Software (15%). Currently, Bitcoin is used for these payments, but there are plans to replace the use of Bitcoin with a new cryptocurrency called Basic Attention Tokens.

The idea behind this new payment ecosystem is to eliminate the privacy-violating data collection that is currently inherent in getting advertisements to users, and for users to directly pay content publishers. The third parties who track user behavior hoard a vast collection of user-related data that advertisers have to pay for, which puts user privacy and security at risk. By making the browser the ultimate arbiter for advertising and payments for those advertisements, the bulk of the most objectionable part of the advertising ecosystem is reduced.

Basic Attention Tokens

Advertisers seek user engagement. Thus, users' attention becomes a currency, and Brave Software has devised a scheme to quantify and monetize it in the form of Basic Attention Tokens. Since the browser is able to record user interaction with advertisements, that interaction can be enumerated and recorded for the purposes of remuneration. Apart from direct user engagement, the algorithms used to determine user attention are based on how much of an advertisement is visible on screen, and for how long. The cumulative total of these engagements is tracked on-device by the browser, and will eventually add up to an amount where it can be rewarded with BATs.

Basic Attention Tokens are built on top of Ethereum, a decentralized blockchain technology that can be used to create any number of applications, including cryptocurrency. Ethereum provides "smart contracts", which consist of code that resides in a blockchain and executes based on conditions and events. These smart contracts are used to create BAT ledgers that pay content providers based on advertisement views.

Interest in BATs as a cryptocurrency is exceedingly high. On 31 May 2017, Brave Software held an initial online sale of BATs that raised about 35 million dollars and sold out within 30 seconds. Purchases were made using ether (ETH, Ethereum's native cryptocurrency), and about 1.5 billion BATs were sold. Despite the large amount of money being exchanged, only about 130 people were able to buy the tokens, and half the supply was bought by only five parties. That frenzied trading resulted in a big chunk of money for continued development of Brave's browser and infrastructure technology.

Anonymized user activity

One of the key selling points of Brave is how "user intent"—the browsing behavior of a browser user—will be profiled and kept (voluntarily) by the users themselves instead of being tracked and compiled by third parties. Users who opt in to Brave Ads will want relevant, targeted advertisements without compromising their privacy. When the feature is enabled, the browser will gather browsing information and use it to create "intent signals" for Brave Ads, without any specific user identification or cookies. To prevent fingerprinting, only a small subset of that information is used to signal intent.

Brave also has built-in defense against browser fingerprinting, although it is disabled by default as it breaks the functionality of some sites. Fingerprinting is the underhanded use of browser or operating system features to identify computers and track them across the web. In fingerprinting-protection mode, Brave will attempt to thwart fingerprinting attempts via canvas, WebGL, AudioContext, and battery status. That mode also disables WebRTC so that it cannot leak private IP addresses.

Browser experience

Brave feels similar to Google Chrome and other Chromium-based browsers. It is possible to start a different browsing session in each tab, and group tabs by session, which is useful for logging into two or more web accounts simultaneously. Private mode can also be started in a tab, instead of requiring an entirely new window like Google Chrome. The main feature is advertisement blocking, and when browsing sites such as The New York Times and The Wall Street Journal, I did not see a single advertisement. When browsing a more advertisement-heavy site like ZDNet, Brave manages to block most of the ads, but some still appear when they are presented in the form of "partner links".

Chrome's browser plugins and extensions are technically compatible with Brave, but the browser does not support installing them directly. Instead, a curated list of extensions is included in the Brave browser download; the developers claim that having a list of approved, tested extensions is meant to maintain the security of the browser. Therefore, if users want an extension, they need to make a feature request to the developers.

Since the use of BATs has not yet been implemented in Brave, Bitcoin is used for monthly contributions paid to web sites of the user's choosing. In the options screen, there is a tab called Payments where this can be enabled. A Bitcoin wallet is created automatically, and users can choose to deposit some of their own money into it if they want. When enabling Brave Ads and browsing participating web sites, payments from advertisers should be deposited into that wallet every 30 days. The Brave developers hope users will ultimately spend all of their collected payments on publisher web sites to help make this new payment ecosystem work.

If you want to try out Brave yourself, you can download it from the web site, where packages are available for Windows 7 or later, macOS 10.9 or later, and Linux. It is also available for Android and iOS in the appropriate app store. If you're running Linux, there are instructions that describe how to install it. Packages are available for Debian, Ubuntu, Mint, Fedora, and openSUSE. It can also be installed as a generic x86-64 binary or via a snap.

Conclusion

The brand new advertisement ecosystem that Brave is pushing is clearly designed to pressure existing players to improve their practices. Brave Software wants to position itself as the spearhead in disrupting the existing ecosystem, but will users, publishers, and advertisers hop on board the new model? Brave maintains that it will not engage in dodgy practices itself; the open-source nature of its browser will help ensure that. However, it remains to be seen if most users and publishers will agree to the new model, especially with Brave Software having the final word on the rules of the new game, which includes how much revenue is shared and with whom.

For now, only Brave incorporates this technology, and it will probably need more widespread adoption before the business model becomes sustainable for publishers. Also, as with all ad-blocking technology, there will be pushback from advertisers and third parties relying on the user-tracking business model; perhaps they will find subversive ways to defeat ad blocking. It is an arms race, and it is unclear whether users, advertisers, or content publishers will come out on top.

Comments (35 posted)

Normally, the -rc6 kernel testing release is not the place where one would expect to find a 900-line memory-management change. As it happens, though, such a change was quietly merged immediately prior to the 4.12-rc6 release; indeed, it may have been the real reason behind 4.12-rc6 coming out some hours later than would have been expected. This change is important, though, in that it addresses a newly publicized security threat that, it seems, is being actively exploited.

A correction: Ben Hutchings pointed out that the Qualys analysis is based on the "main thread" stack, not any other thread stacks which, with glibc at least, are not allowed to grow. Apologies for the confusion.

The stack area in a running process is, on most architectures, placed at a relatively high virtual address; it grows downward as the process's stack needs increase. A virtual-memory region that automatically grows as a result of page faults brings some inherent risks; in particular, it must be prevented from growing into another memory region placed below it. In a single-threaded process, the address space reserved for the stack can be large and difficult to overflow. Multi-threaded processes contain multiple stacks, though; those stacks are smaller and are likely to be placed between other virtual-memory areas of interest. An accidental overflow could corrupt the area located below a stack; a deliberate overflow, if it can be arranged, could be used to compromise the system.

The kernel has long placed a guard page — a page that is inaccessible to the owning process — below each stack area. (Actually, it hasn't been all that long; the guard page was added in 2010). A process that wanders off the bottom of a stack into the guard page will be rewarded with a segmentation-fault signal, which is likely to bring about the process's untimely end. The world has generally assumed that the guard page is sufficient to protect against stack overflows but, it seems, the world was mistaken.

On June 19, Qualys disclosed a set of vulnerabilities that make it clear that a single guard page is not sufficient to protect against stack overflow attacks. These vulnerabilities have been dubbed "Stack Clash"; the associated domain name, logo, and line of designer underwear would appear to not have been put in place yet. This problem has clearly been discussed in private channels for a while, since a number of distributors were immediately ready with kernel updates to mitigate the issue.

The fundamental problem with the guard page is that it is too small. There are a number of ways in which the stack can be expanded by more than one page at a time. These include places in the GNU C Library that make large alloca() calls and programs with large variable-length arrays or other large on-stack data structures. It turns out to be relatively easy for an attacker to cause a program to generate stack addresses that hop over the guard page, stomping on whatever memory is placed below the stack. The proof-of-concept attacks posted by Qualys are all local code-execution exploits, but it seems foolhardy to assume that there is no vector by which the problem could be exploited remotely.

The fix merged for 4.12 came from Hugh Dickins, with credit to Oleg Nesterov and Michal Hocko. It takes a simple, arguably brute-force approach to the problem: the 4KB guard page is turned into a 1MB guard region on any automatically growing virtual memory area. As the patch changelog notes: "It is obviously not a full fix because the problem is somehow inherent, but it should reduce attack space a lot." The size of the guard area is not configurable at run time (that can wait until somebody demonstrates a need for it), but it can be changed at boot time with the stack_guard_gap command-line parameter.
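The arithmetic behind the fix can be modeled with a trivial sketch (illustrative Python, not kernel code): a single stack allocation can skip the guard only if it is larger than the guard itself, so enlarging the gap from one page to 1MB puts it out of reach of realistic on-stack allocations.

```python
PAGE = 4096

def hops_guard(alloca_size, guard_bytes):
    # A single stack-pointer decrement larger than the guard can step
    # past it without ever touching a guard page.
    return alloca_size > guard_bytes

old_guard = PAGE           # pre-4.12: a single guard page
new_guard = 1024 * 1024    # 4.12: the 1MB guard region (stack_guard_gap)

big_alloca = 64 * 1024     # e.g. a large variable-length array
print(hops_guard(big_alloca, old_guard))   # True: 64KB hops a 4KB guard
print(hops_guard(big_alloca, new_guard))   # False: it faults inside the 1MB gap
```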

The 1MB guard region should indeed be difficult to jump over. It is (or should be) a rare program that attempts to allocate that much memory on the stack, and other limits (such as the limit on command-line length) should make it difficult to trick a program into making such an allocation. On most 64-bit systems, it should be possible to make the guard region quite a bit larger if the administrator worries that 1MB is not enough. Doubtless there are attackers who are feverishly working on ways to hop over those regions but, for a while at least, they may well conclude that there are easier ways to attack any given system.

The real problem, of course, is that a stack pointer can be abused to access memory that is not the stack. Someday, perhaps, we'll all have memory-type bits in pointers that will enable the hardware to detect and block such attacks. For now, though, we all need to be updating our systems to raise the bar for a successful compromise. Distributors have updates now, and the fix is in the queue for the next round of stable kernel updates due on June 21.

Comments (57 posted)

The kernel's command line allows the specification of many operating parameters at boot time. A silly bug in command-line parsing was reported by Ilya Matveychikov on May 22; it can be exploited to force a stack buffer overflow with a controlled payload that can overwrite memory. The bug itself stems from a bounds-checking error that, while simple, has still been in the Linux kernel source since version 2.6.20. The subsequent disclosure post by Matveychikov in the oss-security list spawned a discussion on what constitutes a vulnerability, and what is, instead, merely a bug.

Many kernel command-line parameters allow the specification of an array of integer values, using a syntax like:

    foo=1-10

Within the kernel, the result of such a parameter will be the filling of an array with the values one through ten. Array parameters are parsed with the get_options() function:

    char *get_options(const char *str, int nints, int *ints);

The nints value specifies the maximum number of integer values that should be placed into the ints array. Unfortunately, nobody noticed that get_options() simply ignores nints; if the command line contains a parameter like foo=1-1000000, one million entries will be written, regardless of whether the destination array has the space to hold them. There are just over 200 get_options() call sites in the 4.12-rc6 kernel; any one of them could be used to overwrite memory via a hostile command line.
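The bug and its fix can be sketched in Python for illustration; the kernel's get_options() is C, and the function names below are hypothetical stand-ins for the buggy and corrected behavior:

```python
def parse_range_unchecked(spec, nints):
    # mirrors the bug: nints is accepted but never consulted
    lo, hi = (int(x) for x in spec.split("-"))
    return list(range(lo, hi + 1))        # may wildly exceed nints

def parse_range_checked(spec, nints):
    lo, hi = (int(x) for x in spec.split("-"))
    out = []
    for v in range(lo, hi + 1):
        if len(out) == nints:             # stop at the caller's buffer size
            break
        out.append(v)
    return out

print(len(parse_range_unchecked("1-1000", 8)))   # 1000: the "overflow"
print(len(parse_range_checked("1-1000", 8)))     # 8: writes stay in bounds
```

In Python the unchecked version merely builds a big list; in C, the same logic writes past the end of a fixed-size stack array.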

The example Matveychikov provided gave a huge range for the netdev parameter, triggering the bug. The overflow can be demonstrated while booting an affected kernel using qemu:

    qemu-system-x86_64 -no-reboot -no-shutdown \
        -kernel /boot/vmlinuz-4.4.0-66-generic \
        -append "netdev=3735928559-3735999999"

The numbers were chosen so that variants of the hexadecimal number 0xdeadbeef (3735928559 in decimal) were written over and over into the overflowed memory, demonstrating that a controlled (if restricted) payload is possible. Matveychikov's patch fixing the error was sent to the linux-kernel and stable lists, but it was not merged and seems to have slipped through the cracks.

Indeed, the fate of this patch is instructive in its own right. Even if one is not concerned about any potential security implications of this problem, it still seems like a bug worth fixing. But it disappeared into the linux-kernel noise, and the one maintainer who seems to have read it, Greg Kroah-Hartman, rejected it as a stable-kernel patch submission that did not follow the relevant rules. Nobody has bothered to direct it to a maintainer who will actually apply it, so the problem remains unfixed.

Bug or vulnerability?

This isn't the first bounds-checking error to be exploitable via the kernel command line. Matveychikov points out that this bug is similar to CVE-2017-1000363, an overflow of the parport_nr array. On the oss-security list, Simon McVittie raised the question of whether there is a realistic way for an attacker to exploit the Linux kernel boot command line without already having compromised the system. McVittie asked: "is this a security vulnerability, or just a bug?"

Daniel Micay argued strongly that this bug does not represent a security vulnerability, saying that "it's unreasonable to consider the kernel line untrusted". After all, as Micay elaborated, there is a whole host of command-line knobs that allow users to alter kernel functions. Florian Weimer disagreed, calling it a potential secure boot bypass, "so it matters in some theoretical sense to some downstreams which carry those Secure Boot patches." Kurt Seifried replied that, for the purposes of CVE assignment, this is a vulnerability. Micay took issue with the fact that CVEs are assigned to bugs like this at all, calling it "meaningless Red Hat security theatre". Seifried replied that CVE assignment has nothing to do with Red Hat, and that Micay should take up the issue with the MITRE/CVE Board.

Is this bug a potential secure-boot bypass? Yes, it is. "Secure boot" isn't a monolithic piece of code that ends with UEFI; it also relies on every privileged piece of code down the boot chain to ensure the integrity of the running operating system. The UEFI portion verifies the bootloader and kernel, and will happily hand off execution to any program that's signed, even buggy kernels. That program is supposed to keep control of the system, preventing even a root user from performing actions that could compromise the system; many actions that root can usually perform are often disabled when secure boot is in use.

Thus, secure-boot developers argue, kernel functionality and command-line parameters need to be locked down so that the kernel cannot be modified in ways that will subvert secure boot. For example, this patch from David Howells, which was merged for the 4.12 kernel, provides annotations to mark parameters that affect hardware, which in turn will let the kernel lock some of those parameters down. Buggy command-line processing that allows the overwriting of unrelated memory clearly defeats this sort of lockdown; it can thus only be seen as a vulnerability by anybody who is concerned about the use of command-line (or module) parameters to defeat secure boot.

The kernel has certainly been getting more secure over the last few years, with automated tools such as fuzzers and efforts like the Kernel Self Protection Project taking a more proactive approach to security. Bugs still slip in, of course, and sometimes it can take quite some time before they are found and fixed as well. When they are found, though, it is important that those fixes do get merged; letting this one linger helps no one.

Comments (36 posted)

Windows Management Instrumentation (WMI) is a vaguely defined mechanism for the control of platform-specific devices; laptop functions like special buttons, LEDs, and the backlight are often controlled through WMI interfaces. On Linux, access to WMI functions is restricted to the kernel, while Windows allows user space to use them as well. A recent proposal to make WMI functions available to user space in Linux as well spawned a slow-moving conversation that turned on a couple of interesting questions — only one of which was anticipated in the proposal itself.

Darren Hart, the kernel's x86 platform-driver maintainer, proposed making WMI functions available to user space in early May. The reasoning behind this change took a while to come out, but can be found in this message that was posted one month later. The problem that Hart is trying to solve is the result of a good change in the market: vendors of laptops and similar devices are starting to acquire an interest in supporting Linux, and they are finding the current scheme hard to work with.

Getting Linux working well on a contemporary laptop means ensuring that all of the special buttons work as expected, the backlight can be dimmed, the audio volume can be controlled, the radios can be turned on and off, etc. Each laptop vendor has its own way of connecting these features to WMI functions; as a result, the kernel has to be taught about each new laptop design before it will work as intended. The changes required are typically small to the point of being trivial, but they are indeed required, and it seems that some vendors are finding the process of getting those changes into the mainline cumbersome. It takes too long (a claim that Greg Kroah-Hartman questioned), and requires working with those pesky kernel developers.

Hart would like to see that platform-support code move to user space, eliminating the need to get changes into the kernel to support a new device. It would, he said, open the flood gates to interesting new developments and "allow vendors to 'own their own destiny' and innovate and support their platforms independently". He also argued that moving this code to user space would avoid cluttering the kernel with platform-support code that nobody will care about 18 months in the future.

He did have one concern, though, having to do with backward compatibility. The existing kernel-space drivers make use of WMI functions now; allowing user space to play with the same functions is likely to lead to confusion at best. So the proposal included a blacklist mechanism allowing the kernel to block access to any WMI function that it is using. So far, so good, but there is a potential problem: if a new feature is added to the kernel that involves using a WMI function that had not been used before, blocking access to that function might break a user-space application that started using it before the kernel did. Hart's question was: would this sort of ABI change run afoul of the kernel's no-regressions policy?

This question was discussed and a potential solution emerged, but many of the developers were interested in a separate question. Andy Lutomirski probably expressed their concern best:

I should add that one thing that's really, really nice about Linux is that, when you install it on a supported laptop, it works. On Windows, you end up with a horrible pile of bloated, buggy, unsupported userspace crapware to make random buttons work. While I'm all for improving the manageability situation on Linux, let's please make sure we don't regress the ordinary laptop functionality story and make it as bad as it is on Windows. We're currently in a surprising situation in which laptops frequently work *better* on Linux than Windows, and I think we should preserve that.

In other words, the idea of moving platform support to a user-space blob — perhaps a proprietary blob at that — proved to be surprisingly unappealing to a number of developers on the list. From this point of view, the fact that Windows allows user-space access to WMI functions is not an argument in favor of Linux doing the same; indeed, it could be an argument for doing the opposite. Hart responded that user-space code will not necessarily be of lower quality than kernel code, and that making this functionality available to user space may increase the pool of developers who are able to work on platform support. In particular, he said, some of the people who report platform bugs now could maybe fix them if the relevant code were in user space.

This has the look of the sort of conversation that repeats over the years without any real conclusions, but there were a few things that came out of this one. The first is that there is little interest in changing the status quo if it would inhibit the addition of support for future platforms to the kernel. The ability to run Linux on a laptop without having to chase down some vendor's platform-support blob is worth keeping. So any mechanism giving access to user space would have to allow the addition of support for future platforms, regardless of what might have been done in user space.

There was no definitive conclusion on the backward-compatibility issue, but there was a proposal for a potential solution to the problem. Rather than blocking access to a given WMI function, a platform driver could intercept accesses and ensure that they are properly carried out. So, for example, if the kernel gains the ability to adjust the backlight brightness on a given platform, it will need to intercept access to the WMI function(s) it uses. If a user-space application attempts to use one of those functions, the kernel-space driver can look at what the application is trying to do, cause it to happen, and adjust its own internal state accordingly. If this interception is implemented correctly, it should ensure that a user-space brightness application written before the kernel gained that ability will continue to function afterward.

The final conclusion was that there has been enough discussion of ideas without the code to accompany them. Once the proposal has been implemented, it will be easier to see what the implications are. Hart has accepted the assignment to get some code posted for review. At that point there will almost certainly be another lengthy discussion on the topic.

Comments (21 posted)

At Open Source Summit Japan (OSSJ)—OSS is the new name for LinuxCon, ContainerCon, and CloudOpen—Sasha Levin gave a talk on the kernel's application binary interface (ABI). There is an effort to create a kernel ABI specification that has its genesis in a discussion about fuzzers at the 2016 Linux Plumbers Conference. Since that time, some progress on it has been made, so Levin described what the ABI is and the benefits that would come from having a specification. He also covered what has been done so far—and the extensive work remaining to be done.

An ABI is how one piece of compiled code talks to another, he said. For the kernel, the ABI is how user space communicates with the kernel. That takes the form of system calls, ioctl() commands, and, to some extent, the calls in the virtual dynamic shared object (vDSO).

The current process for Linux development says that kernel patches cannot break programs that rely on the ABI. That means a program that runs on the 4.0 kernel should be able to run on the 5.0 kernel, Levin said. While breakage is not intended, there is nothing that would automatically prevent it from happening. There are no tools that would detect an ABI break, so the responsibility to find these kinds of problems falls to user space. If a user-space program breaks because of an ABI change, users and developers of that program can complain to the kernel developers and, hopefully, get the problem fixed upstream.

The kernel ABI gets extended by some developer coming up with a new feature that has a user-space interface and, with luck, some documentation that describes it. Usually the maintainer will require documentation before merging the feature, he said. Normally, though, someone else will write the corresponding user-space code to use the new feature; that might be developers for the GNU C library (glibc) or QEMU, for example. Then the kernel and user-space developers test to find things that are broken on both sides of the interface; the code is "massaged until stuff works" between the two.

But that process has a number of flaws. Basic validity checks get forgotten, and those missing checks often reappear as security vulnerabilities later on. Effectively, they allow user space to cause the kernel to do things that were never planned for, which is always dangerous. There are also undefined behaviors on both sides of the interface because there is no complete specification. Even if all of the checks are made, there is still room for the kernel to end up performing operations that were not planned. The lack of a specification can also lead to problems down the road; failing to verify flags and other parameters means that changes in the kernel can cause existing programs to break.

Backward compatibility is supposed to be verified by the Linux Test Project (LTP) and other tools, which help, but the real verification is done by all of the user-space applications. LTP will catch big things, Levin said, but not the majority of backward compatibility breaks.

The user-space ABI is broken "every other release or so"; usually the breaks are small things that no one cares about, he said. But one of the problems is that it can take some time before the new release gets widespread testing. In the meantime, some user-space program could start relying on the new ABI, while another program still relies on the unbroken, old behavior; that would be a recipe for a difficult-to-solve problem.

Much of the validation that is done in the kernel is done on an ad hoc basis. There is no clear definition of what should be checked or how the checks should be done. Each system call typically has its own way of checking, which opens up room for bugs, he said. If there are 20 different ways to check some kind of parameter and one of them gets fixed, it is common for the other 19 to be missed. He suggested that running git blame on any major system call will show missed checks; "look and you will be unpleasantly surprised".

Even in user space there are lots of different implementations for system call parameter handling. For example, strace has its own library describing all of the system calls, C libraries do their own parameter checking, and different fuzzers all have their own way of generating system call parameters. That is just more duplication, which leaves more room for mistakes, Levin said. Most of those implementations are written by developers who aren't necessarily familiar with the kernel side of the interface as well. "It's a mess."

The existing documentation, in the form of man pages, is "pretty good". But man pages only cover about 80% of the use cases; they are not supposed to completely document the ABI of the kernel. The documentation is meant to help programmers get their programs working with the kernel, thus it is a "summary briefing" rather than any kind of "contract".

Contract

Today, the contract is embodied in the kernel code for a specific kernel version. The documentation is based on someone's understanding of the kernel code, which may be wrong, and the kernel code itself is subject to change. There generally is no proactive effort to see if an ABI change affects a particular user-space program; its users find out later when things break.

Having a contract would kill multiple birds with one stone, he said. It would force the kernel and user space to behave in a specific way. The backward compatibility problems would disappear, since changes that affect it could be detected. It would prevent a whole class of errors between user space and the kernel. It would fully document the ABI and it would also allow code reuse for the ABI, with the usual benefits that brings.

What would this contract look like? It should be human readable so that people can review it, but also should be machine readable so that it can be turned into code for tests or to use in the kernel and user space. Hopefully, it would only need to be written once but could be used by all of the potential consumers. As a starting point, the system call descriptions used by the syzkaller fuzzer look reasonable. They are used to create calls to system calls that are correct enough that they tickle new parts of the kernel code.

On the kernel side, the contract would be used to generate code to verify parameters and return values as part of the ABI. The code would validate the input and output parameters based on the specification. It would try to prevent calls with invalid arguments from even getting to the real system call code. That would reduce the amount of validation checks needed in the individual system calls; for example, file descriptors could be verified in only one place and system calls could rely upon getting a valid one.

For user space, it would make things easier for programs and libraries that access the kernel ABI. Instead of hoping the ABI is understood, user-space programs would have a guarantee of behavior. The contract would be made usable for projects like strace that already have to work with the ABI. Validation code based on the specification could be added to glibc and other C libraries as well.

By generating the validation code and centralizing it, lots of code in both the kernel and user space will be removed. Fixes to the validation code will be shared on both sides of the ABI. In addition, backward compatibility problems will be detected more easily and quickly. It will be difficult to implicitly break the ABI.

There are also benefits for users of the stable and long-term support (LTS) kernels. Right now, some are afraid to upgrade their kernels because they are worried about a new kernel breaking their user-space application. The contract would provide more assurance that those applications will still run correctly. Even though he is a maintainer of LTS kernels, he thinks they are a "pretty bad idea" overall; if you keep older kernels alive for many years, "things are bound to go wrong". He is hoping that an ABI contract would help to kill off LTS kernels to some extent by increasing the frequency with which users are willing to upgrade their kernels.

There are security benefits as well. Centralizing the code that is used by multiple user-space projects as well as the kernel will likely lead to more people scrutinizing that code. A fix that is found by one project will fix the others as well. Many of these kinds of bugs lead to CVE entries, so ultimately this could help reduce the number of vulnerabilities for Linux.

There are also academic and other researchers interested in a specification of the kernel ABI. For example, safety researchers are particularly interested as some government agencies will not allow certain industries (nuclear power, for example) to run Linux because there is no specification to describe the limits of what the ABI provides.

Plans

The "hard part" is what is being worked on now, Levin said, which is to determine the format for the specification. The open() and close() system calls are "pretty easy to describe", but other system calls are more complicated and have a lot of interactions with other system calls. There is a need to start documenting all of the system calls and ioctl() commands and to go beyond what is listed in the man pages. The man page may say that EINVAL is returned for a bad flag value, but the specification needs to say exactly what flag values will cause that return. That needs to be written by someone who is familiar with the system call, he said.

Then those specifications need to be tested. That needs to be done without breaking existing user-space programs, but still providing user space a way to test the code. If the feature is governed by a kernel configuration option, most users won't change their distribution's setting, which may limit testing. There needs to be a way to do user-space testing that still allows existing applications to work while the specification is incomplete and changing.

Levin was asked if he knew of other, similar efforts. He said that the Linux Foundation started a project and paid a company to create a specification of some kind back in 2008. That effort crashed when the economy tanked around the same time and he has never seen any of the results of that work.

Another audience member asked about the performance of putting these checks into the system call path, which is considered to be a hot path in the kernel. Levin acknowledged that, but said the idea was to effectively move the checking out of the system call itself, so that would essentially shift the time spent. But there would be more checks and there would be some impact from jumping through another layer; he was hopeful that it would not be a blocker.

In answer to another question, Levin said he has been doing some preliminary work with the syzkaller developers, but that there is a need for more developers. It is just now starting to get to a point where it is worth getting something into the kernel, he said, but there is much more work still to be done.

[I would like to thank the Linux Foundation for travel assistance to Tokyo for Open Source Summit.]

Comments (17 posted)