This edition contains the following feature content:

This week's edition also includes these inner pages:

Brief items: Brief news items from throughout the community.

Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

In many ways, Spectre variant 1 (the bounds-check bypass vulnerability) is the ugliest of the Meltdown/Spectre set, despite being relatively difficult to exploit. Any given code base could be filled with V1 problems, but they are difficult to find and defend against. Static analysis can help, but the available tools are few, mostly proprietary, and prone to false positives. There is also a lack of efficient, architecture-independent ways of addressing Spectre V1 in user-space code. As a result, only a limited effort (at most) to find and fix Spectre V1 vulnerabilities has been made in most projects. An effort to add some defenses to GCC may help to make this situation better, but it comes at a cost of its own.

Spectre V1, remember, comes about as the result of an incorrect branch prediction by the processor. Given code like:

    if (index < structure->array_size)
        do_something_with(structure->array[index]);

The processor would likely predict that index would indeed be less than the given size since, in normal execution, it almost always is. It will then go on to speculatively execute the code that uses array[index] with an index value that may, instead, be far out of bounds. If this speculative access leaves traces elsewhere in the system (by pulling data into the cache, for example), it can be exploited to leak data that, in a correct execution of the code, would be protected.

In the kernel, the array_index_nospec() macro has been introduced as a way to prevent incorrect speculative loads of this type. These macro calls must be introduced manually, though, in places where somebody has determined that a Spectre V1 vulnerability may exist. That has been happening, but slowly; there are about 60 invocations in the 4.18-rc4 kernel. Less work has been done in user space, for a number of reasons, including the lack of a primitive like array_index_nospec().

GCC may soon address that final problem, thanks to this patch set from Richard Earnshaw, based on a technique first published by Chandler Carruth. These patches add a new intrinsic that behaves much like array_index_nospec():

__builtin_speculation_safe_value(value, fallback)

In the absence of speculation, this function will simply return value. When speculative execution is happening, instead, it might still return value, but it could also return the fallback value, which defaults to zero. It can thus be used to ensure that speculative execution cannot happen with out-of-range index values. A simple implementation would just use a barrier unconditionally to prevent speculation outright, but barriers can be expensive. It may be more efficient to just clamp the range of the index value while allowing speculation in general to continue.

Detecting incorrect speculation

A look at how this new intrinsic works yields some insight into why it is specified the way it is. The core of that implementation is a trick to detect when incorrect speculative execution is occurring and to prevent out-of-bounds accesses from happening in such situations. Doing so requires instrumenting the code as it is built by the compiler. In this scheme, the above if statement would be modified to look something like this:

    void *all_ones = ~0;
    void *all_zeroes = 0;
    void *correct = all_ones;

    if (index < structure->array_size) {
        correct = (index >= structure->array_size) ? all_zeroes : correct;
        index &= correct;
        do_something_with(structure->array[index]);
    }

The key is the assignment of correct inside the body of the if:

correct = (index >= structure->array_size) ? all_zeroes : correct;

That assignment tests whether the inverse of the branch condition is true; if that is the case, the body is being speculatively executed when it should not be and evasive action is required. Since correct will have been set to zero if (and only if) incorrect speculation is taking place, said evasive action can take the form of using correct as a mask against index :

index &= correct;

In normal execution, this operation will change nothing; when incorrect speculative execution has been detected, instead, index will be reset to zero. At that point, it can no longer be used to speculatively access out-of-bounds memory.

The question that may come to mind here is: if the condition is mispredicted in the if statement, won't the same thing happen with the ternary expression used to set the value of correct ? As it happens, almost all architectures have some sort of compare-and-assign operation that (1) is a single instruction without a branch, so the branch predictor does not enter the picture, and (2) is defined by the architecture to not be subject to speculation in its own right. So the assignment of correct will be done with non-predicted values; it will be an accurate indicator of whether incorrect speculative execution is taking place.

Note that the correct flag is initialized once, but updated after every branch as shown above. It will, thus, carry the prediction state through multiple branches if need be. With enough cleverness, it can even be used to communicate this state across function calls. Since speculation can sometimes run hundreds of instructions ahead of anything known to be correct, this ability to track and communicate the state of execution is important.

Adding support to GCC

As noted above, implementing __builtin_speculation_safe_value() can be as simple as injecting a barrier into the generated code. But if the compiler could also add the ability to detect incorrect speculation, other possibilities would open up. To that end, the GCC patch set under consideration adds a new -mtrack-speculation option for compilation that turns on this mechanism. This patch, in particular, adds speculation tracking for the arm64 architecture. As described in that patch, a simple equality test might (after the comparison to set the condition code) look like:

    B.EQ <dst>
    ...
    <dst>:

With -mtrack-speculation, that code would be made to look more like this:

    B.EQ <dst>
    CSEL tracker, tracker, XZR, ne
    ...
    <dst>:
    CSEL tracker, tracker, XZR, eq

Here, tracker is the name of the register that has been dedicated to holding the correct flag. The CSEL instruction will set tracker to either itself or XZR (the register that always reads as zero) depending on the real value of the condition, without speculation. It is, in other words, implementing the ternary operator we saw in the example above.

This operation will cause the tracker register to be zero when incorrect speculation is happening. That allows it to be used to implement __builtin_speculation_safe_value() ; with the default fallback value of zero, a logical AND between the tracker register and the value in question will suffice. In the case of the arm64 architecture, though, it is possible to do a little better. When speculation tracking is turned on, the compiler will simply insert a CSDB speculation barrier when incorrect speculation is detected.

It's worth noting in passing that things become more complicated when function calls are involved. Speculative execution can involve function calls, so it is important to track incorrect speculation across those calls. If a register could be dedicated program-wide to the tracker value, life would be easy, but that would require a flag-day change to the arm64 ABI. Instead, the stack pointer is used in a tricky way to encode the correctness state on function call and return; see the above-linked patch for details.

Overall, this approach may seem like the best of all worlds; barriers can be expensive, so a mechanism that only executes them when they are known to be necessary would be ideal. The downside, of course, is that the speculation tracking itself is not cheap: it requires setting aside two registers to track the state and instrumenting every branch. No benchmark results have been posted with the code, but this level of overhead must have an impact. The cost is high enough to rule out otherwise interesting ideas like automatically protecting all bounds checks.

In any case, this sort of speculation tracking may come across as a strange mechanism; code running on the processor can detect that the processor has speculated incorrectly, but the processor itself still takes some time to figure that out. But that is the world we have found ourselves living in. The best that can be done is to find ways of protecting our code while minimizing the cost.

Comments (26 posted)

A recent query about the status of network security (TLS settings in particular) in Emacs led to a long thread in the emacs-devel mailing list. That thread touched on a number of different areas, including using OpenSSL (or other TLS libraries) rather than GnuTLS, what kinds of problems should lead to complaints out of the box, what settings should be the default, and when those settings could change for Emacs so as not to discombobulate users. The latter issue is one that lots of projects struggle with: what kinds of changes are appropriate for a bug-fix release versus a feature release. For Emacs, its lengthy development cycle, coupled with the perceived urgency of security changes, makes that question even more difficult.

Questions and concerns

In mid-June, Jimmy Yuen Ho Wong posted some "questions and concerns about Emacs network security" to the list. He had run some tests making TLS connections to web sites with bad certificates, but found that Emacs did not complain about the problems. The default values for some security settings were inadequate, he said; he proposed a set of better defaults, but there were still some failures. So he wondered:

Can we update the default network security settings? Now that `starttls.el` and `tls.el` are obsolete, and GnuTLS doesn't seem to be doing a very good job, can we link to something better maintained, such as OpenSSL/LibreSSL/BoringSSL/NSS?

The emacs-devel post came out of a much less temperate reddit post. In it, he noted that tls.el and starttls.el have been deprecated on the master branch. Using those, he could build Emacs without GnuTLS and use the OpenSSL command-line programs for his TLS connections, but that option is going away. In his mailing list post, he also asked about nsm.el, which is the Emacs Network Security Manager: it is "seemingly doing redundant checks if your TLS settings are reasonable, what's the history of it and why is it not obsolete when `tls.el` and `starttls.el` are?"

The settings that Wong complained about (gnutls-verify-error, gnutls-min-prime-bits, and gnutls-algorithm-priority) are actually for the lower-level gnutls.el library that is the interface to GnuTLS, Lars Ingebrigtsen explained. "The Emacs Network Security Manager does the user interface job handling various classes of (insecure) network access classes", he said. He is the developer of NSM and wrote a blog post about it when it was merged back in 2014. NSM was released as part of Emacs 25.1 in 2016. In a subsequent emacs-devel post, he clarified the role of NSM:

Network security management is done by nsm.el, not by the underlying libraries. The NSM provides, as the name implies, network security, and does stuff like certificate pinning and warns you if a STARTTLS connection has degraded to a non-TLS connection (which, of course, the gnutls.el functions can't do on its own).

But there are still more problems that Wong has found. He listed around two dozen badssl.com sites that did not give any errors when he accessed them from Emacs using the default settings. Three of the failures he rated as "very concerning" (revoked., pinning-test., and sha1-intermediate.) and another was "a bit concerning" (invalid-expected-sct.). All of those were run with the default value of network-security-level, which is "medium"; setting it to "high" only resulted in complaints from two of the sites with too-small Diffie-Hellman key sizes (dh480. and dh512.).

Ingebrigtsen acknowledged some of those problems; he suggested that more of those sites should result in complaints on the high security level and that the Diffie-Hellman complaints should move from high to medium. He seemed less convinced on the certificate-pinning problem, but was interested in patches to implement the complaint for a revoked certificate. He also filed a bug to remind himself of a feature he has planned to make the protocol checks more extensible.

Security tradeoffs

Certificate pinning—really public-key pinning—is important, Perry E. Metzger said. In contrast to Ingebrigtsen's ambivalence, Metzger said that it can be a "matter of life or death for many people". Sites can request that their public keys be stored by the browser using HTTP Public Key Pinning (HPKP); not doing so could be disastrous:

Pinning is what is done by sites like gmail to prevent third world dictatorships from using stolen certificate credentials to spy on their citizens. People who have been victims of this have had their email read, been arrested by state security forces for dissent, and have been tortured to death for lack of certificate pinning working in their browsers.

But Emacs co-maintainer Eli Zaretskii is concerned that adopting such measures will result in extra prompts and other inconveniences for those who are not living under such conditions:

It isn't the Emacs way to second-guess our users' needs, definitely not to decide for them what is and what isn't a matter of life and death for them. We provide options with some reasonable defaults, and then let users make informed decisions which defaults are not good enough for them. It is IMO unreasonable to make our defaults match what happens in dictatorships that you describe, because that would unnecessarily inconvenience the majority of the users. Let's not follow the bad example of the TSA (whose rationale is, unsurprisingly, also matters of life and death).

The settings of various web browsers were often discussed in the thread, at least as a starting point for what the Emacs defaults should be. But NSM is used for more than just web browsing (IMAP, for example) in Emacs, so the fit is not exact. The Emacs Web Wowser (EWW) is used to browse the web, though, so Emacs can learn from the other browsers. As Paul Eggert put it:

However, when Emacs is used to browse the web there's a powerful argument to model its security practices on those of other web browsers. A lot of practical experience has gone into the Firefox and Chrome security models, and it would be much, much more efficient for us Emacs developers to reuse those wheels instead of reinventing them.

But Zaretskii thinks there is more to it since Emacs does more with TLS than just web browsing:

NSM is used for every net connection, not just those on behalf of EWW. If you are arguing that NSM should apply different settings to EWW connections and to the other kind, then this is a separate issue.

Richard Stallman weighed in as well; he disclaimed knowledge of the details, but suggested a "possible avenue for choosing a good response". He wants to ensure that, early on, users get a way to at least consider the question: "might thugs with torture chambers be spying on me" (as he put it in another message).

The idea is that we make sure users see a chance to choose between the alternatives (convenience and safety) early enough that they won't be unsafe. The choice should come with an explanation of each option, first stating what situations it is recommended for, then what it does.

No one really disagreed with that idea, but it is also not really the focus of ongoing efforts. The idea seems reasonable, but the implementation—wording in particular—may be harder than it sounds.

Prime bits

The current setting in Emacs for the minimum number of bits in the prime number used as part of the Diffie-Hellman key exchange algorithm is 256, which is woefully small, according to Wong and others. Various kinds of attacks on key exchanges with small primes are known and TLS-downgrade attacks can be used to force the key exchange to use these smaller prime numbers. Currently, browsers set their minimum to 1024 bits, which is still thought to be susceptible to state-level attackers. Given that, Noam Postavsky asked if that default could be bumped to 1024 on the release branch.

Zaretskii was opposed to doing so on the release branch, since the amount of testing it would require would likely delay the Emacs 26.2 bug-fix release. Part of the changes that Ingebrigtsen has planned would change the medium security level so that NSM would complain about small primes, but it is not clear that those changes would make it into 26.2. Zaretskii is concerned about "unintended breakage" from this kind of change, but others, including Postavsky, think "it's unfortunate that we're still going to release a 26.2 which silently accepts primes smaller than 1024 bits by default".

It turns out, however, that NSM does complain about primes that are smaller than 1024 bits, according to Ingebrigtsen. That is a bit inconsistent with the gnutls.el setting, but it does at least solve the problem. Users who have a need to connect to servers with 256-bit Diffie-Hellman primes can still do so without changing gnutls-min-prime-bits , albeit with a warning. Wong got a bit heated in his response, though Ingebrigtsen thought it all makes sense: the lower layer can still do things that the upper layer will warn the user about. He also pointed out that Wong may not be looking at the big picture:

Both here and in other places in this thread you seem to fixate on the particular use cases you're interested in to the extent that you say that other use cases are wrong, somehow. People have different needs and different approaches, and Emacs should empower them to get their work done, and not pressure them into doing it the way we think they should do it.

Along the way, Wong also posted his plans for handling some of the areas where NSM is not up to date with the latest TLS best practices. That includes support for Certificate Transparency (CT), which is being deployed by some browsers instead of HPKP, better support for HPKP, better cipher suite and TLS extension checks, and so on. While Ingebrigtsen has quibbles and has not actually looked at the code, he thinks Wong "is very much on the right track". It would seem that before long Emacs will have a better network security story—though it may not get out to users until Emacs 27, whenever that might be.

Switching TLS libraries

The suggestion to switch to OpenSSL is popular with some, but there are some barriers as well. One, which likely will occur to anyone who has been keeping an eye on OpenSSL over the years, is licensing. OpenSSL has its own unique license that is not compatible with the GPL; other projects have run into that incompatibility along the way. But OpenSSL has been actively working to relicense itself under the Apache Software License v2. As of March 1, the project is close to being able to switch; it plans to do so with its next release.

Another potential issue is the "Gnu" in GnuTLS, but that is something of an accident of history. GnuTLS moved out of the GNU project back in 2012. Normally, GNU projects (such as Emacs) are strongly encouraged (or more) to support and use other GNU projects, but that is no longer a consideration here. However, Zaretskii noted that Emacs had "just switched to GnuTLS as the primary means in Emacs 26.1" and that maintaining both a GnuTLS and an OpenSSL version of the TLS code would be a maintenance burden. He was firmly opposed to the idea.

In the end, Wong more or less agreed. In a lengthy message, which covered a lot of other topics, he said:

In addition, these days I try to live by a motto - Don't let the perfect be the enemy of the good. Replacing GnuTLS with OpenSSL is significantly harder than writing a couple C functions to get the most out of GnuTLS. Replacing it without major regressions even harder. Besides, most of Emacs' network security problems do not come from [its] use of GnuTLS, but deficiencies in NSM. I believe Emacs' network security can get much better faster if we focus our efforts on the current design.

It was a long and scattered thread, but the end result looks like a nice improvement in Emacs network security that comes from a new Emacs developer—Wong went through the process of assigning copyright to the FSF in one of the sub-threads. There are a lot of "fiddly bits" when it comes to network protocols, especially those involving encryption, key exchange, and so on. Having a new expert interested in working on those problems will be a boon to Emacs and its users.

Comments (20 posted)

Large data centers routinely use control groups to balance the use of the available computing resources among competing users. Block I/O bandwidth can be one of the most important resources for certain types of workloads, but the kernel's I/O controller is not a complete solution to the problem. The upcoming block I/O latency controller looks set to fill that gap in the near future, at least for some classes of users.

Modern block devices are fast, especially when solid-state storage devices are in use. But some workloads can be even faster when it comes to the generation of block I/O requests. If a device fails to keep up, the length of the request queue(s) will increase, as will the time it takes for any specific request to complete. The slowdown is unwelcome in almost any setting, but the corresponding increase in latency can be especially problematic for latency-sensitive workloads.

The kernel has a block I/O controller now, but it has a number of shortcomings. It regulates bandwidth usage, not latency; that can be good in settings where users are being charged for higher bandwidth limits, but it is less useful for workloads where latency matters. If some groups do not use their full bandwidth allocations, a block I/O device may go idle even though other groups, which have hit their limits, have outstanding I/O requests. The block I/O controller also depends heavily on the CFQ I/O scheduler and loses functionality in its absence. It doesn't work at all with multiqueue block devices — the type of devices most likely to be in use in settings where the I/O controller is needed.

The I/O latency controller, written by Josef Bacik, addresses these problems by regulating latency (instead of bandwidth) at a relatively low level in the block layer. When it is in use, each control group directory has an io.latency file that can be used to set the parameters for that group. One writes a line to that file following this pattern:

major:minor target=target-time

Where major and minor identify the specific block device of interest, and target-time is the maximum latency that this group should experience (in milliseconds).
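As a concrete sketch (the device numbers and group name here are made up for illustration), a 10ms latency target could be configured like this, assuming cgroup v2 is mounted at /sys/fs/cgroup and a group named "workload" already exists:

```shell
# 8:16 and "workload" are hypothetical values for illustration
echo "8:16 target=10" > /sys/fs/cgroup/workload/io.latency
```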

The controller tracks the actual latency seen by each group, using a relatively short (100ms) window. If a given group starts to miss its target, all other peer groups with larger targets are throttled to free up some bandwidth; the group with the tightest latency target is thus given the highest priority for access to the device. If all groups are meeting their targets, no throttling is done, so no bandwidth should go to waste if there is a need for it.

On its face, throttling block I/O seems like a straightforward task: if a process needs to be slowed down, simply don't dispatch as many of its requests to the device. But block I/O is a bit strange in that much of it is initiated outside of the context of the process that is ultimately responsible for its creation. One example is filesystem metadata I/O, which is generated by the filesystem itself at a time of its own choosing. Slowing down that I/O may interfere with the filesystem's ordering decisions and create locking problems — without slowing down the target process at all. I/O generated by swapping is another example; it is generated when the kernel needs to reclaim memory, which may not be when the process being swapped is actually running. Slowing down swap I/O will slow down the freeing of memory for other uses — not a particularly good idea when the system is short of memory.

Kernel developers who introduce that kind of behavior have a relatively high likelihood of needing to look for openings in the fast-food industry in the near future. So the latency controller does no such thing. It will delay I/O dispatch for I/O that is generated directly by a process running inside a control group that is to be throttled. So a process reading rapidly from a file may find that its reads start taking longer when throttling goes into effect, for example.

A different approach is needed for indirectly generated block I/O, though. In such cases, the latency controller will record the amount of needed delay in the control group itself. Whenever a process running within that group returns from a system call — a setting where it is known that no locks are held — that process will be put to sleep for a period to pay back some of that delay. The sleep period can be as long as 250ms in severe cases. If I/O traffic eases up and throttling is no longer necessary, any remaining delays will be forgotten.

In the patch introducing the controller itself, Bacik notes that using it results in a slightly higher number of requests per second (RPS) overall, and a significant reduction in variability of RPS rates over time. There is another interesting result, in that this controller can help to protect the system against runaway processes:

Another test we run is a slow memory allocator in the unprotected group. Before this would eventually push us into swap and cause the whole box to die and not recover at all. With these patches we see slight RPS drops (usually 10-15%) before the memory consumer is properly killed and things recover within seconds.

The throttling, seemingly, slows the allocating process enough to allow the OOM killer to do its job before the system runs completely out of memory.

This patch set has been through six revisions as of this writing, with some significant changes in the implementation happening along the way. That work appears to be coming to a close, though. It earned the elusive Quacked-at-by tag from Andrew Morton, and block maintainer Jens Axboe has indicated that it has been applied for the 4.19 development cycle. So the latency for the delivery of the block I/O latency controller would appear to be three or four months at this point.

Comments (9 posted)

lirc

In the 4.18 kernel, a new feature was merged to allow infrared (IR) decoding to be done using BPF. Infrared remotes use many different encodings; if a decoder were to be written for each, we would end up with hundreds of decoders in the kernel. So, currently, the kernel only supports the most widely used protocols. Alternatively, the lirc daemon can be run in user space to decode IR. Decoding IR can usually be expressed in a few lines of code, so a more lightweight solution without many kernel-to-user-space context switches would be preferable. This article will explain how IR messages are encoded, the structure of a BPF program, and how a BPF program can maintain state between invocations. It concludes with a look at the steps that are taken to end up with a button event, such as a volume-up key event.

Infrared remote controls emit IR light using a simple LED. The LED is turned on and off for shorter or longer periods, which is interpreted somewhat akin to morse code. When infrared light has been detected for a period, the result is called a "pulse". The time between pulses when no infrared light is detected is called a "space".

Whenever a pulse or space is detected by an IR receiver, a BPF program will be executed (if one is attached). This program consists of a single function entry point that takes a pointer to a context. For IR decoders, this context is an unsigned int value. For a packet filter, the context would instead be a packet. In our case, the lower 24 bits of the int value contain the duration of the pulse or space, in microseconds. The top eight bits define the type of the event, which can be LIRC_MODE2_PULSE, LIRC_MODE2_SPACE, or LIRC_MODE2_TIMEOUT. The return value of the BPF program is ignored.

If a space between two pulses gets excessively long, it could delay the decoding of a button press. For example, we might want to know that the IR message has really ended by measuring the space after the last pulse has occurred. Since a space is a time between two pulses, we would have to wait for the next pulse from the next IR message to occur before we would get this value. So, for this reason, there is a timeout. If a space lasts longer than the timeout, it is reported as LIRC_MODE2_TIMEOUT . This is typically set at 125ms.

A BPF program can be written in a number of different ways, but the easiest way is to use clang with the BPF target. This allows the BPF program to be written in a sort of restricted C that does not allow the use of C-library functions or loops, for example.

To create an IR decoder in BPF, we start with:

    static int eq_margin(int duration, int expected, int margin)
    {
        return (duration >= (expected - margin)) &&
               (duration <= (expected + margin));
    }

    int bpf_decoder(unsigned int *sample)
    {
        int duration = *sample & LIRC_VALUE_MASK;
        bool pulse = (*sample & LIRC_MODE2_MASK) == LIRC_MODE2_PULSE;

        if (pulse && eq_margin(duration, 300, 100)) {
            // seen a short pulse of about 300 microseconds
        }

        return 0;
    }

Typically, IR receivers have a precision of 50µs at most. I would recommend checking for durations of at least 100µs around the value you expect.

Now we can parse a single pulse or space, but every IR message consists of several pulses and spaces in quick succession. In a regular C program, we would use a static variable, a global variable, or some heap memory to maintain our state while waiting for the next event. Unfortunately none of those options are available in BPF. Instead, we use BPF maps, which are a generic key-value store where the key is always an unsigned int and the value is a generic blob; we can store whatever we want. This is how we declare a BPF map to hold the IR-decoding state:

    struct decoder_state {
        unsigned int bits;
        unsigned int count;
    };

    struct bpf_map_def SEC("maps") decoder_state_map = {
        .type = BPF_MAP_TYPE_ARRAY,
        .key_size = sizeof(unsigned int),
        .value_size = sizeof(struct decoder_state),
        .max_entries = 1,
    };

There are a few different types of BPF maps, the main ones being "array" and "hash". Since we are only looking to store one structure, an array is more than sufficient; we thus specify max_entries as one. The key_size has to be the size of an unsigned int; no other key size is supported. The value_size is the size of our blob of data. We've declared a struct for this purpose, and we use sizeof() to ensure we have the right storage for it.

There are a number of functions available for using BPF maps from BPF code. For example, to fetch our entry in the decoder_state_map BPF map, we can call:

    int key = 0;
    struct decoder_state *s = bpf_map_lookup_elem(&decoder_state_map, &key);

Unfortunately, if we try to use the pointer to the map, we will get an error when we load our BPF program: "R6 invalid mem access 'map_value_or_null'". This is the kernel's BPF verifier complaining; it checks to ensure that a BPF program does not do anything it should not, like try to access out-of-bounds memory. It also checks for other conditions, like relying on undefined behavior or loops.

The problem here is that bpf_map_lookup_elem() , the function used to obtain a value from a BPF map, might return NULL if the key is beyond the last element. The elements of an array are pre-allocated, and we are looking for element zero out of a total of one, so this lookup should never fail. However, the BPF verifier is not aware of this so, in order to keep the verifier happy, we have to add:

if (!s)
	return 0;

The pointer we get from bpf_map_lookup_elem() is a direct pointer to the array, so we do not have to call bpf_map_update_elem() after making changes. The BPF verifier will check that we only use our pointer with the right offsets within our array entry; otherwise our program will not load.

Now that we have memory to store state, we can implement the decoding. When we have decoded the IR into a button event, we can submit that event to the input subsystem using the BPF function bpf_rc_keydown(). It takes four arguments: the BPF context, the protocol, the scancode, and the toggle bit:

The context for BPF is the pointer that was passed to the main BPF function, so we simply pass sample here.

The IR protocol can be used by user space to determine which protocol produced any given scancode; at the moment, nothing uses it.

The scancode is the value that was decoded. IR protocols generally encode some sort of value, and that value does not necessarily represent a key or a button. A particular remote assigns particular values to buttons, so we need a mapping from scancode to key code. This is done using remote-control keymaps, which usually live in /lib/udev/rc_keymaps/ if the v4l-utils package is installed (or the ir-keytable package on Ubuntu or Debian).

Some IR protocols include a toggle bit. Since the IR message is repeated every 90ms or so while a button is held, it would otherwise be impossible to distinguish a key being held from a key being released and pressed again (toggled). In the latter case, the toggle bit changes value, so rc-core knows to generate both key-up and key-down events.

So those are the four arguments to bpf_rc_keydown(). Now we can show a complete example of a fictional IR decoder.

#include <linux/lirc.h>
#include <linux/bpf.h>

#include "bpf_helpers.h"

enum state {
	STATE_INACTIVE,
	STATE_FIRST_PULSE,
	STATE_SECOND_PULSE
};

struct decoder_state {
	enum state state;
	unsigned int space;
};

struct bpf_map_def SEC("maps") decoder_state_map = {
	.type = BPF_MAP_TYPE_ARRAY,
	.key_size = sizeof(unsigned int),
	.value_size = sizeof(struct decoder_state),
	.max_entries = 1,
};

SEC("fictional_ir")
int decode(unsigned int *sample)
{
	int key = 0;
	struct decoder_state *s = bpf_map_lookup_elem(&decoder_state_map, &key);

	if (!s)
		return 0;

	int duration = LIRC_VALUE(*sample);

	switch (s->state) {
	case STATE_INACTIVE:
		if (LIRC_IS_PULSE(*sample) && duration == 500) {
			s->state = STATE_FIRST_PULSE;
		}
		break;
	case STATE_FIRST_PULSE:
		if (LIRC_IS_SPACE(*sample)) {
			s->space = duration;
			s->state = STATE_SECOND_PULSE;
		} else {
			s->state = STATE_INACTIVE;
		}
		break;
	case STATE_SECOND_PULSE:
		if (LIRC_IS_PULSE(*sample) && duration == 500) {
			bpf_rc_keydown(sample, 64, s->space / 100, 0);
		}
		s->state = STATE_INACTIVE;
		break;
	}

	return 0;
}

char _license[] SEC("license") = "GPL";

Several operations are multiplexed through the bpf() system call for managing BPF programs and BPF maps, and for attaching programs to devices. To load a BPF program, the BPF_PROG_LOAD command is used; we have to provide a pointer to the BPF instructions, the instruction count, and a program name. If the system call is successful, we get a file descriptor for the program.

We can create BPF maps with the BPF_MAP_CREATE command, which also returns a file descriptor on success. Once we have the program and maps created, we can attach the program to a LIRC device (e.g. /dev/lirc0) using the BPF_PROG_ATTACH command; we have to provide a file descriptor for the LIRC device and the BPF program file descriptor. Once the program is attached, we can safely exit our process; the BPF program won't be freed when its file descriptor is closed.
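As a rough sketch of that attach step (the helper names here are our own, error handling is minimal, and BPF_LIRC_MODE2 is the LIRC attach type added in 4.18), attaching an already-loaded program might look like:

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Thin wrapper; glibc does not provide one for bpf(2). */
static long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return syscall(__NR_bpf, cmd, attr, size);
}

/* Attach prog_fd (obtained from a prior BPF_PROG_LOAD) to a LIRC device;
 * returns a negative value on failure. */
static int attach_to_lirc(int prog_fd, const char *dev)
{
    union bpf_attr attr;
    int lirc_fd = open(dev, O_RDWR);

    if (lirc_fd < 0)
        return -1;

    memset(&attr, 0, sizeof(attr));
    attr.target_fd = lirc_fd;       /* the LIRC device */
    attr.attach_bpf_fd = prog_fd;   /* the program to run on each sample */
    attr.attach_type = BPF_LIRC_MODE2;

    return sys_bpf(BPF_PROG_ATTACH, &attr, sizeof(attr));
}
```

Attaching requires appropriate privileges; tools like ir-keytable wrap these same syscalls so that users do not have to.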

Currently there is a hard-coded limit of 64 BPF programs that may be attached to one LIRC device. Any more, and BPF_PROG_ATTACH will return E2BIG . Every time a new pulse or space occurs, all the BPF programs will be executed. This makes it possible to load multiple BPF decoders, so that different remotes can be used at the same time.

As you might expect, there are also commands for querying and detaching BPF programs.

The BPF example above can be compiled with:

clang --target=bpf -O2 -c foobar.c

You'll need to compile it against kernel headers from 4.18 (or later), along with bpf_helpers.h from the same tree. This produces foobar.o, an ELF object file.

Using ir-keytable, you can load this BPF program; you'll need the BPF patches, which had not been merged at the time of writing. To try this out, the rc-loopback pseudo-receiver can be used, so no IR hardware is needed. Here are the steps to make this work:

modprobe rc-loopback
ir-keytable -p ./foobar.o

To test this setup, create a file named test with the following contents:

pulse 500
space 1500
pulse 500

Now, run:

ir-keytable -k 15:KEY_VOLUMEUP -t

in one terminal, and:

ir-ctl -s test

in another. You should get this output:

855.168999: lirc protocol(64): scancode = 0xf
855.169009: event type EV_MSC(0x04): scancode = 0x0f
855.169009: event type EV_KEY(0x01): key_down: KEY_VOLUMEUP
855.169009: event type EV_SYN(0x00).

The ir-keytable patches above also include a Python script that converts lircd remote configurations so that they can be used with ir-keytable. This should make it possible to do without the lirc daemon. However, some protocol decoders require very basic loops, which the BPF verifier currently does not allow at all.
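A common workaround in BPF C for the missing loop support is clang's #pragma unroll on a fixed-bound loop, so that the emitted bytecode contains no back edge at all. This bit-reversal helper (our own illustration, not from the patches) is the sort of thing an LSB-first IR decoder might need:

```c
/* Reverse the low 8 bits of v, as some IR protocols transmit bits
 * least-significant first. The fixed bound lets clang unroll the loop
 * completely, producing the straight-line bytecode that the (loop-free)
 * BPF verifier will accept. */
static unsigned int reverse_bits8(unsigned int v)
{
    unsigned int out = 0;

#pragma unroll
    for (int i = 0; i < 8; i++)
        out = (out << 1) | ((v >> i) & 1);

    return out;
}
```

Unrolling only works when the trip count is a compile-time constant; anything data-dependent still cannot be expressed this way, which is why some lircd decoders cannot yet be converted.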

Even with all lircd remote configurations supported, that would still not cover all possible remote controls. A possible solution can be found in IRP notation, a general form of description for IR protocols; it would be nice to generate BPF from IRP and thereby support a very broad array of remotes without having to open-code each one. Lastly, IR encodes more than button presses: air-conditioning remotes encode target temperatures, for example, and some remote controls include a directional pad. Supporting such devices with BPF decoders will require further work.

Comments (37 posted)

The compromise of Gentoo's GitHub mirror was certainly embarrassing, but its overall impact on Gentoo users was likely fairly limited. Gentoo and GitHub responded quickly and forcefully to the breach, which greatly limited the damage that could be done; the fact that it was a mirror and not the master copy of Gentoo's repositories made it relatively straightforward to recover from. But the black eye that it gave the project has led some to consider ways to make it even harder for an attacker to add malicious content to Gentoo—even if the distribution's own infrastructure were to be compromised.

Unlike most other distributions, Gentoo is focused on each user building the software packages they want using the Portage software-management tool; emerge is the usual interface to Portage. Software "packages" are stored as ebuilds, which are sets of files that contain the information and code needed by Portage to build the software. The GitHub compromise altered the ebuilds for three packages to add malicious content so that users who pulled from those repositories would get it.

Ebuilds are stored in the /usr/portage directory on each system. That local repository is updated using emerge --sync (which uses rsync under the hood), either from Gentoo's infrastructure or one of its mirrors. Alternatively, users can use emerge-webrsync to get snapshots of the Gentoo repository, which are updated daily. Snapshots are individually signed by the Gentoo infrastructure OpenPGP keys, while the /usr/portage tree is signed by way of Manifest files that list the hash of each file in a directory. The top-level Manifest is signed by the infrastructure team, so following and verifying the chain of hashes down to a particular file (while also making sure there are no unlisted files) ensures that the right files are present in the tree.

Another mechanism to get a Portage tree is to clone a Git repository that contains one. These Git mirrors (such as the one at GitHub) can be used to create a local /usr/portage tree by doing an emerge --sync while pointing at the clone as the Portage source. Finally, there is also the canonical Portage tree Git repository, which is somewhat less convenient to use, since it does not have everything that is needed. It needs some data repositories and for the Portage cache to be updated; those things are handled by the infrastructure team for the Git mirrors. On the other hand, all commits to the canonical tree are signed by Gentoo developers directly, so the infrastructure keys need not be trusted.

Trustless

Jason A. Donenfeld posted an idea for a "trustless infrastructure" to the gentoo-dev mailing list on July 2. The core of his suggestion is that, instead of having the Gentoo infrastructure team sign the Portage tree that the distribution provides, developers of the ebuilds would sign them directly. That way, if the infrastructure was compromised, there would be no signing keys available to be abused.

His proposal is that every file in an ebuild would be signed by the developer responsible, so that each file would have a corresponding .asc file that would be distributed with the tree as usual. He also suggested that files not end up in /usr/portage until they have had their signatures verified; instead, they should be copied into a shadow directory to do the verification, then put into /usr/portage if it succeeds. A keyring of the public keys of Gentoo developers would be created and disseminated; eventually, the corresponding private keys would hopefully be stored by the developers on some kind of hardware token.

This is very similar to what Arch Linux is doing and, AFAIK, it works well there. I'm sure this list will want to bikeshed over the particulars of the implementation, but the two design goals from this are:

Signatures are made by developers, not by infra.

Portage doesn't see any files that haven't yet been verified.

The reaction to the proposal was somewhat mixed but generally on the negative side. Rich Freeman pointed out that a change of this sort would require a flag day of sorts; it could not easily be added slowly and "grow organically". But he also noted that using the existing Git signatures would provide much of what Donenfeld is looking for. Freeman also thinks that syncing using Git, rather than rsync, should be considered:

In general I do advocate giving serious consideration to the benefits of syncing via git. If you sync frequently (which most Gentoo users probably do, and which we generally advocate), then it tends to be a lot more efficient than rsync. It naturally tracks changes over time as well, so it fits in very well with merging untrusted changes into a known-good tree, as only the changes need to be verified.

Donenfeld's first reply is a bit dismissive; it complains about the length of Freeman's reply, for example, which is not much larger than the proposal itself. Similarly, when Michał Górny asked about how the keyring would be distributed and protected, Donenfeld's reply was terse: "Same model as Arch." He did eventually elaborate on that somewhat, but it did not convince Górny:

In other words, I see no purpose in adding a lot of complexity in order to shift the weakest link from one Infra machine handling the signing to another single point of failure in distributing the keys.

Others also poked holes in the proposal, mostly with regard to key management. Hanno Böck posted a number of questions on key and signature management, particularly with regard to expired, revoked, and newly untrusted keys. Is there some kind of re-signing process that would have to be done? How would that be handled? He concluded:

I don't want to say this is unworkable. But these are challenges and imho fixing them all is really, really tricky. Either you break stuff regularly or you have procedures that someone has to do regularly in order to avoid breakage (more work for gentoo devs) or you expand the scope of accepted signatures very excessively. And I believe these challenges are one of the reasons the old attempts to have a signed Gentoo never went anywhere. I'm glad we have some form of signed Gentoo now, even if it relies on some centralized infrastructure.

Kristian Fiskerstrand was more pointed: "I'll say it, it is unworkable". He said that there was always going to be a need for some centralized keys to ensure the integrity of the repositories. Ulrich Mueller also said that Donenfeld's proposal was unworkable because it would violate the Gentoo Package Manager Specification: "we cannot change that retroactively, because it would break existing implementations". Furthermore, Mueller wondered whether adding another 100,000 files to the tree made sense; it would result in 400MB of extra space on a 4KB-block filesystem, he said.

Overall, it doesn't seem like the proposal is going anywhere, though there are elements of it that are attractive. In particular, removing the infrastructure-key bottleneck and, thus, danger from a compromise of those keys (and/or repositories) is of interest, but there is a lot of work to be done to get there. And, as always, key management is a difficult problem to solve.

Git versus rsync

In a related thread, William Hubbs picked up on Freeman's thinking and asked why Gentoo still relied on rsync rather than using Git directly. It comes down to a number of factors, which Freeman summarized. Currently, doing an emerge --sync from a Git clone will leave the tree in a corrupted state if it doesn't verify. Also, rsync is more bandwidth-efficient for less-frequent updates; it is not clear where the crossover point is, but he guessed that Git would be more efficient if updates were done more often than weekly. There are more rsync mirrors as well, though he is not sure that makes much of a difference in practice.

Beyond that, Freeman noted that Git history makes for more disk-space usage. He personally uses Git, and others would like to do so, but the disk-space issue makes that harder. Matt Turner said that he has set aside a 1GB partition for the tree, which works fine for the roughly 600MB needed by rsync, but not for Git. A shallow clone of the Git repository is roughly the same size (around 660MB), but each pull adds to that, so without some kind of "auto-trimming", Git usage will grow quickly, Freeman said.

All of the key-management issues are still present for the Git tree, as well. Even though the commits are signed by the developers, those keys need to be distributed and managed over time.

The GitHub mirror compromise has clearly led to some thinking (and rethinking) within the project about its practices and how they might be improved. It is not clear that any real conclusions have been reached, much less plans made, but considering the various parts of the problem is certainly to the good. One concrete thing that has come out of this incident is a Portage security page on the Gentoo wiki. It explains how to "dispel doubts regarding the security of the portage tree on my system". There are sections for each of the four ways to keep a Portage tree updated that show what needs to be trusted for each (e.g. keys, web of trust, good security practices) and how to test the integrity of the Portage tree.

Comments (15 posted)