Fixing getrandom()

Did you know...? LWN.net is a subscriber-supported publication; we rely on subscribers to keep the entire operation going. Please help out by buying a subscription and keeping LWN on the net.

A report of a boot hang in the 5.3 series has led to an enormous, somewhat contentious thread on the linux-kernel mailing list. The proximate cause was some changes that made the ext4 filesystem do less I/O early in the boot phase, incidentally causing fewer interrupts, but the underlying issue was the getrandom() system call, which was blocking until the /dev/urandom pool was initialized—as designed. Since the system in question was not gathering enough entropy due to the lack of unpredictable interrupt timings, that would hang more or less forever. That has called into question the design and implementation of getrandom() .

Ahmed S. Darwish reported the original problem and tracked it down to the GNOME Display Manager (GDM), which handles graphical logins. It turns out that GDM was calling getrandom() in order to generate the "MIT magic cookie" that is used for authorization by the X Window System. As was pointed out by several in the mega-thread, using cryptographic-strength random numbers for the cookie (or much of anything in terms of X Window security) is well beyond the pale—a much weaker random number generator could have been used with no loss of security. Darwish noted that the call "only" requests a small number of random bytes (five calls requesting 16 bytes each) but, as Theodore Y. Ts'o said, that doesn't matter: by default getrandom() will not return anything until the cryptographic random number generator (CRNG) is initialized—which requires entropy.

When Darwish originally bisected the problem, he pinpointed an ext4 commit that had the effect of reducing the amount of disk I/O that was being done early in the boot process. That performance enhancement also, unfortunately, turned out to reduce the amount of entropy gathered on Darwish's laptop—to the point it would not boot. That change has been reverted for now.

getrandom()

Back in 2014, getrandom() was added at least partly in response to a complaint from the LibreSSL project that Linux lacked a way to get random numbers in the face of file-descriptor exhaustion. The "approved" mechanism was to read from /dev/urandom , but if an attacker arranged that all of the file descriptors were already open, that method could fail. So getrandom() was created to provide a way to get random numbers without a file descriptor or, even, a visible /dev/urandom (e.g. from a container or chroot() ). In fact, getrandom() was intentionally designed to block until the /dev/urandom pool is initialized; prior to getrandom() there was no way for user space to be sure that enough entropy had been gathered to properly initialize the pool. Since the behavior of /dev/urandom is part of the kernel ABI, it could not change; a new system call was under no such constraints, of course.

Initializing the CRNG requires 512 bits of estimated entropy, or 4096 interrupts using the current calculations, which Ts'o said were conservatively chosen. getrandom() is clearly documented to block until that happens, but it has not stopped user space from sometimes using it incorrectly. Ts'o said that is going to be a problem moving forward:

Ultimately, though, we need to find *some* way to fix userspace's assumptions that they can always get high quality entropy in early boot, or we need to get over people's distrust of Intel and RDRAND. Otherwise, future performance improvements in any part of the system which reduces the number of interrupts is always going to potentially result in somebody's misconfigured system or badly written applications to fail to boot. :-(

Linus Torvalds noted that the RDRAND instruction does not exist everywhere, so it is no panacea. He is also concerned that problems stemming from fewer interrupts will only get worse:

The interrupt thing is only going to get worse as disks turn into ssd's and some of them end up using polling rather than interrupts.. So we're likely to see _fewer_ interrupts in the future, not more.

Error return

Ts'o suggested adding a kind of "fail safe" flag that would let callers request that getrandom() only block for, say, two minutes; after that, the best available random numbers would be returned. But Torvalds believes that blocking by default is simply wrong. He said that any new flag should request the blocking behavior explicitly so that unthinking users get what they expect. Or perhaps an error could be returned:

An alternative might be to make getrandom() just return an error instead of waiting. Sure, fill the buffer with "as random as we can" stuff, but then return -EINVAL because you called us too early.

Several seemed in agreement with that approach; Darwish posted an RFC patch along those lines. Alexander E. Patrakov reworked the commit message, but also complained about the idea of returning an error and forcing user space to deal with the problem ("the whole result looks like shifting the responsibility/blame without achieving anything useful"). Ts'o clearly thinks it is a bad idea, overall, but somewhat waspishly further modified the patch so that the blocking behavior was configurable at build time (or via a kernel command-line parameter). Darwish took that one step further and increased the length of commit message again, adding even more background and details. In addition, at Torvalds's request, the -EINVAL return was removed, so that getrandom() effectively reverted to the same behavior as reading from /dev/urandom : callers get the "best" randomness available at the time of the call.

Lennart Poettering disagreed with that approach, calling it "sticking your head in the sand" by providing bad random numbers to potentially sensitive key-generation operations early in the boot process. He suggested that the problem is not in the kernel at all and that it should be solved in user space:

Let the people putting together systems deal with this. Let them provide a creditable hw rng, and let them pay the price if they don't.

Part of the problem may be that once the GNU C Library (glibc) got around to adding a wrapper for getrandom() , an OpenBSD-like getentropy() call was also added. However the OpenBSD version does not block, while the glibc version is implemented using getrandom() and, thus, can block indefinitely in the early boot process. Developers calling getentropy() might well be unaware of this little "gotcha"—though it is documented in the man page. As Torvalds and others mentioned in the thread, another problem is that once the system has blocked waiting for entropy, said entropy is likely to never arrive. User space needs to cause things to happen (e.g. keys pressed, disks accessed) to produce the interrupts necessary to get the CRNG initialized.

Yet another part of the problem that Torvalds sees is that there are (at least) two different kinds of users of getrandom() who are passing 0 for the flags value (which he calls " getrandom(0) "): those that actually want/need to block in order to get random numbers only after the CRNG has been initialized and those who are just after "good" random numbers and didn't think too hard about it. Callers of glibc's getentropy() could also fall into that latter category.

Limiting delays

Unsurprisingly, Torvalds was not in favor of a configuration option; his first solution was to limit the wait time of getrandom() to 15 seconds on the first call when the CRNG is not initialized, reducing the delay on each subsequent call so that the maximum possible delay is 30 seconds. The code returns -EAGAIN in that case so that user space can detect it. In a comment in the code (repeated in the email message), he said: "Just asking for blocking random numbers is completely and fundamentally wrong, and the kernel will not play that game."

That set off another huge sub-thread. Poettering once again said that the problem should not be solved by the kernel in a "never trust userspace" fashion. Darwish posted another version of his patch set that proposed a getrandom2() system call, which used new flag names to be ever more explicit about the intentions of the caller. But there are still plenty of flag bits available for getrandom() , Torvalds said, so introducing a new system call seems unnecessary.

Instead, he suggested reworking the flag values to better represent what was being asked for. The GRND_EXPLICIT flag would be used to indicate that user space "knows what it is doing", so if it explicitly asks to block forever, that will be honored. The GRND_SECURE and GRND_INSECURE values would ask for blocking and non-blocking behavior respectively, but both would also set the GRND_EXPLICIT bit. The patch left the getrandom(0) case alone, but Torvalds has plans for that as well:

In particular, this still leaves the semantics of that nasty "getrandom(0)" as the same "blocking urandom" that it currently is. But now it's a separate case, and we can make that perhaps do the timeout, or at least the warning. And the new cases are defined to *not* warn. In particular, GRND_INSECURE very much does *not* warn about early urandom access when crng isn't ready. Because the whole point of that new mode is that the user knows it isn't secure.

Ts'o wondered why the getrandom(0) case was not simply mapped to the same behavior as GRND_SECURE , which would effectively be the same as it is today, but Torvalds was adamant that was the wrong approach; he is concerned that Ts'o is getting overly caught up in what Torvalds sees as theoretical attacks and is missing the real getrandom(0) problem. Torvalds intends the patch to be backported to the stable kernels, so any change to getrandom(0) will be in separate, mainline-only patch.

Jitter entropy

Patrakov asked about using "jitter entropy" as is done by the haveged entropy daemon. Using haveged is being suggested by some distributions as a way to ensure that there is enough entropy early in the system boot. He noted that the technique is controversial as some are concerned that it is not truly random data. Torvalds said that he is one of the skeptics, but thought it might provide a solution to the current mess:

I would perhaps be willing to just put my foot down, and say "ok, we'll solve the 'getrandom(0)' issue by just saying that if that blocks too much, we'll do the jitter entropy thing". Making absolutely nobody happy, but working in practice. And maybe encouraging the people who don't like jitter entropy to use GRND_SECURE instead.

But getrandom() has been in the kernel for five years and in glibc for more than two years, so it is clearly part of the kernel ABI. The behavior that some want when they call getrandom(0) should not be arbitrarily changed to provide "bad" random numbers in a way that breaks user-space programs. That was the upshot of Andy Lutomirski's argument in the thread. He agreed that the getrandom() call was poorly thought out before it was added, but that should not change now:

There are programs that call getrandom(0) *today* that expect secure output. openssl does a horrible dance in which it calls getentropy() if available and falls back to syscall(__NR_getrandom, buf, buflen, 0) otherwise. We can't break this use case. Changing the semantics of getrandom(0) out from under them seems like the worst kind of ABI break -- existing applications will *appear* to continue working but will, in fact, become insecure.

Lutomirski also believes it is a "straight up kernel bug" that blocking in getrandom(0) early in the boot deadlocks the system by waiting for entropy. He suggested actively fixing that problem: "How about we make getrandom() (probably actually wait_for_random_bytes()) do something useful to try to seed the RNG if the system is otherwise not doing IO." Torvalds is in agreement with that, though he seems to be leaning toward the jitter-entropy stopgap:

And yes, we'll have to block - at least for a time - to get some entropy. But at some point we either start making entropy up, or we say "0 means jitter-entropy for ten seconds". That will _work_, but it will also make the security-people nervous, which is just one more hint that they should move to GRND_SECURE[_BLOCKING].

The goal is to ensure that callers are really aware that they are asking to block (and potentially deadlock) waiting for the CRNG to be properly initialized. In the ambiguous default case, that may well not be the case, so Torvalds is determined to find a way to make that not block:

So we absolutely _will_ come up with some way 0 ends the wait. Whether it's _just_ a timeout, or whether it's jitter-entropy or whatever, it will happen.

He is concerned about the amount of time it might take to gather enough jitter entropy to initialize the CRNG, however. He suggested that he was willing to block as long as 15 seconds, but thought that might require some kind of accelerated jitter-entropy technique. Patrakov said that acceleration was not needed as the existing technique can generate plenty of entropy in two seconds. In addition, as had also been noted elsewhere in the thread, Matthew Garrett pointed out that the Zircon kernel for the Fuchsia operating system initializes its CRNG using jitter entropy, which may lend some credibility to the technique.

ABI

In a departure from his usual stance, Torvalds seems fairly unconcerned about changing the kernel ABI in this case. He said that any breakage from changing getrandom(0) to time out was theoretical, but that the boot deadlock problem was real. In order for the generation of keys to fail under that scheme, he said, they would have to be generated at boot on idle machines that are not doing anything that would allow entropy to be collected. As Garrett noted, though, that is the exact scenario for which the getrandom(0) behavior was designed. Torvalds does not see that kind of key generation as anything other than a hypothetical, it seems.

The main difference between the proposals from Torvalds and Lutomirski is whether or not to actually provide some way for getrandom() callers to block, possibly forever, or not. Torvalds is willing to have that as a non-default option to getrandom() , while Lutomirski would prefer to simplify getrandom() (though the patch text calls it " getentropy() "), while also removing all of the machinery behind the /dev/random blocking pool. The net effect would be that users who truly need today's getrandom(0) behavior could still get it by reading /dev/random .

The thread is long and twisty; Torvalds's final decision is not yet clear. It does seem that something will be done to getrandom(0) , but whether it times out or switches to jitter entropy in the problematic case is unclear. It does also seem that the blocking random number pool's days are numbered, as well, based on Torvalds's statements in the thread. But the final shape of those changes is not yet apparent.

It would seem that, once again, the kernel development community has failed in the design of an API/ABI. According to Torvalds and others, the default for getrandom() should never have been "block forever", but that information comes five years too late. API/ABI review is an area that the kernel has struggled with over the years; hopefully situations like this will provide enough incentive to take some extra time (and do some testing, though that probably would not have mattered here) before committing to an ABI that has to be supported, for the most part, anyway, forever.