OpenSSH and the dangers of unused code


Unused code is untested code, which probably means that it harbors bugs—sometimes significant security bugs. That lesson has been reinforced by the recent OpenSSH "roaming" vulnerability. Leaving a half-finished feature in only the client side of the equation might seem harmless at first glance but, of course, is not. Those who mean harm can run servers that "implement" the feature to tickle the unused code. Given that the OpenSSH project has a strong security focus (and track record), it is truly surprising that a blunder like this could slip through—and keep slipping through for roughly six years.

The first notice of the bug was posted by Theo de Raadt on January 14. He noted that an update was coming soon and that users could turn off the experimental client roaming feature by setting the undocumented UseRoaming configuration variable to "no". The update was announced by Damien Miller later that day. It simply disabled the roaming feature entirely, though it fixed a few other security bugs as well. The problems have been present since the roaming feature was added to the client (but not the server) in OpenSSH 5.4, which was released in March 2010.

The bug was found by Qualys, which put out a detailed advisory that described two separate flaws, both of which were in the roaming code. The first is by far the most dangerous; it is an information leak that can provide the server with a copy of the client system's private SSH keys (CVE-2016-0777). The second is a buffer overflow (CVE-2016-0778) that is "unlikely to have any real-world impact" because it relies on two non-default options being used by the client (ProxyCommand and either ForwardAgent or ForwardX11).

The private keys of an SSH client are, of course, the most important secret that is used to authenticate the client to a server where the corresponding public key has been installed. An attacker who has that private key can authenticate to any of the servers authorized by the user, assuming that there is no second authentication factor required. So they can effectively act as that user on the remote host(s). It should be noted that password-protected private keys are leaked in their encrypted form, which would still allow an attacker to try to break the passphrase offline. Also, if an agent such as ssh-agent is used, no key material is leaked.

The Qualys advisory includes patches to the OpenSSH server that implement a proof of concept of what a malicious server could do. The proof of concept is incomplete as there are environment-variable parameters used in the examples in the advisory that are not present in that code (notably, "heap_massaging:linux").

At its core, the problem in the client code (aside from still being present long after the server side was removed) is that it uses a server-supplied length to determine the size of a buffer to allocate—without much in the way of sanity checks. It also allocates the buffer using malloc(), which doesn't clear the memory being allocated.

The roaming feature is meant to handle the case where the SSH connection is lost (due to a transient problem of some sort) and allow the client to reconnect transparently. The client stores data that it has sent but that may not yet have been received by the server (and might get lost during the interruption). After the reconnect, the server can request that the client "resend" a certain number of bytes—even if the client never sent that many bytes. The server-controlled offset parameter can be used to trick the client into sending the entire contents of the buffer even though it has not written anything to it, thus leaking the data that was previously stored there.

So malicious servers can offer roaming to clients during the key-exchange phase, disconnect the client, then request a whole buffer's worth of data be "resent" after reconnection. There are some conditions that need to be met in order to exploit the flaw that are described in the advisory, such as "heap massaging" to force malloc() to return sensitive data and guessing the client send buffer size. But Qualys was able to extract some private key information from clients running on a number of different systems (including OpenBSD, FreeBSD, CentOS, and Fedora).

Qualys initially believed that the information leak would not actually leak private keys for a few different reasons. For one, the leak is from memory that has been freed, but is recycled in a subsequent allocation, rather than reading data beyond the end of a buffer, such as in a more-typical buffer overflow. In addition, OpenSSH took some pains to clear the sensitive data from memory.

It turns out that some of those attempts to clear sensitive information (like private keys) out of memory using memset() and bzero() were optimized away by some compilers. Clang/LLVM and GCC 5 use an optimization known as "dead store elimination" that gets rid of store operations to memory that is never read again. Some of the changes in the OpenSSH update are to use explicit_bzero() to avoid that optimization in sensitive places.

But a much bigger factor in disclosing the key information is the use of the C library's standard I/O functions—in this case fopen() and friends. The OpenSSH client uses those functions to read in the key files from the user's .ssh directory; they do buffered I/O, which means they have their own internal buffers that are allocated and freed as needed. On Linux, that's not a problem because the GNU C library (Glibc) effectively cleanses the buffers before freeing them. But on BSD-based systems, freed buffers will contain data from previous operations.

It is not entirely clear why Qualys was able to extract key information on Linux systems given the Glibc behavior. The advisory does note that there may be other ways for the key material to leak "as suggested by the CentOS and Fedora examples at the end of this section".

Beyond that, OpenSSH versions from 5.9 onward read() the private key in 1KB chunks into a buffer that is grown using realloc(). Since realloc() may return a newly allocated buffer, that can leave partial copies of the key information in freed memory. Chris Siebenmann has analyzed some of the lessons to be learned from OpenSSH's handling of this sensitive data.

Interactive SSH users who were communicating with a malicious server might well have noticed a problem, though. The OpenSSH client prints a message, "[connection suspended, press return to resume]", whenever a server disconnect is detected. Since causing a disconnect is part of tickling the bug, that message will appear. It would likely cause even a non-savvy user to wonder—and perhaps terminate the connection with Ctrl-C, which would not leak any key information.

But a large number of SSH sessions are not interactive. Various backup scripts and the like use SSH's public-key authentication to authenticate to the server and do their jobs, as does the SSH-based scp command. As Qualys showed, those can be tricked into providing the needed carriage return to resume the connection. Thus they are prime targets for an attack using this vulnerability.

While the bug is quite serious, it is hard to believe it wouldn't have been found if both sides of the roaming feature had been rolled out. Testing and code inspection might have led the OpenSSH developers to discover these problems far earlier. It was presumably overlooked because there was no server code, so it "couldn't hurt" to have the code still present in the client. Enabling an experimental feature by default is a little harder to understand.

For a project that "is developed with the same rigorous security process that the OpenBSD group is famous for", as the OpenSSH security page notes, it is truly a remarkable oversight. It also highlights a lack of community code review. We are sometimes a bit smug in the open-source world because we can examine all of the security-sensitive code running on our systems. But it appears that even for extremely important tools like OpenSSH, the "can" does not always translate into "do". It would serve us well to change that tendency.

Companies and organizations like Qualys are likely to have done multiple code audits on the OpenSSH code over the last six years. Attackers too, of course. The latter are not going to publish what they find, but security researchers generally do. A high-profile bug like this in a security tool that is in widespread use is exactly the kind of bug they are looking for, so it is surprising this was missed (in white hat communities, anyway) for so long. In hindsight, leaving the unused code in the client seems obviously wrong—that's a lesson we can all stand to relearn.

