A story of three kernel vulnerabilities

A security-oriented firm called Trustwave recently sent out a preview of an upcoming report [PDF] that features some focused criticism of how the Linux community handles security vulnerabilities. Indeed, it says: "Software developers vary greatly in their ability to respond and patch zero-day vulnerabilities. In this study, the Linux platform had the worst response time, with almost three years on average from initial vulnerability to patch." Whether or not one is happy with how security updates work with Linux, three years sounds like a rather longer response time than most of us normally expect. Your editor decided to examine the situation by focusing on two vulnerabilities that are said to be included in the Trustwave report and one that is not.

Three years?

As of this writing, Trustwave's full report is not available, so a detailed look at its claims is not possible. But, according to this ZDNet article, the average response time was calculated from these two "zero-day" vulnerabilities:

CVE-2009-4307: a divide-by-zero crash in the ext4 filesystem code. Causing this oops requires convincing the user to mount a specially-crafted ext4 filesystem image.

CVE-2009-4020: a buffer overflow in the HFS+ filesystem exploitable, once again, by convincing a user to mount a specially-crafted filesystem image on the target system.

The ext4 problem was reported on October 1, 2009 by R.N. Sastry, who had been doing some filesystem fuzz testing. The report included the filesystem image that triggered the bug; that image is the "exploit code" that Trustwave used to call this bug a zero-day vulnerability. Since the problem was limited to a kernel oops, and since it required the victim's cooperation (in the form of mounting the attacker's filesystem) to trigger, the ext4 developers did not feel the need to drop everything and fix it immediately; Ted Ts'o committed a fix toward the end of November. SUSE was the first distributor to issue an update containing the fix; that happened on January 17, 2010. Red Hat did not put out an update until the end of March, nearly six months after the problem was disclosed, and Mandriva waited until February of 2011.

One might argue that things happened slowly, even for an extremely low-priority bug, but where does "three years" come from? It turns out that the fix did not work properly on the x86 architecture; Xi Wang reported the problem's continued existence on December 26, 2011, and sent a proper fix on January 9, 2012. A new CVE number (CVE-2012-2100) was assigned for the problem and the fix was promptly committed into the mainline. Distributors were a bit slow to catch up, though; Debian issued an update in March, Ubuntu in May, and Red Hat waited until mid-November — nearly eleven months after disclosure — to ship the fix to its users. The elapsed time from the initial disclosure until Red Hat's shipping an update that fixes the problem properly is, indeed, just over three years.
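
The x86 failure comes down to how shifts behave. As Xi Wang's follow-up fix explained it, the first fix computed 1 << s_log_groups_per_flex and rejected results smaller than two. x86 uses only the low five bits of a 32-bit shift count, so a bogus on-disk value such as 36 yields 16 and passes the check; in C, an oversized shift is undefined behavior in any case. The user-space sketch below is a simplification, not the kernel's code: the function names are hypothetical, x86_shl32() models the hardware behavior explicitly (because the plain C expression would be undefined), and the accepted range of 1 to 31 in the second check is an assumption for a 32-bit group count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the x86 SHL instruction on a 32-bit operand:
 * only the low five bits of the shift count are used. Writing
 * "1u << 36" directly in C would be undefined behavior, so the
 * masking is done explicitly here. */
static uint32_t x86_shl32(uint32_t value, uint8_t count)
{
    return value << (count & 31);
}

/* Simplified version of the first fix: compute the group count from
 * the untrusted exponent, then reject results that are too small.
 * Under the x86 model, an exponent of 36 becomes 1 << 4 == 16 and
 * slips through, leaving the exponent and the count inconsistent. */
static bool first_check_accepts(uint8_t log_groups_per_flex)
{
    uint32_t groups_per_flex = x86_shl32(1, log_groups_per_flex);
    return groups_per_flex >= 2;
}

/* Simplified version of the second fix: validate the untrusted
 * exponent itself before any shift happens, so no oversized shift
 * can ever be performed. The 1..31 range is an assumption. */
static bool second_check_accepts(uint8_t log_groups_per_flex)
{
    return log_groups_per_flex >= 1 && log_groups_per_flex <= 31;
}
```

Under this model, first_check_accepts(36) returns true while second_check_accepts(36) returns false. Checking the untrusted exponent before using it avoids depending on what the hardware, or the compiler, happens to do with an oversized shift.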

The story for the HFS/HFS+ vulnerability is similar. An initial patch fixing a buffer overflow in the HFS filesystem was posted by Amerigo Wang at the beginning of December, 2009. The fix was committed by Linus on December 15, and distributor updates began with Red Hat's on January 19, 2010. Some distributors were rather slower, but it was another hard-to-exploit bug that was deemed to have a low priority.

The problem is that the kernel supports another (newer) filesystem called HFS+. It is a separate filesystem implementation, but it contains a fair amount of code that was cut-and-pasted from the original HFS implementation, much like ext4 started with a copy of the ext3 code. The danger of this type of code duplication is well known: developers will fix a bug in one copy but not realize that the same issue may be present in the other copy as well. Naturally enough, that was the case here; the HFS+ filesystem had the same buffer overflow vulnerability, but nobody thought to do anything about it until Timo Warns quietly told a few kernel developers about it at the end of April 2012. Greg Kroah-Hartman committed a fix on May 4, and the problem was publicly disclosed a few days after that. Once again, a new CVE number (CVE-2012-2319) was assigned, and, once again, distributors dawdled with the fixes; openSUSE sent an update in June, while Red Hat waited until October, five months after the problem became known. The time period from the initial disclosure of the HFS vulnerability until Red Hat's update for the HFS+ problem was just short of three years.
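
The HFS and HFS+ bugs follow a classic pattern: a length field read from the filesystem image is trusted when copying data into a fixed-size buffer. The sketch below shows the general shape of a fix, a bounds check on the untrusted length before the copy. The structure, its 64-byte size, and the read_entry() helper are hypothetical illustrations, not the kernel's HFS or HFS+ code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size destination for a record read from an
 * untrusted filesystem image; the size is illustrative only. */
struct catalog_entry {
    uint8_t data[64];
};

/* Copy a record whose length comes from the image itself.
 * entrylength is signed, as a length read into an int might be.
 * Returns 0 on success, -1 if the length cannot be trusted. */
static int read_entry(struct catalog_entry *dst,
                      const uint8_t *src, size_t src_avail,
                      int32_t entrylength)
{
    /* Reject negative lengths, lengths larger than the destination,
     * and lengths larger than the source data actually available. */
    if (entrylength < 0 ||
        (size_t)entrylength > sizeof(dst->data) ||
        (size_t)entrylength > src_avail)
        return -1;

    memcpy(dst->data, src, (size_t)entrylength);
    return 0;
}
```

Routing every copy through one checked helper, rather than repeating the check in each pasted copy of the code, is one way to keep a fix in one implementation from being missed in its clone.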

One could look at this situation two ways. On one hand, Trustwave has clearly chosen its vulnerabilities carefully, then applied an interpretation that yielded the longest delay possible. Neither story above describes a zero-day vulnerability knowingly left open for three years; for most of that time, it was assumed that the problems had been fixed. That is doubly true for the HFS+ filesystem, for which the vulnerability was not even disclosed until May, 2012. Given the nature of the vulnerabilities, it is highly unlikely that the black hats were jealously guarding them in the meantime; the odds are good that no system has ever been compromised by exploiting either one of them. Trustwave's claims, if they are indeed built on these two vulnerabilities, are dubious and exaggerated at best.

On the other hand, even low-priority vulnerabilities requiring the victim's cooperation should be fixed (and fixed properly) in a timely manner, and it is not at all clear that happened with these problems. The response to the ext4 problem was arguably fast enough given the nature of the problem, but the fact that the problem persisted on the x86 architecture suggests that the testing applied to that fix was, at best, incomplete. In the HFS/HFS+ case, one could argue that somebody should have thought to check for copies of the bug elsewhere. The fact that the HFS and HFS+ filesystems are nearly unused and nearly unmaintained did not help in this case, but attackers do not restrict themselves to well-maintained code. And, for both bugs, distributors took their time to get the fixes out to their users. We can do better than that.

Meanwhile, in 2013

Perhaps the slowness observed above is the natural response to vulnerabilities that nobody is actually all that worried about. Had they been something more serious, it could be argued, the response would have been better. As it happens, there is an open issue at the time of this writing that can be examined to see how well we do respond; the answer is a bit discouraging.

On January 20, a discussion on the private kernel security list went public with this patch posting by Oleg Nesterov. It seems that the Linux implementation of the ptrace() system call contains a race condition: a traced process's registers can be changed in a way that causes the kernel to restore that process's stack contents to an arbitrary location. The end result is the ability to run arbitrary code in kernel mode. It is a local attack, in that the attacker needs to be able to run an exploit program on the target system. But, given the ability to run such a program, the attacker can obtain full root privileges. That is the kind of vulnerability that needs quick attention; it puts every system out there at the mercy of any untrusted users that may have accounts there — or at the mercy of any attacker that may be able to compromise a network service to run an arbitrary program.

On February 15, the vulnerability was disclosed as such, complete with handy exploit code for those who do not wish to write their own. The exploit comes with a kernel patch that makes the race condition easier to hit, which most victims are unlikely to apply; it also needs the ability to run a process with real-time priority to win the race more reliably. But, even without the patch or real-time scheduling, a sufficiently patient attacker should be able to time things right eventually. Solar Designer reacted to the disclosure this way:

I haven't looked into this closely yet, but at first glance it looks like the worst Linux kernel vulnerability in a few years. For distro vendor kernels (rather than mainline, which was patched almost a month ago), this is a 0-day.

Arguably this should not be a zero-day vulnerability: the public discussion of the fix is nearly one month old, and the private discussion had been going on for some time before. But, as of this writing, no distributors have issued updates for this problem. That leads to some obvious questions; quoting Solar Designer again:

The mainline commits from January are by Oleg Nesterov of Red Hat. Why wasn't(?) the issue handled with due severity within Red Hat, then - such that Red Hat would at the very least have a statement on whether and which of their kernels are affected by now.

One assumes that such a statement will be forthcoming in the near future. In the meantime, users and system administrators worldwide need to be worried about whether their systems are vulnerable and who might be exploiting the problem.