Ripples from Stack Clash

Did you know...? LWN.net is a subscriber-supported publication; we rely on subscribers to keep the entire operation going. Please help out by buying a subscription and keeping LWN on the net.

In one sense, the Stack Clash vulnerability that was announced on June 19 has not had a huge impact: thus far, at least, there have been few (if any) stories of active exploits in the wild. At other levels, though, this would appear to be an important vulnerability, in that it has raised a number of questions about how the community handles security issues and what can be expected in the future. The indications, unfortunately, are not all positive.

A quick review for those who are not familiar with this vulnerability may be in order. A process's address space is divided into several regions, two of which are the stack and the heap areas. The stack contains short-lived data tied to the running program's call chain; it is normally placed at a high address and it grows automatically (toward lower addresses) if the program accesses memory below the stack's current lower boundary. The heap, instead, contains longer-lived memory and grows upward. The kernel, as of 2010, places a guard page below the stack in an attempt to prevent the stack from growing into the heap area.

The "Stack Clash" researchers showed that it is possible to jump over this guard page with a bit of care. The result is that programs that could be fooled into using a lot of stack space could be made to overwrite heap data, leading to their compromise; setuid programs are of particular concern in this scenario. The fix that has been adopted is to turn the guard page into a guard region of 1MB; that, it is hoped, is too much to be readily jumped over.

Not a new problem

There are certain developers working in the security area who are quite fond of saying "I told you so". In this case, it would appear that they really told us so. An early attempt to deal with this problem can be found in this 2004 patch from Andrea Arcangeli, which imposed a gap of a configurable size between the stack and the heap. Despite being carried in SUSE's kernels for some time, this patch never found its way into the mainline.

In 2010, an X server exploit took advantage of the lack of isolation between the stack and the heap, forcing a bit of action in the kernel community; the result was a patch from Linus Torvalds adding a single guard page (with no configurability) at the bottom of the stack. It blocked the X exploit, and many people (and LWN) proclaimed the problem to be solved. Or, at least, it seemed solved once the various bugs introduced by the initial fix were dealt with.

In the comments to the above-linked LWN article (two days after it was published), Brad Spengler and "PaX Team" claimed that a single-page gap was insufficient. More recently, Spengler posted a blog entry in his classic style on how they told us about this problem but it never got fixed because nobody else knows what they are doing. The thing they did not do, but could have done if they were truly concerned about the security of the Linux kernel, was to post a patch fixing the problem properly.

Of course, nobody else posted such a patch either; the community can only blame itself for not having fixed this problem. Perhaps LWN shares part of that blame for presenting the problem as being fixed when it was not; if so, we can only apologize and try to do better in the future. But we might argue that the real problem is a lack of people who are focused on the security of the kernel itself. There are few developers indeed whose job requires them to, for example, examine and address stack-overrun threats. Ensuring that this problem was properly fixed was not anybody's job, so nobody did it.

The corporate world supports Linux kernel development heavily, but there are ghetto areas that, seemingly, every company sees as being somebody else's problem; security is one of those. The situation has improved a little in recent times, but the core problem remains.

Meanwhile, one might well ask: has the stack problem truly been fixed this time? One might answer with a guarded "yes" — once the various problems caused by the new patch are fixed, at least; a 1MB gap is likely to be difficult for an attacker to jump over. But it is hard to be sure, anymore.

Embargoes

Alexander "Solar Designer" Peslyak is the manager of both the open oss-security and the closed "distros" list; the latter is used for the discussion of vulnerabilities that have not yet been publicly disclosed. The normal policy for that list is that a vulnerability disclosed there can only be kept under embargo for a period of two weeks; it is intended to combat the common tendency for companies to want to keep problems secret for as long as possible while they prepare fixes.

As documented by Peslyak, the disclosure of Stack Clash did not follow that policy. The list was first notified of a problem on May 3, with the details disclosed on May 17. The initial disclosure date of May 30 was pushed back by Qualys until the actual disclosure date of June 19. Peslyak made it clear that he thought the embargo went on for too long, and that the experience would not be repeated in the future.

The biggest problem with the extended embargo, perhaps, was that it kept the discussion out of the public view for too long. The sheer volume on the (encrypted) distros list was, evidently, painful to deal with after a while. But the delay also kept eyes off the proposed fix, with the result that the patches merged by the disclosure date contained a number of bugs. The urge to merge fixes as quickly as possible is not really a function of embargo periods, but long embargoes fairly clearly delay serious review of those fixes. Given the lack of known zero-day exploits, it may well have been better to disclose the problem earlier and work on the fixes in the open.

That is especially true since, according to Qualys, the reason for the embargo extension was that the fixes were not ready. The longer embargo clearly did not result in readiness. There was a kernel patch of sorts, but the user-space side of the equation is in worse shape. A goal like "recompile all userland code with GCC's -fstack-check option" was never going to happen in a short period anyway, even if -fstack-check were well suited to this application — which it currently is not.

There is a related issue in that OpenBSD broke the embargo by publicly committing a patch to add a 1MB stack guard on May 18 — one day after the private disclosure of the problem. This has raised a number of questions, including whether OpenBSD (which is not a member of the distros list) should be included in embargoed disclosures in the future. But perhaps the most interesting point to make is that, despite this early disclosure, all hell stubbornly refused to break loose in its aftermath. Peslyak noted that:

This matter was discussed, and some folks were unhappy about OpenBSD's action, but in the end it was decided that since, as you correctly say, the underlying issue was already publicly known, OpenBSD's commits don't change things much.

As was noted above, the "underlying issue" has been known for many years. A security-oriented system abruptly making a change in this area should be a red flag for those who follow commit streams in the hope of finding vulnerabilities. But there appears to be no evidence that this disclosure — or the other leaks that apparently took place during the long embargo — led to exploits being developed before the other systems were ready. So, again, it's not clear that the lengthy embargo helped the situation.

Offensive CVE assignment

Another, possibly discouraging, outcome from this whole episode was a demonstration of the use of CVE numbers as a commercial weapon. It arguably started with this tweet from Kurt Seifried, reading: "CVE-2017-1000377 Oh you thought running GRsecurity PAX was going to save you?". CVE-2017-1000377, filed by Seifried, states that the grsecurity/PaX patch set also suffers from the Stack Clash vulnerability — a claim which its developers dispute. Seifried has not said whether he carried out these actions as part of his security work at Red Hat, but Spengler, at least, clearly sees a connection there.

Seifried's reasoning appears to be based on this text from the Qualys advisory sent to the oss-security list:

In 2010, grsecurity/PaX introduced a configurable stack guard-page: its size can be modified through /proc/sys/vm/heap_stack_gap and is 64KB by default (unlike the hard-coded 4KB stack guard-page in the vanilla kernel). Unfortunately, a 64KB stack guard-page is not large enough, and can be jumped over with ld.so or gettext().

The advisory is worth reading in its entirety. It describes an exploit against sudo under grsecurity, but that exploit depended on a second vulnerability and disabling some grsecurity protections. With those protections enabled, Qualys says, a successful exploit could take thousands of years.

It is thus not entirely surprising that Spengler strongly denied that the CVE number was valid; in his unique fashion, he made it clear that he believes the whole thing was commercially motivated; "this taints the CVE process," he said. Seifried defended the CVE as "legitimate" but suggested that he was getting tired of the whole show and might give up on it.

Meanwhile Spengler, not to be outdone, filed for a pile of CVE numbers against the mainline kernel, to the befuddlement of Andy Lutomirski, the author of much of the relevant code. Spengler made it appear that this was a retaliatory act and suggested that Lutomirski talk to Seifried about cleaning things up. "I am certain he will treat a member of upstream Linux the same as I've been treated, as he is a very professional and equitable person."

The CVE mechanism was created as a way to make it easier to track and talk about specific vulnerabilities. Some have questioned its value, but there does seem to be a real use for a unique identifier for each problem. If, however, the CVE assignment mechanism becomes a factory for mudballs to be thrown at the competition, it is likely to lose whatever value it currently has. One can only hope that the community will realize that turning the CVE database into a cesspool of fake news will do no good for anybody and desist from this kind of activity.

In conclusion

Our community's procedures for dealing with security issues have been developed over decades and, in many ways, they have served us well over that time. But they are also showing some signs of serious strain. The lack of investment in the proactive identification and fixing of security issues before they become an emergency has hurt us a number of times and will continue to do so. The embargo processes we have developed are clearly not ideal and could use improvement — if we only knew what form that improvement would take.

It is becoming increasingly apparent to the world as a whole that our industry's security is not at the level it needs to be. Hopefully, that will create some commercial incentives to improve the situation. But it also creates incentives to attack others rather than fixing things at home. That is going to lead to some increasingly ugly behavior; let us just hope that our community can figure out a way to solve problems without engaging in divisive and destructive tactics. Our efforts are much better placed in making Linux more secure for all users than in trying to take other approaches down a notch.

