Throwing one away

LWN.net needs you! Without subscribers, LWN would simply not exist. Please consider signing up for a subscription and helping to keep LWN publishing

One of the (many) classic essays inby Frederick Brooks is titled "Plan to throw one away." Our first solution to a complex software development problem, he says, is not going to be fit for the intended purpose. So we will end up dumping it and starting over. The free software development process is, as a whole, pretty good at the "throwing it away" task; some would argue that we'regood at it. But there are times when throwing one away is hard; the current discussion around control groups in the kernel shows how situation can come about.

What Brooks actually said (in the original edition) was:

In most projects, the first system built is barely usable. It may be too slow, too big, awkward to use, or all three. There is no alternative but to start again, smarting but smarter, and build a redesigned version in which these problems are solved. The discard and redesign may be done in one lump, or it may be done piece-by-piece. But all large-system experience shows that it will be done.

One could argue that free software development has taken this advice to heart. In most projects of any significant size, proposed changes are subjected to multiple rounds of review, testing, and improvement. Often, a significant patch set will go through enough fundamental changes that it bears little resemblance to its initial version. In cases like this, the new subsystem has, in a sense, been thrown away and redesigned.

In some cases it's even more explicit. The 2.2 kernel, initially, lacked support for an up-and-coming new bus called USB. Quite a bit of work had gone into the development of a candidate USB subsystem which, most people assumed, would be merged sometime soon. Instead, in May 1999, Linus looked at the code and decided to start over; the 2.2.7 kernel included a shiny new USB subsystem that nobody had ever seen before. That code incorporated lessons learned from the earlier attempts and was a better solution — but even that version was eventually thrown away and replaced.

Brooks talks about the need for "pilot plant" implementations to turn up the problems in the initial implementation. Arguably we have those in the form of testing releases, development trees, and, perhaps most usefully, early patches shipped by distributors. As our ability to test for performance regressions grows, we should be able to do much of our throwing-away before problems in early implementations are inflicted upon users. For example, the 3.6 kernel was able to avoid a 20% regression in PostgreSQL performance thanks to pre-release testing.

But there are times when the problem is so large and so poorly understood that the only way to gain successful "pilot plant" experience is to ship the best implementation we can come up with and hope that things can be fixed up later. As long as the problems are internal, this fixing can often be done without creating trouble for users. Indeed, the history of most software projects (free and otherwise) can be seen as an exercise in shipping inferior code, then reimplementing things to be slightly less inferior and starting over again. The Linux systems we run today, in many ways, look like those of ten years or so ago, but a great deal of code was replaced in the time between when those systems were shipped.

But what happens when the API design is part of the problem? User interfaces are hard to design and, when they turn out to be wrong, they can be hard to fix. It turns out that users don't like it when things change on them; they like it even less if their programs and scripts break in the process. As a result, developers at all levels of the stack work hard to avoid the creation of incompatible changes at the user-visible levels. It is usually better to live with one's mistakes than to push the cost of fixing them onto the user community.

Sometimes, though, those mistakes are an impediment to the creation of a proper solution. As an example, consider the control groups (cgroups) mechanism within the kernel. Control groups were first added to the 2.6.24 kernel (January, 2008) as a piece of the solution to the "containers" problem; indeed, they were initially called "process containers." They have since become one of the most deeply maligned parts of the kernel, to the point that some developers routinely threaten to rip them out when nobody is looking. But the functionality provided by control groups is useful and increasingly necessary, so it's not surprising that developers are trying to identify and fix the problems that have been exposed in the current ("pilot") control group implementation.

As can be seen in this cgroup TODO list posted by Tejun Heo, lot of those problems are internal in nature. Fixing them will require a lot of changes to kernel code, but users should not notice that anything has changed at all. But there are some issues that cannot be hidden from users. In particular: (1) the cgroup design allows for multiple hierarchies, with different controllers (modules that apply policies to groups) working with possibly different views of the process tree, and (2) the implementation of process hierarchies is inconsistent from one controller to the next.

Multiple hierarchies seemed like an interesting feature at the outset; why should the CPU usage controller be forced to work with the same view of the process tree as, say, the memory usage controller? But the result is a more complicated implementation that makes it nearly impossible for controllers to coordinate with each other. The block I/O bandwidth controller and the memory usage controller really need to share a view of which control group "owns" each page in the system, but that cannot be done if those two controllers are working with separate trees of control groups. The hierarchy implementation issues also make coordination difficult while greatly complicating the lives of system administrators who need to try to figure out what behavior is actually implemented by each controller. It is a mess that leads to inefficient implementations and administrative hassles.

How does one fix a problem like this? The obvious answer is to force the use of a single control group hierarchy and to fix the controllers to implement their policies over hierarchies in a consistent manner. But both of those are significant, user-visible API and behavioral changes. And, once again, a user whose system has just broken tends to be less than appreciative of how much better the implementation is.

In the past, operating system vendors have often had to face issues like this. They have responded by saving up all the big changes for a major system update; users learned to expect things to break over such updates. Perhaps the definitive example was the transition from "Solaris 1" (usually known as SunOS 4 in those days) to Solaris 2, which switched the entire system from a BSD-derived base to one from ATT Unix. Needless to say, lots of things broke in the process. In the Linux world, this kind of transition still happens with enterprise distributions; RHEL7 will have a great many incompatible changes from RHEL6. But community distributions tend not to work that way.

More to the point, the components that make up a distribution are typically not managed that way. Nobody in the kernel community wants to go back to the pre-2.6 days when major features only got to users after a multi-year delay. So, if problems like those described above are going to be fixed in the kernel, the kernel developers will have to figure out a way to do it in the regular, approximately 80-day development cycle.

In this case, the plan seems to be to prod users with warnings of upcoming changes while trying to determine if anybody really has difficulties with them. So, systems where multiple cgroup hierarchies are in use will emit warnings to the effect that the feature is deprecated and inviting email from anybody who objects. Similar warnings will be put into specific controllers whose behavior is expected to change. Consider the memory controller; as Tejun put it: "memcg asked itself the existential question of to be hierarchical or not and then got confused and decided to become both." The plan is to get distributors to carry a patch warning users of the non-hierarchical mode and asking them to make their needs known if the change will truly be a problem for them. In a sense, the distributors are being asked to run a pilot for the new cgroup API.

It is possible that the community got lucky this time around; the features that need to be removed or changed are not likely to be heavily used. In other cases, there is simply no alternative to retaining the older, mistaken design; the telldir() system call, which imposes heavy implementation costs on filesystems, is a good example. We can never preserve our ability to "throw one away" in all situations. But, as a whole, the free software community has managed to incorporate Brooks's advice nicely. We throw away huge quantities of code all the time, and we are better off for it.

