Thoughts on the ext4 panic


In just a few days, a linux-kernel mailing list report of ext4 filesystem corruption turned into a widely-distributed news story; the quality of ext4 and its maintenance, it seemed, was in doubt. Once the dust settled, the situation turned out to be rather less grave than some had thought; the bug in question only threatened a very small group of ext4 users using non-default mount options. As this is being written, a fix is in testing and should be making its way toward the mainline and stable kernels shortly. The bug was obscure, but there is value in looking at how it came about and the ripples it caused.

The timeline

On October 23, user "Nix" was trying to help track down an NFS lock manager crash when he ran into a little problem: the crash kept corrupting his filesystem, making the debugging task rather more difficult than it would otherwise have been. He reported the problem to the linux-kernel mailing list; he also posted a warning for other LWN readers. The ext4 developers moved quickly to find the problem, coming up with a hypothesis within a few hours of the initial report. Unfortunately, the hypothesis turned out to be wrong.

Before that became clear, though, a number of news outlets had posted articles on the problem. LWN was not the first to do so ("first" is not at the top of our list of priorities), but, late on the 24th, we, too, posted an item about the issue. It quickly became clear, though, that the original hypothesis did not hold water, and that further investigation was in order. That investigation, as it turns out, took a few days to play out.

Eric Sandeen eventually tracked the problem down to this commit, which found its way into the mainline during the 3.4 merge window. That change was meant to be a cleanup, gathering the inode allocation logic into a single function and removing some duplicated code. The unintended result was to cause the inode bitmap to be modified outside of a transaction, introducing unchecksummed data into the journal. If the system crashed during that time, the next mount would encounter checksum errors and refuse to play back the journal, leaving the filesystem looking corrupt.
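The failure mode can be sketched with a toy model. This is illustrative Python, not ext4 or jbd2 code, and all of the names below are invented; the point is only that a journal checksums the blocks it commits, so a block modified outside the transaction no longer matches its recorded checksum when the journal is replayed:

```python
# Toy model of journal checksumming -- NOT ext4/jbd2 code, just an
# illustration of why modifying metadata outside a transaction leaves
# unverifiable data in the journal.
import zlib

class ToyJournal:
    def __init__(self):
        self.log = []  # committed transactions: (blocks, checksum)

    def commit(self, blocks):
        # At commit time, the journal records a checksum of the
        # blocks it is writing.
        csum = zlib.crc32(b"".join(blocks))
        self.log.append((list(blocks), csum))

    def replay(self):
        # On the next mount after a crash, each transaction is
        # verified before it is replayed; a mismatch aborts recovery.
        for blocks, csum in self.log:
            if zlib.crc32(b"".join(blocks)) != csum:
                raise IOError("journal checksum mismatch; refusing to replay")
        return [b for blocks, _ in self.log for b in blocks]

journal = ToyJournal()

# Correct path: the inode bitmap is modified inside the transaction,
# so the committed bytes match the recorded checksum.
bitmap = bytearray(8)
bitmap[0] |= 0x01                          # allocate an inode
journal.commit([bytes(bitmap)])
assert journal.replay()                    # replay succeeds

# Buggy path (schematically, the 3.4 regression): the block is changed
# after its checksum was computed, as if modified outside the transaction.
blocks, csum = journal.log[0]
blocks[0] = bytes([blocks[0][0] | 0x02])   # out-of-band modification
try:
    journal.replay()
except IOError as e:
    print(e)                               # checksum mismatch on replay
```

In the real bug, of course, the out-of-band update was an accident of refactoring rather than anything this explicit, and the "refusal to replay" is what made the filesystem appear corrupt.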

The interesting thing is that, on most systems, this problem can never come about, because the journal checksums simply do not exist there. Journal checksumming is an optional feature, not enabled by default and, evidently, not widely used. Nix had turned on the feature somewhat inadvertently; most other users do not turn it on at all, even if they are aware it exists. Anybody who has journal checksums turned off will not be affected by this bug, so very few ext4 users needed to be concerned about potential data corruption.
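For reference, journal checksumming on kernels of that era was controlled by the journal_checksum mount option (journal_async_commit implies it). The device and mount point below are placeholders:

```shell
# Enable journal checksums at mount time
# (/dev/sdb1 and /data are placeholders):
mount -o journal_checksum /dev/sdb1 /data

# Or persistently, via an /etc/fstab entry:
#   /dev/sdb1  /data  ext4  defaults,journal_checksum  0  2
```

Users who never added one of these options were, in other words, never exposed to the bug.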

As an interesting aside, checksums on the journal are a somewhat problematic feature; as seen in this discussion from 2008, it is not at all clear what the best response should be when journal checksums fail to match. The journal checksum may not be information that the system can reasonably act upon; indeed, as in this case, it may create problems of its own.

Eric's patch appears to fix the problem; corrupted journals that were easily observed before its application do not happen afterward. There will naturally be a period of review and testing before this change is merged into the mainline — nobody wants to create a new problem through undue haste — but kernel releases with a version of the fix (it has already been revised once) should be available to users in short order. But most users will not really care, since they were not affected by the problem in the first place. They may care more about the plans to improve the filesystem test suites so that regressions of this nature can be more easily caught in the future.

Analysis

In retrospect, the media coverage of this bug was clearly out of proportion to that bug's impact. One might attribute that to a desire for sensational stories to drive traffic, and that may well be part of what was going on. But there are a couple of other factors that are worth keeping in mind before jumping to that judgment:

Many media outlets employ editors and writers who, almost beyond belief, are not trained in kernel programming. That makes it very hard for them to understand what is really going on behind a linux-kernel discussion even if they read that discussion rather than basing a story on a single message received in a tip. They will see a subject like "Apparent serious progressive ext4 data corruption," along with messages from prominent developers seemingly confirming the problem, and that is what they have to go with. It is hard to blame them for seeing a major story in this thread.

Even those who understand linux-kernel discussions (LWN, in its arrogance, places itself in this category) can be faced with an urgent choice. If there were a data corruption bug in recent kernels, then we would be beyond remiss to fail to warn our readers, many of whom run the kernels in question. There comes a point where, in the absence of better information, there is no alternative to putting something out there.

The ext4 developers certainly cannot be faulted for the way this story went. They did what conscientious developers do: they dropped everything to focus on what appeared to be a serious regression affecting their users. They might have avoided some of the splash by taking the discussion private and not saying anything until they were certain of having found the real problem, but that is not the way our community works. It is hard to imagine that pushing development discussions out of the public view is going to make things better in the long run.

Thus, one might conclude that we are simply going to see an occasional episode like this, where a bug report takes on a life of its own and is widely distributed before its impact is truly understood. Early reports of software problems, arguably, should be treated like early software: potentially interesting, but likely to be in need of serious review and debugging. That's simply the world we live in.

A more serious concern may apply to the addition of features to the ext4 filesystem. Ext4 is viewed as the stable, production filesystem in the Linux kernel, the one we're supposed to use while waiting for Btrfs to mature. One might well question the addition of new features to this filesystem, especially features that prove to be rarely used or that don't necessarily play well with existing features. And, sure enough, Linux filesystem developers have raised just this kind of worry in the past. In the end, though, the evolution of ext4 is subject to the same forces as the rest of the kernel; it will go in the directions that its developers drive it. There is interest in enhancing ext4, so new features will find their way in.

Before getting too worried about this prospect, though, it is worth thinking about the history of ext4. This filesystem is heavily used with all kinds of workloads; any problems lurking within will certainly emerge to bite somebody. But problems that have affected real users have been exceedingly rare and, even in this case, the number of affected users appears to be countable without running out of fingers. Ext4, in other words, has a long and impressive record of stability, and its developers are determined to keep it that way; this bug can be viewed as the sort of exception that proves the rule. One should never underestimate the value of good backups, but, with ext4, the chances of having to actually use those backups remain quite small.

