KS2012: Kernel build/boot testing


On the first day of the 2012 Kernel Summit, Fengguang Wu gave a presentation about testing for build and boot regressions in the Linux kernel. He described the test framework that he has established to detect and report these regressions in a more timely fashion.

To summarize the problem that Fengguang is trying to solve, it's simplest to look at things from the perspective of a maintainer making periodic kernel releases. The most obvious example is, of course, the mainline tree maintained by Linus, which goes through a series of release candidates on the way to the release of a stable kernel. The linux-next tree maintained by Stephen Rothwell is another example. Many other developers depend on these releases. If, for some reason, those kernel releases don't build and boot successfully, then the daily work of other kernel developers is impaired while they resolve the problem.

Of course, Linus and Stephen strive to ensure that these kinds of build and boot errors don't occur: before making kernel releases, they do local testing on their development systems, and ensure that the kernel builds, boots, and runs for them. The problem comes in when one considers the variety of hardware architectures and configuration options that Linux provides. No single developer can test all combinations of architectures and options, which means that, for some combinations, there are inevitably build and boot errors in the mainline -rc and linux-next releases. These sorts of regressions appear even in the final releases performed by Linus; Fengguang noted results from Geert Uytterhoeven, whose testing found, for example, around 100 build error messages resulting from regressions in the Linux 3.4 release. (Those figures are somewhat inflated, since some of the errors occur on obscure platforms that receive less maintainer attention. But they include a number of regressions on mainstream platforms that have the potential to disrupt the work of many kernel developers.) Furthermore, even when a build problem appears in a series of kernel commits but is fixed before a mainline -rc release, it still creates a problem: developers performing bisects to discover the causes of other kernel bugs will encounter the build failures during the bisection process.
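As an aside, git does give developers a way to cope with such build breakage mid-bisection: a `git bisect run` script can exit with code 125, which tells git to skip a commit it cannot test. The runnable sketch below is not from Fengguang's system; it builds a scratch repository in which `build.sh` stands in for a kernel build and a "bug" marker in a file stands in for the bug being hunted, then shows the bisection converging on the buggy commit despite an unbuildable commit in the range.

```shell
#!/bin/sh
# Illustrative scratch repository: seven commits, one of which breaks
# the "build" (c4) and another of which introduces the "bug" (c6).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email tester@example.com
git config user.name tester

commit() { git add -A && git commit -qm "$1"; }

echo true   > build.sh; echo clean  > code; commit c1   # builds, no bug
echo clean2 > code;                         commit c2
echo clean3 > code;                         commit c3
echo false  > build.sh;                     commit c4   # build breakage
echo true   > build.sh; echo clean5 > code; commit c5   # build fixed
echo bug    > code;                         commit c6   # introduces the bug
echo bug7   > code;                         commit c7
bug_commit=$(git rev-parse HEAD~1)

git bisect start HEAD HEAD~6   # HEAD (c7) is bad, c1 is good
# exit 125 = "this commit cannot be tested, skip it";
# otherwise exit 0 (good) or 1 (bad) based on the bug marker
git bisect run sh -c 'sh build.sh || exit 125; ! grep -q bug code'
```

At the end of the run, git reports c6 as the first bad commit; the unbuildable c4 is simply skipped rather than being misreported as the culprit.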

As Fengguang noted, the problem is that it takes some time for these regressions to be detected. By that time, it may be difficult to determine what kernel change caused the problem and who it should be reported to. Many such reports on the kernel mailing list get no response, since it can be hard to diagnose user-reported problems. Furthermore, the developer responsible for the problem may have moved on to other activities and may no longer be "hot" on the details of work that they did quite some time ago. As a result, there is duplicated effort and lost time as the affected developers resolve the problems themselves.

According to Fengguang, these sorts of regressions are an inevitable part of the development process. Even the best of kernel developers may sometimes fail to test for regressions. When such regressions occur, the best way to ensure that they are resolved is to quickly and accurately determine the cause and promptly notify the responsible developer.

Fengguang's solution is to automate the process: detect these regressions and then inform kernel developers by email that their commit X triggered bug Y. Crucially, the email reports are generated nearly immediately (within about an hour) after commits are merged into the tested repositories. (For this reason, Fengguang calls his system a "0-day kernel test" system.) Since the relevant developer is informed quickly, it's more likely that they'll still be "hot" on the technical details, and able to fix the problem quickly.

Fengguang's test framework at the Intel Open Source Technology Center consists of a server farm that includes five build servers (three Sandy Bridge and two Itanium systems). On these systems, kernels are built inside chroot jails. The built kernel images are then boot tested inside over 100 KVM instances on another eight test boxes. The system builds and boots each tested kernel configuration, on a commit-by-commit basis for a range of kernel configurations. (The system reuses build outputs from previous commits so as to expedite the build testing. Thus, the build time for the first commit of an allmodconfig build is typically ten minutes, but subsequent commits require two minutes to build on average.)

Tests are currently run against Linus's tree, linux-next, and more than 180 trees owned by individual kernel maintainers and developers. (Running tests against individual maintainers' trees helps ensure that problems are fixed before they taint Linus's tree and linux-next.) Together, these trees produce 40 new branch heads and 400 new commits on an average working day. Each day, the system build tests 200 of the new commits. (The system allows trees to be categorized as "rebasable" or "non-rebasable". The latter are usually big subsystem trees whose maintainers take responsibility for doing bisectability tests before publishing commits. Rebasable trees are tested on a commit-by-commit basis. For non-rebasable trees, only the branch head is built; only if that fails does the system go through the intervening commits to locate the source of the error. This is why not all 400 of the daily commits are tested.)
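The head-first strategy for non-rebasable trees can be sketched roughly as follows. This is not Fengguang's actual harness: the scratch repository and `build.sh` below are illustrative stand-ins for a real tree and a real kernel build. The branch head is built first; only when that fails are the new commits walked, oldest first, to find the one that introduced the breakage.

```shell
#!/bin/sh
# Illustrative scratch repository: a base commit, one good commit,
# one commit that breaks the "build", and one unrelated commit on top.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email tester@example.com
git config user.name tester

echo true > build.sh;  git add -A; git commit -qm base
base=$(git rev-parse HEAD)
echo ok > f1;          git add -A; git commit -qm good-work
echo false > build.sh; git add -A; git commit -qm breaks-build
breaker=$(git rev-parse HEAD)
echo ok > f2;          git add -A; git commit -qm unrelated-work

first_bad=
if ! sh build.sh; then                    # head build fails...
    for c in $(git rev-list --reverse "$base"..HEAD); do
        git checkout -q "$c"              # ...so test each new commit, oldest first
        if ! sh build.sh; then
            first_bad=$c
            break
        fi
    done
fi
echo "first failing commit: $first_bad"
```

When the head builds cleanly, the loop is never entered, which is what keeps the per-commit cost of non-rebasable trees low.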

The current machine power allows the build test system to test 140 kernel configurations (as well as running sparse and Coccinelle checks) for each commit. Around half of these configurations are randconfig configurations, which are regenerated each day in order to increase test coverage over time. (randconfig builds the kernel with randomized configuration options, so as to test unusual combinations of options.) Most of the built kernels are boot tested, including the randconfig ones. Boot tests for the head commits are repeated multiple times to increase the chance of catching less-reproducible regressions. In the end, 30,000 kernels are boot tested each day. In the process, the system catches four new static errors or warnings per day, and one boot error every second day.

Responses from the kernel developers in the room to the new system were extremely positive. Andrew Morton noted that he had received a number of useful reports from the tool: "All contained good information, and all corresponded to issues I felt should be fixed." Others echoed Andrew's comments.

One developer in the room asked what he should do if he has a scratch branch that is simply too broken to be tested. Fengguang replied that his build system maintains a blacklist, and specific branches can be added to that blacklist on request. In addition, a developer can include a line containing the string Dont-Auto-Build in a commit message; this causes the build system to skip testing of the whole branch.
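A harness honoring that opt-out marker might look something like the sketch below. This is an assumption about the mechanism, not Fengguang's actual code: it simply skips a branch when any commit since the base carries the Dont-Auto-Build string in its message, using a throwaway repository for demonstration.

```shell
#!/bin/sh
# Illustrative scratch repository: one normal commit, then a commit
# whose message opts the branch out of automated testing.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email tester@example.com
git config user.name tester

echo a > f; git add -A; git commit -qm "base work"
base=$(git rev-parse HEAD)
echo b > f; git commit -qam "scratch experiment

Dont-Auto-Build"

# Skip the whole branch if any commit since $base opts out of testing
if git log --grep=Dont-Auto-Build --format=%H "$base"..HEAD | grep -q .; then
    verdict=skipped
else
    verdict=tested
fi
echo "branch: $verdict"
```

`git log --grep` searches commit messages over the given revision range, so the check costs one cheap command per branch before any build work starts.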

Many problems in the system have already been fixed as a consequence of developer feedback: the build test system is now fairly mature, while the boot test system is reasonably usable but has room for further improvement. Fengguang is seeking further input from kernel developers on how his system could be improved. In particular, he is asking kernel developers for runtime stress and functional test scripts for their subsystems. (Currently, the boot test system runs a limited set of tools, including trinity, xfstests, and a handful of memory-management tests, to catch runtime regressions.)

Fengguang's system has already clearly had a strong positive impact on the day-to-day life of kernel developers. With further feedback, the system is likely to provide even more benefit.

