Kernel maintenance, Brillo style


The "Internet of things" has become a buzzword for a whole raft of new products, many of questionable value, that the industry would like to sell to us. It has also come to symbolize many of the problems afflicting the software development and delivery process. In a brief, fast-paced talk during the 2016 Linux Plumbers Conference's Android microconference, Kees Cook described how the Brillo project is trying to fix some of those problems, especially with regard to kernel maintenance.

Brillo, he said, is a software stack for the Internet of things based on the Android system. These deployments bring a number of challenges, starting with the need to support a different sort of hardware than Android normally runs on; target devices may have no display or input devices, but might well have "fun buses" to drive interesting peripherals. The mix of vendors interested in this area is different; handset vendors are present, but many more traditional embedded vendors can also be found there. Brillo is still in an early state of development.

There are a number of longstanding problems endemic to this area. Each device has its own special, static kernel version mixing changes from multiple trees, including the mainline, the common Android tree, and any vendor-specific trees. Fixes and new features must be backported to this kernel, and out-of-tree drivers must be carried forward for future products. As the number of products increases, the number of combinations of kernels, patch sets, and hardware configurations grows exponentially, leading to maintenance problems. This growth is relatively manageable when the problems are small, but one of Brillo's requirements is device support for at least five years after the last unit is sold. On that sort of time scale, exponential growth in maintenance issues is simply not sustainable.

The solution to this problem, according to Cook, is a simple matter of making two changes:

Maintain a single kernel for all systems, reducing patch combinations and backporting work.

Keep everything in the mainline kernel, reducing forward-porting work when a new kernel comes along.

Cook acknowledged that those principles might scare some vendors but, he said, if this approach seems too scary, "you're not testing enough."

Brillo is thus built from a single kernel tree containing the Android patches and all necessary vendor patches. This adds an interesting constraint, since it requires all of the vendors to play well together with their own patches. These vendor patches should preferably be upstream anyway but, in any case, they must have been sent upstream for consideration. The kernel itself is the latest long-term support kernel from Greg Kroah-Hartman, and it follows the -stable updates as they are released. When a new long-term support kernel comes out, everything moves forward to that release.

Part of making this idea work is reducing the delta between the Brillo kernel and mainline. There are currently about 600 patches in the Android common kernel, Cook said; that number has been reduced to fewer than 150 in the Brillo kernel. That was done by consolidating small patches, tweaking the Android user-space code so that it does not need the patches in the first place, and upstreaming the patches that are easy to get merged.

The upgrade process has been tested once, in the move from the 4.1 to the 4.4 kernel. It went relatively smoothly and, happily, the list of add-on patches got quite a bit shorter, thanks to the upstreaming of a fair amount of vendor code. The newer kernel also made it possible to drop a whole bunch of backported patches. This test may have only been run once so far but, Cook said, it demonstrates that the idea is "not entirely crazy."

For vendors who are afraid of regressions from kernel upgrades, Cook had some advice: get your code upstream. Then, create a better set of automated tests to verify that everything is working. All vendors should think about just what they fear might break and write tests that detect it when it happens. It is hard work, but it has to be done anyway to verify that things work in the first place; it also only has to be done once. Finally, test regularly against linux-next to catch problems before they end up in the next long-term support kernel.

Will this approach work? He certainly hopes so, he said. Something has to be done to get off the "backport treadmill" that vendors are on now. Most vendors, he said, have already agreed to this approach, and they are becoming more proactive about upstreaming their code. Some vendors fear the five-year support rule but, for many in the embedded world, five years looks relatively short and doesn't bother them at all. "Handset vendors panic" at the idea, he said, but, in the end, they are going to have to decide between paying the up-front costs of upstreaming their code and paying the long-term costs of supporting old code for far longer than they have been accustomed to.

[Thanks to LWN subscribers for supporting our travel to the event.]

