Booting Debian in 14 seconds

Posted by endecotp on Mon 10 Nov 2008 at 22:42

Many readers will have heard about Arjan van de Ven and Auke Kok's work to boot an ASUS Eee 901 in 5 seconds. Inspired by this work, and because I have the same laptop, I decided to try to reproduce their results. So far I have not come very close to their 5 seconds, but I have made some significant improvements compared to the default boot time for Debian on that machine; this article describes what I've done.

Although some of what follows is specific to the Eee 901, most of it isn't and could be applied to other laptops and PCs in general.

This article assumes that you're already familiar with things like building kernels, applying patches and so on. The target audience is the "advanced end user", and also the Debian developers responsible for the packages concerned who I hope will be motivated to incorporate some of this work.

It's worth noting that many of the things that are described here are already making their way into the upstream sources, so the lazy reader might decide simply to wait for all this fast-booting goodness to arrive in its own good time.

Instrumenting the boot process

Your first step should be to measure how the time is currently being spent while your machine boots. Then optimise the slow bits, and don't worry about the bits that are already fast.

A couple of tools are available for measuring the time taken during boot and visualising the results. I suggest that you install these tools first and save their results somewhere safe. I didn't, so I can no longer show you how slowly my machine booted before I started fixing it, which is a shame. The total time was, IIRC, 33 seconds from the end of Grub to the xdm login dialog being visible; I've knocked 19 seconds off that.

bootchart

bootchart is available as a Debian package. Install it and boot with "init=/sbin/bootchartd" added to the kernel command line. (In Grub, select the kernel using the cursor keys, press e, select the line with the kernel command line, press e, edit, press return, and then press b.) Then run the bootchart utility, which reads the log written during boot and creates an SVG graph. You can view the resulting file in most web browsers, or you can try "see", which will probably launch Inkscape.

The bootchart will show you which processes took the most time, and you can also see how much time was spent waiting for I/O and how much time was CPU-limited. If the results don't seem to make much sense, try running bootchart with its -n option; this makes the results more verbose.

bootgraph

This similarly-named utility plots a graph showing how the kernel spent its time during initialisation, i.e. the blank period at the beginning of the bootchart. The script is included in the scripts/ directory of the kernel source, but I believe it has only been in Linus' tree since 2.6.28-rc1. If you have an earlier kernel you can probably download the script alone; there is one kernel patch (to init/main.c), but I don't think it's vital unless you're also using asynchronous init calls, as described below.

To use bootgraph, boot with "initcall_debug" added to the kernel command line and then run "dmesg|perl scripts/bootgraph.pl > bootgraph.svg".
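If you just want a quick text list of the slowest initcalls rather than a graph, the durations can be pulled straight out of the same dmesg output. The sketch below is not bootgraph.pl; it assumes completion lines of the form "initcall foo+0x0/0x40 returned 0 after 1234 usecs", which is my understanding of what recent kernels print with initcall_debug. Older kernels word this differently (some report msecs), so check your own dmesg and adjust the regular expression if nothing matches.

```python
import re
from typing import Iterable, List, Tuple

# Assumed shape of an initcall_debug completion line, e.g.
#   [    0.512345] initcall pci_init+0x0/0x40 returned 0 after 1234 usecs
# Both "usecs" and "msecs" are accepted because the unit has varied between kernels.
INITCALL_RE = re.compile(
    r"initcall\s+(?P<name>[\w.]+)(?:\+\S+)?\s+returned\s+-?\d+\s+after\s+"
    r"(?P<value>\d+)\s+(?P<unit>usecs|msecs)"
)

def slowest_initcalls(lines: Iterable[str], top: int = 10) -> List[Tuple[str, float]]:
    """Return (initcall name, duration in milliseconds) pairs, slowest first."""
    durations = []
    for line in lines:
        m = INITCALL_RE.search(line)
        if not m:
            continue  # "calling ..." lines and ordinary kernel messages are ignored
        value = int(m.group("value"))
        ms = value / 1000.0 if m.group("unit") == "usecs" else float(value)
        durations.append((m.group("name"), ms))
    durations.sort(key=lambda item: item[1], reverse=True)
    return durations[:top]
```

For example, slowest_initcalls(subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()) lists the ten initcalls that took longest.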

Fix the really obvious things

Before spending time on the hard stuff, fix these easy and obvious things:

Minimise the time that Grub waits before booting its default kernel by reducing the timeout parameter in /boot/grub/menu.lst. I believe that the Debian default is 5 seconds.

Remove anything that takes time at boot that you're not using. (Personally I find it's easier to not install such things in the first place...)

If you're using a cpufreq governor, make sure that boot runs at full speed. (I use the powersave governor, mainly because it makes it unlikely that the fan will ever come on; I don't like fans. However, when booting from cold the fan is unlikely to be needed even at full speed, so I don't load the powersave governor until S99.)
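A different way to get the same effect, if the governors are already available, is to switch them explicitly through the standard cpufreq sysfs interface: "performance" early in boot and "powersave" from a late rc script. This differs from the approach above and is only a minimal sketch; it assumes the usual /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor files and root privileges, and set_governor is a hypothetical helper. The sysfs root is a parameter so the function can be tried against a scratch directory.

```python
from pathlib import Path
from typing import List

def set_governor(governor: str, sysfs_cpu_root: str = "/sys/devices/system/cpu") -> List[str]:
    """Write `governor` to every CPU's cpufreq scaling_governor file.

    Returns the files written. On a real system this needs root, and the
    governor must already be built in or have its module loaded.
    """
    written = []
    # Only cpuN directories; siblings such as "cpufreq" or "cpuidle" are skipped.
    for gov_file in sorted(Path(sysfs_cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        gov_file.write_text(governor + "\n")
        written.append(str(gov_file))
    return written
```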

Now on to the more complex stuff.

Building a fast-booting kernel

There are a number of things that you can do to the kernel to make it boot faster:

You can eliminate the initrd or initramfs. These features make it possible for Debian to ship a kernel that will boot on a lot of different hardware without the bloat of building in drivers for everyone's root disks, but the cost is a slower boot. If you build in the essential drivers for your root filesystem, an initrd is not needed.

You can build in drivers for all of your hardware, rather than having udev load modules for them afterwards. Again this conflicts with a distribution's desire to provide a kernel package that works with all hardware, but by avoiding all the work that udev does loading modules this can make boot faster.

There are a few patches that reduce unnecessary delays during boot, described below.

Configuring a kernel with built-in drivers

I have been thinking about how a distribution like Debian could make it easier for users to create custom kernels that build in all of the drivers needed for their hardware. What I've come up with is the following:

The user boots a conventional Debian all-modular kernel, checking first that they don't have any extraneous USB devices or similar hardware attached.

The conventional udev startup will load all of the modules needed to drive their hardware.

lsmod will report which modules were loaded. By some means we map from the module names to the kernel config settings that enable them, and change those settings from "m" to "y" so that the drivers will be built in.

They then build and install a kernel with this new config.

The hard bit is the third step above. Luckily I found a script by Steven Rostedt that did almost what was needed (it did the hard part of mapping from module names to config settings), and I adapted it to create buildin_used_mods.pl (local copy). Run this at the root of your kernel tree; it writes the new .config to stdout.

This script seems to do a good job, but it's not perfect. The particular problem that I found was that although it determines the correct config setting for the IDE hardware and sets it to "y", it doesn't know that it must also set the higher-level setting CONFIG_IDE to "y". Furthermore, when you "make menuconfig" it will detect this inconsistency and fix it in the wrong way by changing the IDE driver back to a module. The solution to this is to "make menuconfig" before running the script and to change CONFIG_IDE to "y". There may be other such problems; is there a way to automatically resolve them correctly?
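The core of the mapping step can be sketched as follows. This is not buildin_used_mods.pl or Rostedt's script, just a minimal illustration assuming that each module is produced by a kbuild line of the form obj-$(CONFIG_FOO) += foo.o; continuation lines and less common Makefile forms are ignored. The function names are hypothetical, and the force_y parameter is a hypothetical way of handling the CONFIG_IDE problem: any parent options listed there are also switched from m to y.

```python
import re
from typing import Dict, Iterable, Set

# kbuild lines such as "obj-$(CONFIG_SND_HDA_INTEL) += snd-hda-intel.o"
OBJ_RE = re.compile(r"obj-\$\((CONFIG_\w+)\)\s*[+:]?=\s*(.*)")

def loaded_modules(lsmod_output: str) -> Set[str]:
    """Module names from lsmod output (the first line is the column header)."""
    return {line.split()[0] for line in lsmod_output.splitlines()[1:] if line.strip()}

def module_to_config(makefile_texts: Iterable[str]) -> Dict[str, str]:
    """Map module names, as lsmod prints them, to the CONFIG_ symbol that builds them."""
    mapping = {}
    for text in makefile_texts:
        for line in text.splitlines():
            m = OBJ_RE.search(line)
            if not m:
                continue
            for obj in m.group(2).split():
                if obj.endswith(".o"):
                    # lsmod shows '-' in module file names as '_'
                    mapping[obj[:-2].replace("-", "_")] = m.group(1)
    return mapping

def build_in(config_text: str, loaded: Set[str], mapping: Dict[str, str],
             force_y: Iterable[str] = ()) -> str:
    """Change CONFIG_X=m to CONFIG_X=y for loaded modules and for symbols in force_y."""
    wanted = {mapping[mod] for mod in loaded if mod in mapping} | set(force_y)
    out = []
    for line in config_text.splitlines():
        sym, _, value = line.partition("=")
        if sym in wanted and value == "m":
            line = f"{sym}=y"
        out.append(line)
    return "\n".join(out) + "\n"
```

Collecting the Makefile texts (for example every Makefile under drivers/, read with pathlib) and running make oldconfig afterwards are left to the caller.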

A further step, useful but not essential since it only makes the kernel build more quickly, would be to disable all of the modules for internal hardware that we don't have, so that the only modular drivers we build are for things like USB devices.

So, could we have a Debian kernel package that did all of that automagically?

Kernel patches for faster booting

I have applied the following patches to improve boot time:

This patch, which I believe is in 2.6.28-rc1, eliminates some unnecessary locking in the driver-to-device matching code. Believe it or not, without this patch the PC speaker driver will wait until the mouse has been initialised (which may take several seconds) in order to check whether it is actually a speaker. With the patch it still does the check, but it doesn't take the lock before doing so. Of course the problem affects not only that particular pair of devices but every pair of devices on every bus; it just happened to be that pair that wasted the most time in my bootgraph.

The Eee 901 uses PCI Express hotplug (pciehp) to toggle the Wifi power. This driver had a number of 1-second pauses which slowed boot and also suspend/resume. All of them have now been eliminated for this hardware thanks to two patches: this one, which has made it into Linus' tree and, I believe, 2.6.28-rc2; and this one, which hasn't.

One of Arjan's main innovations to achieve his fast boot time was to introduce more concurrency during the kernel startup: specifically, some drivers that are not on the critical path to getting the root filesystem mounted are initialised on an asynchronous thread. In particular, USB seems to take a while to initialise, as does the Eee's ACPI battery monitor. This work can be found in its own git tree. I'm not sure when we can expect to see this merged; for example, someone will have to decide which drivers should be on the async thread and which not, and the answer might be "it depends" in a lot of cases. Anyway, Arjan's choices are good for the Eee 901 and I have saved a bit of time by using it.
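The benefit of moving non-critical initialisation onto a separate thread can be illustrated with a user-space analogy. This is not kernel code and does not use the API in Arjan's tree; the boot() function and the task lists are hypothetical. Deferrable steps (think USB, or the ACPI battery monitor) run on a single worker thread while the critical-path steps run in the foreground, and everything is waited for at the end.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

Step = Callable[[], None]

def boot(critical: Sequence[Step], deferrable: Sequence[Step]) -> float:
    """Run deferrable steps on one worker thread while the critical path runs.

    Returns the wall-clock seconds until every step has finished.
    """
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=1) as pool:  # one "async" thread
        futures = [pool.submit(step) for step in deferrable]
        for step in critical:  # e.g. disk controller, then mounting root
            step()
        for f in futures:  # a final synchronisation point before continuing
            f.result()
    return time.monotonic() - start
```

With a 0.2 s critical path and two 0.1 s deferrable steps, the total comes out at about 0.2 s rather than the 0.4 s a strictly serial boot would take. The hard question the paragraph raises, which steps are safe to defer, is exactly what this sketch leaves to the caller.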

Eliminating coldplugging

In most modern Linux systems, whether or not they have modular kernels, soon after the kernel has booted the udev daemon performs "coldplugging". This enumerates all of the devices present at boot time and loads kernel modules, creates /dev entries, and does anything else necessary to get the device working. It's called coldplugging because these are the same operations that are done for hotplugged devices, except that they're not in response to hotplugging events.

Looking at bootcharts, it's clear that this takes quite some time. Building all of the drivers into the kernel, rather than having modules, makes some difference, but that is not where all the time goes: even when the drivers are built in, the udev daemon will still run modprobe, which wastes some time before realising that it's a no-op.

It may be possible to speed this up by making the udev system smarter in some way. But I've followed Arjan's approach and used a pre-populated /dev. For this to work, you need to be sure that:

The only action that udev would perform for the devices is to create /dev entries. Often udev would also load modules, but we don't have to worry about that as everything is built in. In principle udev rules can carry out arbitrary actions, though this is rare.

The device major/minor numbers aren't going to change from one boot to the next. I'm unclear about this and would welcome advice! For example, if the order in which disks appear is non-deterministic (as it is with USB devices), then this approach breaks.

I've also been told that HAL relies on udev and that X version 1.5 relies on HAL; since I use neither of these I don't know the whole story and it may be that the touchpad is the only affected device. Can anyone shed any light on this?

It's important to note that pre-populating /dev and not doing coldplugging does not mean that you have to give up hotplugging. The approach that I describe here still starts the udev daemon to handle hotplugged devices, and also removable devices that are attached at boot.

It is relatively simple to use a fixed /dev on a "locked down" system, but it's more of a challenge to do it on a system like Debian which can run on different hardware. I have therefore used the following method:

Initially the system is booted with an unmodified udev system which does conventional coldplugging to populate /dev.

As soon as coldplugging has finished, tar is used to record the contents of /dev.

On subsequent boots the tar file is detected and, instead of coldplugging, the tar file is extracted to create the contents of /dev. udevd is still used to handle hotplugging, and coldplugging of removable devices.

If at any time it's necessary to update the contents of /dev, perhaps because new hardware has been added to a desktop machine or a new kernel has been installed, the tar file can be removed and the process will be repeated on the next boot.
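The decision the boot script has to make can be summarised in a few lines. This is not the modified udev init script (which is a shell script); it is a Python sketch of the same logic, and populate_dev and its parameters are hypothetical names. The coldplug argument stands for whatever normally populates /dev (for example, triggering udev and waiting for it to settle). Restoring real device nodes requires root.

```python
import os
import tarfile
from typing import Callable

def populate_dev(dev_dir: str, snapshot: str, coldplug: Callable[[], None]) -> str:
    """Restore dev_dir from snapshot if one exists; otherwise coldplug and save one.

    Returns "restored" or "coldplugged". The snapshot must live outside dev_dir.
    """
    if os.path.exists(snapshot):
        with tarfile.open(snapshot, "r:gz") as tar:
            if hasattr(tarfile, "fully_trusted_filter"):
                # Newer Pythons filter extraction; /dev legitimately contains device
                # nodes and absolute symlinks, and we created this archive ourselves.
                tar.extractall(dev_dir, filter="fully_trusted")
            else:
                tar.extractall(dev_dir)
        return "restored"
    coldplug()  # conventional udev coldplugging fills dev_dir
    with tarfile.open(snapshot, "w:gz") as tar:
        tar.add(dev_dir, arcname=".")  # tarfile records char/block device nodes
    return "coldplugged"
```

Deleting the snapshot file is all it takes to force a fresh coldplug on the next boot.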

I've implemented this by modifying the standard Debian /etc/init.d/udev script; my modified version can be downloaded here (local copy). As you'll see if you diff that against your regular script, my changes are quite limited in scope and more than a bit hacky. No doubt the implementation could be improved, but first we need to decide whether this is the right strategy.

Disk read-ahead

Bootchart shows that the system spends quite a lot of its time well below 100% CPU utilisation, waiting for the disk. One technique that Arjan and Auke used to alleviate this is read-ahead, i.e. prefetching from the disk the files (or parts of files?) that are known to be needed later in the boot. Debian already packages another readahead program, but Arjan and Auke have written a new one, Super ReadAhead. I don't know how it differs, and it seems to lack documentation; however, I was able to get it to work by following the instructions posted on the download page by John Lamb.
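The basic read-ahead idea can be shown with the posix_fadvise() hint POSIX_FADV_WILLNEED, which asks the kernel to start reading a file into the page cache so that a later read finds it already there. This is a sketch of the general technique only, not of how Super ReadAhead or Debian's readahead package work; prefetch is a hypothetical helper, and where the list of files comes from (for example, one recorded during an earlier boot) is left open. It assumes Linux, where os.posix_fadvise is available.

```python
import os
from typing import Iterable

def prefetch(paths: Iterable[str]) -> int:
    """Hint the kernel to read each whole file into the page cache.

    The hint typically returns before the I/O completes, so this can run early
    while other boot work continues. Missing files are skipped.
    Returns how many files were hinted.
    """
    hinted = 0
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue
        try:
            # offset 0, length 0 means "from the start to the end of the file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            hinted += 1
        finally:
            os.close(fd)
    return hinted
```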

The improvement resulting from read-ahead is worth having, but is not spectacular. It's a technique that's worth applying as well as everything else described here, but by itself I think you're unlikely to notice the improvement unless you use a stopwatch.

Setting the clock

Setting the clock, i.e. reading the hardware battery-backed clock into the kernel, seemed to be taking an inordinate amount of time. There turned out to be three factors involved:

Debian sets the clock twice, via the hwclock.sh and hwclockfirst.sh init scripts. I'm still unsure why this is; see Debian bug 327584. I've removed one of the scripts and nothing seems to have broken.

On some systems, including the Eee until a recent kernel fix, hwclock's --directisa option was used. This option causes hwclock to use more CPU, so you should not enable it unless you believe that your combination of hardware and kernel needs it.

Most seriously, hwclock waits until the seconds in the hardware clock tick over; this will take on average half a second, except that in the case where hwclock is run twice (see above) the second invocation will take nearer a second. Fix this and the other problems don't matter any more.

The underlying issue with the last point is that the hardware doesn't tell us fractional seconds. So if we want our clock to be accurate we need to wait for the hardware to tick over. But do we actually need our clock to be that accurate? (And if we later run ntp, the inaccuracy will only be temporary.) If you're happy with your clock being wrong by up to plus or minus half a second, this patch that I knocked together adds a --notickwait option to hwclock. This makes hwclock almost instantaneous.
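The cost of waiting for the tick, and why a second invocation is worse, can be seen with a small model. It makes hypothetical simplifying assumptions: the first hwclock run starts at a uniformly random point within an RTC second, each run returns just after the tick it waited for, and a second run follows immediately (any other work between the two runs would shorten the second wait). The function names are illustrative only.

```python
import random
from typing import List

def tick_waits(start_phase: float, invocations: int) -> List[float]:
    """Seconds that each back-to-back hwclock-style run blocks waiting for a tick.

    start_phase is how far (0 <= phase < 1) into the current RTC second the
    first run starts. Each run returns just after a tick, so the next run
    starts at phase 0 and has to wait a full second.
    """
    waits = []
    phase = start_phase % 1.0
    for _ in range(invocations):
        waits.append(1.0 - phase)  # time remaining until the next tick
        phase = 0.0
    return waits

def mean_total_wait(invocations: int, trials: int = 100_000, seed: int = 0) -> float:
    """Average total blocking time over many random start phases."""
    rng = random.Random(seed)
    total = sum(sum(tick_waits(rng.random(), invocations)) for _ in range(trials))
    return total / trials
```

Under these assumptions mean_total_wait(1) comes out close to 0.5 seconds and mean_total_wait(2) close to 1.5 seconds, which matches the "on average half a second" and "nearer a second" figures above.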

An alternative might be to run hwclock in parallel with other initialisation. The problem with this is that it can't start until /dev/rtc has been created and it needs to be done by the time fsck runs, and this is a fairly small window.

NFS

If you don't run NFS you can ignore this section - though you might like to double-check that you don't have any unused NFS packages installed that are slowing down your boot.

In my case, I use NFS with autofs on my Eee to access filesystems on other local machines. But this is something that I use only rarely, and only when I'm at home. It turns out that there's a significant boot delay that can be avoided whenever NFS was not in use when the machine was last shut down.

The process to look out for is sm-notify, and it took up a big chunk of my bootchart with a very large associated peak in disk activity. It seems that the purpose of sm-notify is to send a message to those NFS servers that the machine was using before shutdown to tell them that it is now back up. But before starting to send these messages, it does something which has the side-effect of invoking sync() and causing all pending writes to be flushed out to disk. That takes ages.

This is especially wasteful when you didn't use NFS at all during the last session, so there are no servers to notify; for me this is the common case. So I have written this patch against nfs-utils version 1.1.3, which detects when there are no servers to communicate with and terminates early, before the sync(). This patch has now been applied upstream and is included in nfs-utils 1.1.4; however, there is some doubt about whether it is really safe in all cases. You might want to review this thread to see whether this has been resolved.
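The early-exit test amounts to asking "is there anyone to notify?" before doing anything expensive. The sketch below is not the nfs-utils patch; notify_needed is a hypothetical helper, and it assumes statd's default state directory /var/lib/nfs, with one file per monitored peer in sm/ and peers still awaiting notification in sm.bak/. The safety concerns raised in that thread would apply equally to anything built on this.

```python
import os

def notify_needed(state_dir: str = "/var/lib/nfs") -> bool:
    """Return True if any NFS peers are recorded for sm-notify to contact.

    Looks in state_dir/sm and state_dir/sm.bak (assumed layout); missing
    directories and hidden files are ignored.
    """
    for sub in ("sm", "sm.bak"):
        try:
            entries = os.listdir(os.path.join(state_dir, sub))
        except FileNotFoundError:
            continue
        if any(not name.startswith(".") for name in entries):
            return True
    return False
```

A boot script could skip sm-notify entirely when this returns False, which avoids the sync() and the burst of disk activity in the common no-NFS case.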

Starting X sooner

X takes a long time to start. At some point there should be a significant improvement to this when "kernel modesetting" is introduced - perhaps in 2.6.29. If you're keen you could try to use this now - you'll need kernel patches and a new X server - but I'm going to wait.

Some of the X startup time can be hidden by running it in parallel with other activity. At present, Debian starts xdm as the very last thing (at S99). gdm starts earlier at S30, but that's still quite late in the boot process. I now start X at S04.

Quite how early you're prepared to start it depends on what other services X depends on. In particular, does X need the network to be up? In some cases it makes sense to wait; an example would be when home directories are on NFS. However, even in that case it would still be possible to start xdm and let the user type their username and password; if necessary it could wait for the network at that point. On a laptop, however, it's very unlikely that X (or anything much) will depend on the network being up. Perhaps something in the X packages could automatically detect or ask the user about these dependencies and start X at the earliest safe opportunity.

Note that if you start X early you may not want to shut it down late. Typically startup and shutdown scripts are symmetrical, but you might want to make an exception in this case. The example that was pointed out to me was networked filesystems being taken away before the programs using them have terminated. I've left xdm at K99.
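Moving xdm's start link is a one-off change in /etc/rc2.d (or whichever runlevel you boot into). update-rc.d is the normal Debian tool for managing these links; the sketch below only shows what the change amounts to, renaming an S99xdm link to S04xdm while leaving the K links used at shutdown alone. move_start_link is a hypothetical helper.

```python
import os
import re

def move_start_link(rc_dir: str, script: str, new_seq: int) -> str:
    """Rename rc_dir/SNN<script> to rc_dir/S<new_seq><script>; K links are untouched.

    Returns the new link name. The symlink target is preserved by the rename.
    """
    pattern = re.compile(r"S\d\d" + re.escape(script))
    for name in os.listdir(rc_dir):
        if pattern.fullmatch(name):
            new_name = f"S{new_seq:02d}{script}"
            os.rename(os.path.join(rc_dir, name), os.path.join(rc_dir, new_name))
            return new_name
    raise FileNotFoundError(f"no S??{script} link in {rc_dir}")
```

For example, move_start_link("/etc/rc2.d", "xdm", 4), run as root.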

Starting networking later

As noted above, on a laptop in particular it's unlikely that very much depends on networking being up during boot. And starting networking can be slow, especially if DHCP is involved. So I postpone starting the network until late in the boot, where it will run in parallel with X starting up.

There are a couple of subtleties:

The driver for the Eee 901's wifi is an out-of-tree module that can't be built in to the kernel.

Network devices are a case where udevd does do more than just load modules and create /dev nodes.

I have therefore adopted the following scheme:

During initial coldplugging I skip network devices. When I'm using the pre-populated /dev I skip them anyway, because I only coldplug USB devices; but when I'm not using the pre-populated /dev for some reason, I still skip network devices. I have to match the wifi device by its PCI ID, since at that point the kernel hasn't recognised that it is a network device. This is in my modified udev script linked above.

I have a coldplug_networking script that runs at S09, i.e. after xdm. This coldplugs the wifi device and the other network devices.
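Matching the wifi device by PCI ID can be done from sysfs, where each device directory under /sys/bus/pci/devices has vendor and device attributes. This sketch splits devices into those to coldplug immediately and those to leave for the networking script; split_pci_devices is a hypothetical helper, and the ID set is a placeholder for the real one that lspci -n shows. One way to coldplug a deferred device later is to write "add" to its uevent file, which I believe is what udev's trigger command does for each device.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

def _hex_attr(dev: Path, name: str) -> str:
    """Read a sysfs attribute such as 'vendor' ("0x8086\\n") as bare lowercase hex."""
    value = (dev / name).read_text().strip().lower()
    return value[2:] if value.startswith("0x") else value

def split_pci_devices(sysfs_pci: str, defer_ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split PCI devices into (coldplug now, coldplug later with networking).

    defer_ids holds "vendor:device" strings in hex, e.g. "abcd:1234" (placeholder).
    Returns device directory names such as "0000:01:00.0".
    """
    defer = {i.lower() for i in defer_ids}
    now, later = [], []
    for dev in sorted(Path(sysfs_pci).iterdir()):
        pci_id = f"{_hex_attr(dev, 'vendor')}:{_hex_attr(dev, 'device')}"
        (later if pci_id in defer else now).append(dev.name)
    return now, later
```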

Conclusions and future work

Using the methods described above, the boot time for my Eee 901 from the end of Grub to the xdm login dialog being visible has been reduced from about 33 seconds to about 14 seconds. Here are the bootgraph and bootchart for the system as it is now. Perhaps also of interest to Eee 901 users is my kernel config (local copies).

The "sore thumb" that still stands out in those 14 seconds is the startup time for X. (However, it doesn't stand out in the bootchart as that stops when the rc scripts have finished, which is several seconds before the login dialog appears.) But there is hope there, and I'm happy to wait for a few months and see how the kernel modesetting stuff pans out.

In addition to those 14 seconds, there's also the time taken by the BIOS before Grub runs; that seems to vary a bit, from maybe 4 seconds when rebooting up to 10 seconds when powering on. It would be great to reduce that; maybe Intel are secretly working on this, or if not perhaps we could use Coreboot (AKA LinuxBIOS). I note that Coreboot has recently announced support for some of the chips in the '901. This isn't something I'm planning to work on myself, but if someone would like to post a recipe for putting Coreboot on an Eee without bricking it, I'd love to see it!

I hope that this article inspires some other users to see what can be done on their own machines. Also, I hope that the Debian developers responsible for some of the affected packages can think about what they can do. So, over to you...