Wakelocks and the embedded problem

Please consider subscribing to LWN Subscriptions are the lifeblood of LWN.net. If you appreciate this content and would like to see more of it, your subscription will help to ensure that LWN continues to thrive. Please visit this page to join up and keep LWN on the net.

The relationship between embedded system developers and the kernel community is known for being rough, at best. Kernel developers complain about low-quality work and a lack of contributions from the embedded side; the embedded developers, when they say anything at all, express frustrations that the kernel development process does not really keep their needs in mind. A current discussion involving developers from the Android project gives some insight into where this disconnect comes from.

Android, of course, is Google's platform for mobile telephones. The initial Android stack was developed behind closed doors; the code only made it out into the world when the first deployments were already in the works. The Android developers have done a lot of kernel work, but very little code has made made the journey into the mainline. The code which has been merged all went into the staging tree without a whole lot of initiative from the Android side. Now, though, Android developer Arve Hjønnevåg is making an effort to merge a piece of that project's infrastructure through the normal process. It is not proving to be an easy ride.

The most controversial bit of code is a feature known as "wakelocks." In Android-speak, a "wakelock" is a mechanism which can prevent the system from going into a low-power state. In brief, kernel code can set up a wakelock with something like this:

#include <linux/wakelock.h> wake_lock_init(struct wakelock *lock, int type, const char *name);

The type value describes what kind of wakelock this is; name gives it a name which can be seen in /proc/wakelocks . There are two possibilities for the type: WAKE_LOCK_SUSPEND prevents the system from suspending, while WAKE_LOCK_IDLE prevents going into a low-power idle state which may increase response times. The API for acquiring and releasing these locks is:

void wake_lock(struct wake_lock *lock); void wake_lock_timeout(struct wake_lock *lock, long timeout); void wake_unlock(struct wake_lock *lock);

There is also a user-space interface. Writing a name to /sys/power/wake_lock establishes a lock with that name, which can then be written to /sys/power/wake_unlock to release the lock. The current patch set only allows suspend locks to be taken from user space.

This submission has not been received particularly well. It has, instead, drawn comments like this from Ben Herrenschmidt:

looks to me like some people hacked up some ad-hoc trick for their own local need without instead trying to figure out how to fit things with the existing infrastructure (or possibly propose changes to the existing infrastructure to fit their needs).

or this one from Pavel Machek:

Ok, I think that this wakelock stuff is in "can't be used properly" area on Rusty's scale of nasty interfaces.

There's no end of reasons to dislike this interface. Much of it duplicates the existing pm_qos (quality of service) API; it seems that pm_qos does not meet Android's needs, but it also seems that no effort was made to fix the problems. The scheme seems over-engineered when all that is really needed is a "do not suspend" flag - or, at most, a counter. The patches disable the existing /sys/power/state interface, which does not play well with wakelocks. There is no way to recover if a user-space process exits while holding a wakelock. The default behavior for the system is to suspend, even if a process is running; keeping a system awake may involve a chain of wakelocks obtained by various software components. And so on.

The end result is that this code will not make it into the mainline kernel. But it has been shipped on large numbers of G1 phones, with many more yet to go. So users of all those phones will be using out-of-tree code which will not be merged, at least not in anything like its current form. Any applications which depend on the wakelock sysfs interface will break if that interface is brought up to proper standards. It's a bit of a mess, but it is a very typical mess for the embedded systems community. Embedded developers operate under a set of constraints which makes proper kernel development hard. For example:

One of the core rules of kernel development is "post early and often." Code which is developed behind closed doors gets no feedback from the development community, so it can easily follow a blind path for a long time. But embedded system vendors rarely want to let the world know about what they are doing before the product is ready to ship; they hope, instead, to keep their competitors in the dark for as long as possible. So posting early is rarely seen as an option.

Another fundamental rule is "upstream first": code goes into the mainline before being shipped to customers. Once again, even if an embedded vendor wants to send code into the mainline, they rarely want to begin that process before the product ships. So embedded kernels are shipped containing out-of-tree code which almost certainly has a number of problems, unsupportable APIs, and more.

Kernel developers are expected to work with the goal of improving the kernel for everybody. Embedded developers, instead, are generally solving a highly-specific problem under tight time constraints. So they do not think about, for example, extending the existing quality-of-service API to meet their needs; instead, they bash out something which is quick, dirty, and not subject to upstream review.

One could argue that Google has the time, resources, and in-house kernel development knowledge to avoid all of these problems and do things right. Instead, we have been treated to a fairly classic example of how things can go wrong.

The good news is that Google developers are now engaging with the community and trying to get their code into the mainline. This process could well be long, and require a fair amount of adjustment on the Android side. Even if the idea of wakelocks as a way to prevent the system from suspending is accepted - which is far from certain - the interface will require significant changes. The associated "early suspend" API - essentially a notification mechanism for system state changes - will need to be generalized beyond the specific needs of the G1 phone. It could well be a lot of work.

But if that work gets done, the kernel will be much better placed to handle the power-management needs of handheld devices. That, in turn, can only benefit anybody else working on embedded Linux deployments. And, crucially, it will help the Android developers as they port their code to other devices with differing needs. As the number of Android-based phones grows, the cost of carrying out-of-tree code to support each of them will also grow. It would be far better to generalize that support and get it into the mainline, where it can be maintained and improved by the community.

Most embedded systems vendors, it seems, would be unwilling to do that work; they are too busy trying to put together their next product. So this sort of code tends to languish out of the mainline, and the quality of embedded Linux suffers accordingly. Perhaps this case will be different, though; maybe Google will put the resources into getting its specialized code into shape and merged into the mainline. That effort could help to establish Android as a solid, well-supported platform for mobile use, and that should be good for business. Your editor, ever the optimist, hopes that things will work out this way; it would be a good demonstration of how embedded community can work better with the kernel community, getting a better kernel in return.

