Bug#727708: init system other points, and conclusion

To: 727708@bugs.debian.org

Subject: Bug#727708: init system other points, and conclusion

From: Russ Allbery <rra@debian.org>

Date: Sun, 29 Dec 2013 16:10:10 -0800

Message-id: <87r48vnqql.fsf@windlord.stanford.edu>

Reply-to: Russ Allbery <rra@debian.org>, 727708@bugs.debian.org

In-reply-to: <21183.21942.130759.900867@chiark.greenend.org.uk> (Ian Jackson's message of "Sat, 28 Dec 2013 22:50:30 +0000")

References: <21183.21942.130759.900867@chiark.greenend.org.uk>

We seem to be at the point of the process where at least those of us who did early investigation are stating conclusions. I think I have enough information to state mine, so will attempt to do so here. This is probably going to be rather long, as there were quite a few factors that concerned me and that I wanted to investigate.

The brief summary is that I believe Debian should adopt systemd as its default init system on Linux. There are two separate conceptual areas in which I think systemd offers substantial advantages over upstart, each of which I would consider sufficient to choose systemd on its own. Together, they make a compelling case for systemd.

This position would have substantial implications for upgrade paths and for non-Linux ports; I'll discuss a bit of that below, but most of it in the separate branch of this bug report that Ian opened on that topic.

Below, I first discuss the other choices before us besides systemd and upstart. Then I look at a straight technical comparison between the two init systems, and finally look at issues of maintenance, community, ecosystem, and portability. The three main criteria on which I was evaluating both systems were technical capabilities, surrounding ecosystem, and portability. The latter two turned out to be deeply entangled, so I discuss them together.

1. Other Choices

First, other choices besides systemd and upstart. There were three replacement init systems proposed to the Technical Committee to replace sysvinit, plus the existing status quo.

The third option, OpenRC, is a more conservative and less revolutionary change than either systemd or upstart. It continues to use the existing sysvinit init process but replaces the startup script management with a more robust shell library and additional features.

I think the OpenRC developers are great people and I wish them all the success in the world with their project, but I just don't think it's ambitious enough for Debian's needs.
If we're going to the effort of replacing init systems and changing our startup scripts, a bare minimum requirement for me is that we at least address the known weaknesses of the sysvinit mechanism, namely:

* Lack of integration with kernel-level events to properly order startup.
* No mechanism for process monitoring and restarting beyond inittab.
* Heavy reliance on shell scripting rather than declarative syntax.
* A fork and exit with PID file model for daemon startup.

My impression of OpenRC is that it is not attempting to solve these issues in the same way that systemd and upstart are. To the extent that these issues are on the OpenRC roadmap, it's not as far along as either systemd or upstart is. It's difficult to evaluate since the OpenRC documentation is rather sparse and lacks the comprehensive manual available for both systemd and upstart, which is itself a sign of a lack of project maturity. I don't think that switching to OpenRC offers enough clear benefit over the status quo.

That raises the other obvious option: sticking with sysvinit. I've made my position on this fairly clear in other threads, so I won't reiterate it here at length. The short version is that I turned to other tools to manage daemons years ago because sysvinit was simply inadequate, and my feeling on that hasn't changed. The model of fork and exit without clear synchronization points is inherently racy, the boot model encoded into sysvinit doesn't reflect a modern system boot, and maintaining large and complex init scripts as conffiles has been painful for years. Nearly every init script, including the ones in my own packages, has various edge-case bugs or problems because it's very hard to write robust service startup in shell, even with the excellent helper programs and shell libraries that Debian has available. A quick perusal of /etc/init.d/skeleton and the complex case statements and careful attention to status codes required for a proper init script makes this case clear.

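
To make the "no clear synchronization points" complaint above a little more concrete, here is a minimal Python sketch of what an explicit synchronization point could look like: the parent forks a service and then blocks on a pipe until the child says it is ready or exits without saying so. This is only an illustration under my own assumptions. The names spawn_and_wait_ready and notify_ready are hypothetical, and this is not a claim about how sysvinit, upstart, or systemd implement readiness.

```python
import os


def spawn_and_wait_ready(service):
    """Fork `service` and block until it reports readiness or exits.

    `service` is a hypothetical callable that takes one argument,
    `notify_ready`, and calls it once it can accept requests.
    Returns (child_pid, ready).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: run the service, then leave without returning to the caller.
        os.close(read_fd)
        status = 0
        try:
            service(lambda: os.write(write_fd, b"R"))
        except BaseException:
            status = 1
        finally:
            os._exit(status)
    # Parent: close our copy of the write end so that a child exiting
    # without notifying shows up as end-of-file instead of a hang.
    os.close(write_fd)
    ready = os.read(read_fd, 1) == b"R"
    os.close(read_fd)
    return pid, ready


if __name__ == "__main__":
    pid, ready = spawn_and_wait_ready(lambda notify_ready: notify_ready())
    os.waitpid(pid, 0)
    print("child", pid, "ready:", ready)
```

With plain fork and exit plus a PID file there is no equivalent of the blocking read: whatever starts next has to guess when the daemon is actually able to serve requests.
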
I think the choice of a default init system for Linux is a choice between systemd and upstart. We would be doing ourselves and our users a disservice to stick with the status quo, or even a moderate update of the status quo to add a simpler service definition. The limitations have been well-known for years, and I think it's telling that most other operating systems, even fairly conservative ones, have moved away from the System V init script model.

The last option that was before us was supporting multiple init systems. I consider this a variation on a transition plan, with a possibly infinite time horizon, and will discuss this separately when I talk about transition plans.

2. Core Service Management Functionality

As reported to this bug, I did a fairly extensive evaluation of both upstart and systemd by converting one of my packages, which provides a network UDP service, to a native configuration with both systems. While doing so, I tried to approach each init system on its own terms and investigate what full, native support of that init system would look like, both from a Debian packaging perspective and from an upstream perspective. I also tried to work through the upgrade path from an existing init script with an external /etc/default configuration file and how that would be handled with both systemd and upstart.

I started this process with the expectation that systemd and upstart would be roughly evenly matched in capabilities. My expectation was that I would uncover some minor differences and variations, and some different philosophical approaches, but no deeply compelling differentiation. To my surprise, that's not what happened. Rather, I concluded that systemd has a substantial technical advantage over upstart, primarily in terms of available useful features, but also in terms of fundamental design.

2.1. General Impressions

systemd feels like a software package that has been used and pounded on in a wide variety of real-world situations, and has grown the flexibility and adaptability that is required to make a wide variety of use cases work. upstart, on the other hand, has a minimal design and a ready escape to shell scripting, which may have discouraged directly tackling a broader array of use cases. Regardless, there are a bunch of cases that systemd handles cleanly with simple configuration that would require shell script fragments or other workarounds in upstart, which in turn makes the startup configurations less reliable and harder to debug.

I was quite impressed throughout the process of developing systemd unit files. Every time I realized I needed some piece of functionality to configure the daemon properly, systemd already had it.

2.2. Major Functionality Gaps

Here are the major pieces of functionality that I think would have to be added to upstart for rough feature parity:

* Socket activation, by which I don't mean lazy start of daemons, although it enables that, but init management of socket setup so that daemons can start in parallel. This has been discussed elsewhere on the thread, but I want to note here that systemd's approach is bold and innovative. We've had multiple discussions in Debian lists in the past where people have felt somewhat depressed or discouraged about Debian's lack of innovation or unwillingness to tackle sweeping improvements. After having studied and implemented socket activation, I think this is one of those opportunities, and we should not pass it by.

  There are a variety of advantages to socket activation that have been discussed elsewhere, and I'm not going to repeat them all here. But one I want to call out is the advantage for an enterprise systems administration environment.
  Right now, in order to configure bind addresses or IPv6 behavior for my services, I have to dig into the individual configuration syntax or command-line flags of each separate daemon, and often there's no easy way to set these parameters without making intrusive changes to daemon startup. Socket activation lets me manage all of this through a simple configuration override that I drop into /etc via (for example) Puppet, and the syntax is the same for every service that uses it. It would obviously take quite some time to get there, but that's a really nice vision of the future, and one that would make a real difference for Debian use cases I care about.

  upstart has a socket activation protocol, but it would need an almost-complete redesign in order to be used the way that systemd's can be used. It doesn't support passing multiple sockets (required for complex daemons, some IPv6 scenarios, and binding to multiple but not all interfaces), it doesn't support IPv6 at all, it doesn't support UDP sockets, and its configuration syntax is inadequate to represent the parameters that would be useful in a real-world case. It also doesn't separate the socket configuration from the daemon configuration, which makes it harder for a local systems administrator to control binding behavior without changing other properties of daemon initialization.

* Integrated daemon status. This one caught me by surprise, since the systemd journal was functionality that I expected to dislike. But I was surprised at how well-implemented it is, and systemctl status blew me away.
  I think any systems administrator who has tried to debug a running service will be immediately struck by the differences between upstart:

      lbcd start/running, process 32294

  and systemd:

      lbcd.service - responder for load balancing
         Loaded: loaded (/lib/systemd/system/lbcd.service; enabled)
         Active: active (running) since Sun 2013-12-29 13:01:24 PST; 1h 11min ago
           Docs: man:lbcd(8)
                 http://www.eyrie.org/~eagle/software/lbcd/
       Main PID: 25290 (lbcd)
         CGroup: name=systemd:/system/lbcd.service
                 └─25290 /usr/sbin/lbcd -f -l

      Dec 29 13:01:24 wanderer systemd[1]: Starting responder for load balancing...
      Dec 29 13:01:24 wanderer systemd[1]: Started responder for load balancing.
      Dec 29 13:01:24 wanderer lbcd[25290]: ready to accept requests
      Dec 29 13:01:43 wanderer lbcd[25290]: request from ::1 (version 3)

  Both are clearly superior to sysvinit, which bails on the problem entirely and forces reimplementation in every init script, but the systemd approach takes this to another level. And this is not an easy change for upstart. While some more data could be added, like the command line taken from ps, the most useful addition in systemd is the log summary. And that relies on the journal, which is a fundamental design decision of systemd.

  And yes, all of those log messages are also in the syslog files where one would expect to find them. And systemd can also capture standard output and standard error from daemons and drop that in the journal and from there into syslog, which makes it much easier to uncover daemon startup problems that resulted in complaints to standard error instead of syslog. This cannot even be easily replaced with something that might parse the syslog files, even given output forwarding to syslog (something upstart currently doesn't have), since the journal will continue to work properly even if all syslog messages are forwarded off the host, stored in some other format, or stored in some other file. systemd is agnostic to the underlying syslog implementation.

* Security defense in depth. Both upstart and systemd support the basics (setting the user and group, process limits, and so forth). However, systemd adds a multitude of additional defense in depth features, ranging from capability limits to private namespaces or the ability to deny a job access to the network. This is just a simple matter of programming on the upstart side, but it still contributes to the general feature deficit; the capabilities in systemd exist today.

  I'm sure I'm not the only systems administrator who is expecting security features and this sort of defense in depth to become increasingly important over the next few years. Here again, I think we have an opportunity for Debian to be more innovative and forward-looking in what we attempt to accomplish in the archive by adopting frameworks that let us incorporate the principles of least privilege and defense in depth into our standard daemon configurations.

There are also a plethora of minor features and tuning available in systemd but not in upstart. None of this is as significant as the points mentioned above, and none of it is as difficult to implement, but it's not currently implemented, and I think it speaks to systemd having been tested against a broader array of use cases.

2.3. Event vs. Dependency Model

There is one UI design difference between systemd and upstart that's less clear-cut, but which I think will surprise people.

systemd is built around familiar dependencies between services, and starts services in dependency order. There are some twists, such as allowing a service to create a reverse dependency (make another service depend on it), but it's the basic design that's familiar to any packager, or to users of languages like Puppet. upstart, on the other hand, uses a message bus model: services are started when particular events are received, and dependencies are expressed by listing the events required to trigger startup (or some other action).

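
As a rough illustration of the dependency model just described, the following Python sketch orders a handful of services so that each one starts only after everything it requires. It uses graphlib.TopologicalSorter from the Python standard library (3.9 or later). The service names and the start_order helper are made up for the example, and this is not meant to describe how systemd actually computes its ordering.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical services; each key maps to the set of services it requires.
REQUIRES = {
    "lbcd": {"network", "syslog"},
    "network": {"udev"},
    "syslog": set(),
    "udev": set(),
}


def start_order(requires):
    """Return the services in an order where requirements come first.

    Raises graphlib.CycleError if the requirements contain a cycle.
    """
    return list(TopologicalSorter(requires).static_order())


if __name__ == "__main__":
    print(start_order(REQUIRES))
    try:
        start_order({"a": {"b"}, "b": {"a"}})
    except CycleError as error:
        print("cycle detected:", error.args[1])
```

A cycle is reported as an error up front rather than showing up as a hang at boot, which is the kind of failure mode the rest of this section is concerned with.
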
Conceptually, both of these designs are equivalent. They both construct a DAG that's used to order service startup. However, upstart complicates matters by having two types of messages on its message bus: signals and methods (technically, there are also hooks, but the distinction doesn't matter for this point). Signals behave like the typical asynchronous message bus event, or like a dependency: they trigger services to start, but the service issuing the signal does not care whether anyone listens or not. Methods do not; methods are effectively synchronous calls and the service issuing a method event waits until the method event has been acted on before continuing.

The UI problem with this approach is that it creates a pitfall with rather noticeable consequences. If someone ever confuses a signal event and a method event and starts a service on a method event instead, it is then very easy to block startup of some fundamental system service because its method event never completes due to deadlock. This is made somewhat more likely by the fact that method events are the default in initctl emit commands, whereas signal events require a flag.

Again, this is not a fundamental issue with either system; either representation is mathematically convertible into the other. But it's difficult to mess up dependencies in quite the same way. One can create cycles, but unless one is modifying the dependencies of core services, it's hard to create a cycle that involves a core service. upstart provides a way to shoot oneself in the foot by blocking startup of a core service by listening to the wrong type of event. This model doesn't, so far as I could find, offer any clear advantages over a dependency structure in compensation.

2.4. Configuration File Model

There is one place where I came into this evaluation preferring the upstart design over the systemd design, and came away with a continued preference, but a milder one: the configuration file model.
systemd uses an /etc overrides /lib model, where all unit configurations are installed in /lib and only local overrides and some configuration goes into /etc. upstart uses the (more familiar to Debian) model where the daemon configuration is a conffile in /etc.

Both approaches have real advantages, but I think the upstart approach has slightly more. The systemd model means that one no longer has to add various guards to daemon configurations to allow for the possibility that the package has been uninstalled but not purged. Those continue to be necessary with upstart (and continue to be written in shell; systemd actually has a nicer language for doing this, even though it's not needed).

However, the upstart approach makes it easier to preserve and merge local changes with upstream changes. In the systemd model, the local administrator has line-by-line granularity on overrides of systemd unit configurations, which while solving much of the problem does not help with the specific case of wanting to change the flags passed to the daemon. If the package later changes the flags in some orthogonal way, it's easy for the system to miss that change. This is something that, under systemd, will probably require development of new tools to warn the administrator of what's happened. upstart avoids this problem by having the whole configuration be managed as a conffile.

I think the upstart approach is better, but I think the drawbacks of the systemd approach could be mostly overcome with some fairly simple Debian tools.

2.5. Summary

I think the technical comparison between upstart and systemd as both projects exist today substantially favors systemd, at both the feature and design level. When picking between both products as they currently exist on the basis of their current capabilities and future adaptability, I have no qualms about picking systemd.

3. Ecosystem and Portability

One of the primary concerns from the start of this conversation has been around portability of any new init system. One advantage of the extreme simplicity of sysvinit is that it's extremely portable; this advantage continues to be shared by OpenRC. Both of the more-functional init systems are Linux-specific. However, upstream attitudes towards portability differ. This ties directly into the development models of both systemd and upstart, the community momentum, and the larger surrounding ecosystem.

3.1. Ecosystem Reality Check

One of the points that I think may have been obscured in the discussion, but which is important to highlight, is that basically all parties have agreed that Debian will adopt large portions of systemd. systemd is an umbrella project that includes multiple components, some more significant than others. Most of those components are clearly superior to anything we have available now on Linux platforms and will be used in the distribution going forward.

In other words, this debate is not actually about systemd vs. upstart in the most obvious sense. Rather, the question, assuming one has narrowed the choices to those two contenders, is between adopting all the major components of systemd including the init system, or adopting most of the major components of systemd but replacing the init system with upstart. Either way, we'll be running udev, logind, some systemd D-Bus services, and most likely timedated and possibly hostnamed for desktop environments.

I think this changes the nature of the discussion in some key ways. We're not really talking about choosing between two competing ecosystems. Rather, we're talking about whether or not to swap out a core component of an existing integrated ecosystem with a component that we like better.

Now, I am generally on the side that says loose coupling of ecosystems is an inherent good.
However, I don't agree that it's such an inherent good that we should disassemble things just for the sake of having disassembled things. At feature parity, and absent any compelling reason to swap components, I think we should take the path of least resistance and use the integrations that other people have already developed. Debian has more than enough hard integration problems to solve without creating new ones for ourselves unnecessarily.

But that's the key word: unnecessarily. If we do have a reason for doing this, we should seriously consider it. Therefore, I believe the burden of proof is on upstart to show that it is a clearly superior init system along some axis, whether that be functionality or portability or flexibility or maintainability, to warrant going to the effort of disassembling a part of the systemd ecosystem and swapping in our own component.

3.2. Portability

This is a difficult topic to clearly discuss, since it is, in essence, all future speculation at this point.

I should state up front that, in making these sorts of decisions around free software projects, I have a relatively high future discount rate. In other words, I give substantially less credit to something that does not exist now but could exist in the future. I don't discount it to zero, but I do discount it relatively strongly. Others may not. I do this because free software projects and volunteer projects are inherently unpredictable. The free software world is stuffed to the gills with roadmaps that never actually happened, through no fault of any of the people involved. It's easy to agree that something would be a good idea, and another matter to actually drive it through to completion.

Right now, neither systemd nor upstart works on non-Linux platforms. Therefore, right now, adopting either of them means that we either jettison our non-Linux ports or adopt a transition plan that retains support for sysvinit scripts.

Right now, there is minimal difference between the two projects in terms of portability; they both make extensive use of Linux-specific APIs and have hooks for Linux-specific actions. However, there is a porting effort for upstart to kFreeBSD underway, and the current upstart maintainers have indicated more interest in portability than the systemd maintainers. That's been a point of significant friction over systemd (and was, in the past, also a point of friction with the previous upstart upstream, although that's subsequently changed). So there is a real advantage for upstart here, but it's one that has to be discounted because it's potential future work that could happen, but which is certainly not guaranteed to happen.

Another point worth considering here is that the best way, from Debian's perspective, of porting either project to kFreeBSD or the Hurd is to implement the currently Linux-specific interfaces on those platforms in some fashion. (An inotify and epoll API that uses kqueue under the hood, for example.) To the extent that this is possible, it benefits both upstart and systemd equally, as well as many other programs in the archive that are written to currently Linux-specific APIs. This is an approach that's been common for years in different porting scenarios; I use it myself to maintain compatibility with both MIT Kerberos and Heimdal in the Kerberos-related packages I maintain.

Finally, note the ecosystem point above. To maintain feature parity across Debian's ports, there already appears to be widespread agreement that components of systemd will have to be ported, particularly logind and possibly some of the other services.
Now, that's not quite the same thing as porting the init system: it's possible those components use fewer Linux-specific interfaces (I've not checked), it's possible that alternative implementations of the same functionality can be provided (which IIRC is what happened with udev in some fashion), and not being able to run major desktop environments is not the same thing as not being able to boot. But I do think it blunts some of the porting argument. The non-Linux ports are going to have to port, fork, or replace systemd components anyway, regardless of the choice of init system, or drop out of feature parity with the Linux ports.

So, in short, I consider portability to be a possible benefit of upstart, but I'm inclined to discount that advantage for several reasons. One, it's not yet actually materialized and still may not, and two, systemd porting looks like it's going to be on the table regardless. I therefore think that we should deal with this issue through how we structure a transition plan, rather than taking it as a reason to choose upstart over systemd. More on that in another message.

3.3. Project Momentum

One of the reasons why I'm leery of the future portability argument for upstart, and one of the reasons why I'm leery of upstart in general, is that I'm quite worried upstart will prove to be a blind alley. I think there are several reasons to be concerned here. None of them is persuasive in isolation, but taken together I think they raise significant cause for concern:

* Red Hat adopted upstart but never did a wholesale conversion, and then abandoned upstart in favor of systemd. Obviously, one should not put too much weight on this; Red Hat is a commercial company that has a wealth of reasons for its actions that do not apply to Debian. But I think it's still worth noting that the only non-Ubuntu major adopter of upstart backed away from it.

* upstart is older than systemd but has significantly fewer features.
  Now, the danger of this sort of metric is that features can be added as "padding" without any real significance or advantage. But having spent serious time with both systems, I don't believe that's the case here. systemd is not adding extraneous features; rather, it's adding significant, useful functionality and real-world adaptability, and upstart is trailing despite being an older project.

* systemd has a broader community. SuSE and Red Hat are both converting, there is significant interest across the general Linux community, major upstreams of Debian such as GNOME and KDE are adopting systemd support (and in some cases even requiring it), and systemd is tackling significant problems, such as logind, that everyone agrees need to be solved. By comparison, upstart is effectively used only by Ubuntu, and there isn't the same sort of enthusiasm or attempts to tackle broad problems happening at present in the upstart community so far as I can see. This is reasonable if upstart is mature and mostly complete software, but that was not my personal experience.

* There appears to be some direct tension between GNOME upstream and upstart, due mostly not to upstart itself but to corporate direction at Canonical. Again, this can easily be overstated. But I do think that Debian will want to continue to support GNOME going forward, and doing that with upstart will clearly require more work within the project than doing that with systemd. This is another case where we shouldn't shy away from the work if it's necessary, but we also shouldn't adopt unnecessary work.

Over the past few months, I've also put out some feelers to other colleagues, and the uniform reaction I got in response is that systemd is a better technical solution than upstart. I think this speaks to the general momentum around systemd, and will directly affect our ease of integration in the future.
I know that after my personal experience with both projects, I'm excited to add systemd support to my projects as upstream, and not particularly enthused about upstart from an upstream perspective since it doesn't offer me any clear benefits.

3.4. Summary

I'm concerned that, if we adopt upstart, in two or three years we'll end up wanting to do the same thing that Red Hat did, back out, and switch to systemd. That would be a huge amount of wasted effort. Even worse would be to end up in that situation and decide that the conversion is too much work, and then just settle for an init system that is harder to integrate and provides less functionality.

I remain unconvinced of the long-term growth curve of the upstart project. I don't think it's going to be abandoned completely, at least unless Ubuntu decides to switch (which seems unlikely at the moment) or Canonical dissolves (which also seems unlikely). I do think there's a significant danger that it will stagnate and fall behind in terms of desired features, particularly since this appears to already be happening. I don't have faith in the path that takes upstart from where it is now to something with feature parity with systemd as it is now, let alone something that's clearly better than systemd. And I think Debian as a project should be aiming for better, not merely sufficient.

The portability issues are significant. However, I don't think they provide a clear advantage to upstart. It's possible that they will in the future, at which point the ecosystem argument becomes much more difficult and much narrower. But the fact remains that we'll be using large components of systemd across the distribution anyway, which means that swapping out the init system doesn't add as much portability as one might hope, and increases our integration burden.

I think we should make wise decisions about the areas in which we want to invest project effort.
I dislike investing significant project effort in catch-up efforts that, when complete, merely get us back to where we would have been if we'd chosen a different solution. I don't think that's wise stewardship of project resources. I want to see Debian focus its efforts on places where we can make a real difference, where we can be leaders. That means adopting the best-of-breed existing solutions and building on top of them, not reinventing wheels and thereby starting from a trailing position.

4. Conclusion

If I'm correct in my analysis of the community and ecosystem dynamics, I think upstart needs to show that it is a significantly better technical choice than systemd in order to warrant the additional project work that will be required to build on top of upstart. Given feature parity, I believe we should adopt systemd so that we can focus our efforts on interesting new problems rather than on redoing integrations that other people have already done.

My personal analysis did not show that upstart was significantly better than systemd, or even at feature parity. Rather, I believe it is currently trailing systemd substantially in multiple areas, some of which will require significant design changes. Given that, I believe systemd is the clear choice, despite the portability issues that we will incur by choosing it.

However, I think that means we need to be very careful about how we handle a transition. I intend to comment on that in a separate message (which will probably be tomorrow given how long writing this message took).

-- 
Russ Allbery (rra@debian.org) <http://www.eyrie.org/~eagle/>