Systemd programming, 30 months later


Some time ago, we published a pair of articles about systemd programming that extolled the value of providing high-quality unit files in upstream packages. The hope was that all distributions would use them and that problems could be fixed centrally rather than each distribution fixing its own problems independently. Now, 30 months later, it seems like a good time to see how well that worked out for nfs-utils, the focus of much of that discussion. Did distributors benefit from upstream unit files, and what sort of problems were encountered?

Systemd unit files for nfs-utils first appeared in nfs-utils-1.3.0, released in March 2014. Since then, there have been 26 commits that touched files in the systemd subdirectory; some of those commits are less interesting than others. Two, for example, made changes to the set of unit files that are installed when you run "make install". If distributors maintained their unit files separately (as they used to maintain init scripts separately), this wouldn't have been an issue at all, so these cannot be seen as a particular win for upstreaming.

Most of the changes of interest are refinements to the ordering and dependencies between various services, which is hardly surprising given that dependencies and ordering are a big part of what systemd provides. With init scripts we didn't need to think about ordering very much, as those scripts ran the commands in the proper order. Systemd starts different services in parallel as much as possible, so it should be no surprise that more thought needs to be given to ordering and more bugs in that area are to be expected.

As hoped, the fixes came from a range of sources, including one commit from an Ubuntu developer that removed the default dependency on basic.target. That enabled the NFS service to start earlier, which is particularly useful when /var is mounted via NFS. Another, from a Red Hat developer, removed an ordering cycle caused by nfs-client.target inexplicably being told to start before the GSS services it relies on, rather than after. A third, from the developer of OSTree, made sure that /var/lib/nfs/rpc-pipefs wasn't mounted until after systemd-tmpfiles.service had a chance to create that directory, which is important in configurations where /var is not persistent.

Each of these changes involved subtle ordering dependencies that were not easy to foresee when the unit files were initially assembled. Some of them have the potential to benefit many users by improving robustness or startup time. Others have much narrower applicability, but still benefit developers by documenting the needs that others have. This makes it less likely that future changes will break working use cases and can allow delayed collaboration, as the final example will show.

rpcbind dependencies

There were two changes deserving of special note, partly because they required multiple attempts to get right and partly because they both involve dependencies that are affected by the configuration of the NFS services; they take quite different approaches to handling those dependencies. The first revised the dependency on rpcbind, which is a lookup service that maps an ONC-RPC program number to an Internet port number. When RPC services start, they choose a port number and register with rpcbind, so that it can tell clients which port each service can be reached on.

When version 2 or version 3 of NFS is in use, rpcbind is required. It is necessary for three auxiliary protocols (MOUNT, LOCK, and STATUS), and is the preferred way to find the NFS service itself, though in practice that service always uses port 2049. When only version 4 of NFS is in use, rpcbind is not necessary, since NFSv4 incorporates all the functionality that was previously provided by the three auxiliary protocols and mandates the use of port 2049. Some system administrators prefer not to run unnecessary daemons and so don't want rpcbind started when only NFSv4 is configured. There are two requirements to bear in mind when meeting this need: one is to make sure the rpcbind service isn't started; the other is to ensure that the NFS service still starts even though rpcbind is absent.

As discussed in the earlier articles, systemd doesn't have much visibility into non-systemd configuration files, so it cannot easily detect whether NFSv3 is enabled and start rpcbind only if it is. Instead, it must be explicitly told to disable rpcbind with:

systemctl mask rpcbind

There is subtlety hiding behind this command. rpcbind uses three unit files: rpcbind.target, rpcbind.service, and rpcbind.socket. Previously, I recommended using the target file to activate rpcbind, but that was a mistake. Target files can be used for higher-level abstractions, as described then, but there is no guarantee that they will be. rpcbind.target is defined by systemd only to provide ordering with rpcbind (or, equivalently, "portmap"); this provides compatibility with SysV init, which has a similar concept. rpcbind.target cannot be used to activate those services, and so should be ignored by nfs-utils. rpcbind.socket describes how to use socket activation to enable rpcbind.service, the main service. nfs-utils only cares about the sockets being ready to listen, so it should only have (and now does only have) dependencies on rpcbind.socket.
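In unit-file terms, the dependency that an NFS service should express looks something like the following sketch (an illustrative fragment, not the exact upstream unit file):

```
[Unit]
# Pull in socket activation for rpcbind, without hard-requiring it,
# and wait until the sockets are ready to listen before starting.
Wants=rpcbind.socket
After=rpcbind.socket
```

Depending on rpcbind.socket rather than rpcbind.service means the service only waits for the sockets to be listening; rpcbind itself can then be started lazily when the first request arrives.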

Masking rpcbind ensures that rpcbind.service doesn't run. The socket activation is not directly affected, but systemd sorts this out soon enough. Systemd will still listen on the various sockets at first but, as soon as some process tries to connect to one of those sockets, systemd will notice the inconsistency and will shut down the sockets as well. So this simple and reasonably obvious command does what you might expect.

Ensuring that other services cope with rpcbind being absent is as easy as using a Wants dependency rather than a Requires dependency. A Wants dependency asks the named unit to start, but the dependent service won't fail if it doesn't. Some parts of NFS only "want" rpcbind to be running, but one, rpc.statd, cannot function without it, so it still Requires rpcbind. This has the effect of implicitly disabling rpc.statd when rpcbind is masked.
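As a sketch (again an illustrative fragment, not the verbatim upstream file), the statd unit's dependency might read:

```
[Unit]
# rpc.statd cannot register or function without rpcbind, so a hard
# Requires is appropriate here; masking rpcbind.socket therefore
# implicitly prevents rpc.statd from starting.
Requires=rpcbind.socket
After=rpcbind.socket
```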

It's worth spending a while reflecting on why the command is "systemctl mask" rather than "systemctl disable", as I have often come across the expectation that enable and disable are the commands for enabling or disabling a unit file. As a concrete example, Martin Pitt stated in Ubuntu bug 1428486 that they are "the canonical way to enable/disable a unit", and that was not the first place I had encountered this expectation.

The reality is that enable is the canonical way to request activation of a unit file. It doesn't actually start the unit ("systemctl start" will do that), and it isn't the only way to activate a unit file, as some other unit file can do so with a Requires directive. This may seem to be splitting hairs, but the distinction is clearer with the disable command, which does not disable a unit file. Instead, it only reverts any explicit request made by enable that a unit be activated. It is quite possible that a unit file will still be fully functional even after running "systemctl disable" on it.

If you want to be sure that a unit file will be activated, then "systemctl enable" is probably the right thing to do. If you want to be sure that it is not activated, then "systemctl disable" won't provide that guarantee; you need "systemctl mask" instead. This command ensures that the unit won't run even if some other unit file Requires it. So that is the command we use to ensure rpcbind isn't running; it could also be used to ensure rpc.statd isn't running, though that isn't really needed, since masking rpcbind effectively masks rpc.statd, as mentioned above.
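Under the hood, mask works by linking the unit name to /dev/null in /etc/systemd/system; that link takes precedence over the real unit file and makes the unit impossible to load. A sketch imitating the mechanism in a scratch directory, so as not to touch a real system:

```shell
# Imitate "systemctl mask rpcbind" in a throwaway directory that
# stands in for /etc/systemd/system.
etc=$(mktemp -d)
ln -s /dev/null "$etc/rpcbind.service"

# systemd treats a unit whose file resolves to /dev/null as masked:
readlink "$etc/rpcbind.service"   # -> /dev/null
```

Running "systemctl unmask rpcbind" simply removes that symlink again.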

Ordering nfsd with respect to filesystem mounting using a generator

One dependency for the NFS server, which is particularly obvious in hindsight, is that it should only be started after the filesystems that it is exporting have been mounted. Without this ordering, an NFS client might manage to mount the filesystem that is about to have something mounted on top of it, which can cause confusion — or worse. The default dependencies imposed by systemd will start services after local-fs.target , which ensures all local filesystems are mounted. When the commit mentioned above removed the default dependencies to allow NFS to start earlier, it explicitly added local-fs.target . So this seems well in hand.

For remote filesystems mounted over NFS, we need the reverse ordering. In particular, if a filesystem is NFS mounted from the local host (a "loopback" mount), the NFS server should be started before the filesystem is mounted. This is particularly important during system shutdown when ordering is reversed. If the NFS server is stopped before the loopback NFS filesystem is unmounted, that unmount can hang indefinitely.

To avoid this hang, Pitt added a dependency so that nfs-server.service would start before (and so be stopped after) remote-fs-pre.target. This ensures that the NFS server will be running whenever a loopback NFS filesystem might be mounted. This seems to make perfect sense, but there is a wrinkle: sometimes, filesystems that systemd considers to be "remote" can be exported by NFS. A particular example is a filesystem mounted from a network-attached block device, such as one accessed over iSCSI.
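Put together, the server-side ordering described so far amounts to something like the following sketch of nfs-server.service (an assumed fragment for illustration; the upstream file contains more):

```
[Unit]
# Opt out of the default dependencies (such as basic.target) so the
# server can start early, then state the orderings explicitly.
DefaultDependencies=no
# Wait for local filesystems, which may include the ones being exported.
After=local-fs.target
# Be running before (and so stopped after) any remote filesystem is
# mounted, so loopback NFS mounts can be unmounted cleanly at shutdown.
Before=remote-fs-pre.target
```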

Had I confronted the need to export iSCSI filesystems before Pitt had added the dependency on remote-fs-pre.target, I probably would have simply told systemd to start nfs-server.service "After=remote-fs.target". This would have solved the iSCSI situation, but broken the loopback NFS situation. Had the unit files not been upstream, this is undoubtedly what would have happened.

Instead, a more general solution was needed. The NFS server needs to start after the mounting of any filesystems that are exported, but before any NFS filesystem is mounted. Systemd is not able to make this determination itself, but fortunately it has a flexible extension mechanism so it can have the details explained to it. Using this extension mechanism isn't quite as easy as adding a script to /etc/init.d , but perhaps that is a good thing. It should probably only be used as a last resort, but it is good to have it when that resort is needed.

Before systemd reads all its unit files, either at startup or in response to "systemctl daemon-reload", it will run any programs found in various "generator" directories, such as /usr/lib/systemd/system-generators. These programs are run in parallel, are expected to complete quickly, and will normally read a foreign (i.e. non-systemd) configuration file and create new unit files or drop-ins (which extend existing unit files) in a directory given to the program, typically /run/systemd/generator. These will then be read when other unit files and drop-ins are read, so generators can exercise a large degree of control over systemd.

For the nfs-server dependencies with respect to various mount points, we want to read /etc/exports and add a RequiresMountsFor= directive for each exported directory. Then we want to read /etc/fstab and add a Before=MOUNT_POINT.mount directive for each MOUNT_POINT of an nfs or nfs4 filesystem. As library code already exists for reading both of these files, this all comes to less than 200 lines of code. Once the problem is understood, the answer is easy.
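A minimal sketch of that generator logic in shell (the real generator in nfs-utils is C code; the function name here, the parameterized file paths, and the naive mount-point-to-unit-name conversion are illustrative assumptions — real code should use systemd-escape --path):

```shell
# Sketch of a generator that writes a drop-in ordering nfs-server
# against exported and NFS-mounted filesystems. For testability it
# takes the exports file, fstab file, and output directory as
# arguments; a real generator reads the standard paths and is given
# its output directory by systemd.
nfs_server_generator_sketch() {
    exports_file=$1; fstab_file=$2; outdir=$3
    dropin_dir="$outdir/nfs-server.service.d"
    mkdir -p "$dropin_dir"
    dropin="$dropin_dir/order-with-mounts.conf"
    printf '[Unit]\n' > "$dropin"

    # Start nfs-server only after every exported directory is mounted.
    while read -r dir _; do
        case $dir in ''|'#'*) continue;; esac
        printf 'RequiresMountsFor=%s\n' "$dir" >> "$dropin"
    done < "$exports_file"

    # Start nfs-server before any NFS mount listed in fstab.
    while read -r _ mountpoint fstype _; do
        case $fstype in
        nfs|nfs4)
            # Naive unit-name escaping; use systemd-escape in real code.
            unit=$(echo "${mountpoint#/}" | sed 's,/,-,g').mount
            printf 'Before=%s\n' "$unit" >> "$dropin"
            ;;
        esac
    done < "$fstab_file"
}
```

With an exports line for /srv/export and an fstab line mounting an NFS filesystem on /mnt/nfs, this would emit RequiresMountsFor=/srv/export and Before=mnt-nfs.mount into /run/systemd/generator/nfs-server.service.d/.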

Generators everywhere?

Having experienced the power of systemd generators, I immediately started to wonder how else I might use them. It is tempting to use a generator to automatically disable rpcbind when only NFSv4 is in use, but I think that is a temptation best avoided. rpcbind isn't only used by NFS. NIS, the Network Information Service (previously called "yellow pages"), makes use of it, and sites could easily have their own local RPC services. It is best if disabling rpcbind remains a separate administrative decision, for which the "mask" function seems well suited.

In the earlier articles I described a modest amount of complexity required to pass local configuration through systemd to affect the parameters passed to various programs. Using a generator to process the configuration file could make all of that more transparent, or it might just replace one sort of complexity with another. While I don't agree with all the advice the systemd developers provide, this advice from the systemd.generator manual page is certainly worth considering:

Instead of heading off now and writing all kind of generators for legacy configuration file formats, please think twice! It is often a better idea to just deprecate old stuff instead of keeping it artificially alive.

Upstream now!

The evidence presented here supports the claim that keeping systemd unit files upstream can benefit all developers and users. The different experiences generated in different contexts were brought together into a single conversation, so all could benefit from, and respond to, all the changes. This should not be surprising when one thinks of unit files as just another sort of code used to build the whole system. The only part that seems to be missing from upstream is a place to document the advice that "systemctl mask rpcbind" is the appropriate way to disable rpcbind and rpc-statd when only NFSv4 is in use. Maybe we need an nfs.systemd man page.