"dnf update" considered harmful


Updating a Linux distribution has historically been done from the command line, using tools like Debian's apt-get, openSUSE's zypper, or Fedora's yum (or its successor, dnf). A series of crashes during system updates on Fedora 24 led Adam Williamson to post a note to fedora-devel and other mailing lists warning people away from running "dnf update" within desktop environments. It turns out that doing so has never truly been supported, though it works the vast majority of the time. The discussion around Williamson's note, however, makes it clear that the command is commonly run that way and that at least some users are quite surprised (and unhappy) that it isn't a supported option.

The underlying problem is that when running an update in a graphical terminal under GNOME, KDE, or some other desktop environment, there are a number of components that could crash (or restart) due to the update process. If X, GNOME, or the terminal program crashes, it will take the update process down with it, which may leave the update half-finished and the system in an indeterminate state. Williamson reported that users were getting "duplicated packages" and other messages when trying to rerun the update. So he was blunt in his recommendations:

[...] but in the meantime - and this is in fact our standard advice anyway, but it bears repeating - DON'T RUN 'dnf update' INSIDE A DESKTOP. [...] If you're using Workstation, the offline update system is expressly designed to minimize the likelihood of this kind of problem, so please do consider using it. Otherwise, at least run 'dnf update' in a VT - hit ctrl-alt-f3 to get a VT console login prompt, log in, and do it there. Don't do it inside your desktop.
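For those who script their updates, a rough guard along the lines of that advice might look like the following sketch. It is only a heuristic and not anything Fedora or dnf ships: the function name in_graphical_session() is made up for illustration, and the check relies on the DISPLAY, WAYLAND_DISPLAY, and XDG_SESSION_TYPE environment variables, which are normally set inside X11 or Wayland sessions but can be inherited or cleared in ways that fool it.

```python
import os


def in_graphical_session(env=None):
    """Guess whether this process was started from inside a desktop session.

    Heuristic only: a VT login typically has XDG_SESSION_TYPE=tty and no
    DISPLAY or WAYLAND_DISPLAY set.
    """
    env = os.environ if env is None else env
    if env.get("XDG_SESSION_TYPE", "").lower() in ("x11", "wayland"):
        return True
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


if __name__ == "__main__":
    if in_graphical_session():
        print("This looks like a desktop session; consider a VT "
              "(ctrl-alt-f3) or an offline update instead.")
    else:
        print("No graphical session detected.")
```

A wrapper script could call this before invoking dnf and refuse to continue, or simply print the warning and let the user decide.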

That led to several replies indicating that some had been doing updates that way frequently, for years, and with no problems. It also led some to wonder why the process could not be made more robust against this kind of problem, especially since it was a relatively common thing to do. Andrew Lutomirski asked:

How hard would it be to make dnf do the rpm transaction inside a proper system-level service (transient or otherwise)? This would greatly increase robustness against desktop crashes, ssh connection loss, KillUserProcs, and other damaging goofs.
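Something close to Lutomirski's idea can be approximated by hand today with systemd-run, which starts a command in a transient unit owned by the system manager rather than by the user's terminal. The sketch below only builds and prints such a command line; it is not how dnf works, and the unit name "dnf-update-sketch" and the helper name are invented for the example. It would need to be run as root, and the output of the update would end up in the journal rather than on the terminal.

```python
import shlex


def transient_update_command(unit="dnf-update-sketch"):
    """Build a systemd-run command line that runs 'dnf -y update'
    in a transient system service (unit name is a made-up example)."""
    return [
        "systemd-run",
        f"--unit={unit}",
        "--description=dnf update in a transient unit",
        "dnf", "-y", "update",
    ]


if __name__ == "__main__":
    cmd = transient_update_command()
    # Print rather than execute; running it requires root privileges.
    print(" ".join(shlex.quote(part) for part in cmd))
    print("follow progress with: journalctl -u dnf-update-sketch -f")
```

Because the unit is a child of systemd itself, a crash of X, the terminal emulator, or an ssh connection should not take the transaction down with it, which is the robustness property being asked for.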

But Stephen Gallagher thought that would be a waste of time, given that the offline update process has been available since Fedora 18. That process downloads the packages in the background, then lets the user choose when to reboot to install them. It then boots to a minimal environment, which is meant to minimize the possibility of the update breaking something and leaving the system in an indeterminate state.

But rebooting every time there are updates is a pretty heavy-handed approach. Lutomirski noted that he would rather avoid that step and that, for servers, it isn't even obvious how to trigger the offline update (for GNOME, the desktop simply gives the option to reboot and update). Gerald B. Cox seemed incredulous that the recommended path required a reboot: "As far as rebooting after every update? Huh? Who does that? Are we Windows?" In another message, Cox suggested that dnf could be made more robust:

Seems to me it would be more worthwhile to build in better error recovery within DNF than to always require "offline" - especially since the incidence of failure (at least anecdotally) just isn't that high. Instead of dealing with the problem (failed updates and error recovery) - this approach just tries to avoid it by always requiring a reboot.

But Chris Murphy strongly disagreed:

Sufficiently impractical that it's not possible. This is why offline updates exists. It's why work is being done on ostree>rpm-ostree>atomic host, which affects the entire build system, deployments, updates, and eventually all of the mirrors. It's why Microsoft and Apple don't allow anything other than offline updates. It's why openSUSE has spent a ton of resources, and a few bloody noses, getting completely atomic updates working with Btrfs and snapper, with very fine rollback capabilities. There's a reason why so many different experts at system updates have looked at this problem and just say, yeah no, not anymore of that.

Sam Varshavchik pointed out that tmux can already handle a crash of the X server, so it should be possible to make dnf itself more resistant to those kinds of problems using those techniques; others mentioned screen as a possibility as well. In order to do that, though, dnf would need to add the functionality of tmux/screen, but that's "far outside the scope of a package manager", Chris Adams said. Furthermore, the lack of a controlling TTY after a crash means that dnf would need to ignore the SIGPIPE that would result from writing its output, which is not something it should do. He suggested that those who want that functionality "run it under something that handles that, like tmux or screen".
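Adams's suggestion amounts to something like the following sketch, which builds the tmux command lines needed to start the update in a detached session and to attach to it afterward. The session name "dnf-update" and the helper names are assumptions made for illustration; the tmux server is a separate process from the terminal emulator, so the session should survive if the terminal or the X server goes away.

```python
import shlex


def tmux_update_command(session="dnf-update"):
    """Command to start 'sudo dnf update' in a detached tmux session."""
    return ["tmux", "new-session", "-d", "-s", session, "sudo dnf update"]


def tmux_attach_command(session="dnf-update"):
    """Command to (re)attach to that session, for example after a crash."""
    return ["tmux", "attach-session", "-t", session]


if __name__ == "__main__":
    for cmd in (tmux_update_command(), tmux_attach_command()):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Since the session starts detached, the user has to attach to it to answer the sudo password prompt and the dnf confirmation; the session goes away once the command exits.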

Varshavchik pointed to Android as a system that can do application updates without a reboot: "The only time you need to reboot an Android device is for a kernel-level update." For Fedora, though, the problem is that it follows "the 'distribution is just a big pile of RPMs' model", Williamson said. Fedora cannot distinguish between updates that are "system level" versus those that aren't, but Android can (and even it requires reboots on more than just kernel updates):

No, in fact, it's for any *system level* update. Any change to the underlying system (as opposed to an app) requires the full reboot treatment. Only updates to app packages don't. The reason Android can do fairly good app updates is precisely because it does exactly what Flatpak and Snappy are trying to do for Linux: hard separation between app space and system space. Flatpak and Snappy didn't just spring fully formed from a vacuum, they're very obviously the product of someone using Android and/or iOS and going 'huh, maybe we should do that'.

Murphy expanded on that idea some:

Strictly speaking [rebooting is] not necessary for every update, there's just no mechanism for knowing for sure what updates entail more risk than others. You'll notice that once an application is installed, whether by dnf or Gnome Software, it's considered part of the system. There's no separation of OS upgrades from application updates.

But there is a misconception that dnf update completely updates the system without a reboot, Peter Larsen said:

People think that "yum/dnf update" leaves their system in a new updated stage. But it doesn't (completely). It never has. Only after a reboot are all your patches applied and active. Existing/running processes are rarely if ever reloaded. So when you update libraries, kernels etc. your system will keep running with the old versions of those libraries loaded. [...] The only real complete update you can do is one that does a full reboot. We do have a few tricks with DNF which will attempt to let you know what needs restarting. But you'll find that a good part of our updates requires a restart of most if not all your system, in order for the updates to become fully active.
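Larsen's point about running processes holding on to old library versions can be observed directly on Linux: when an update replaces a shared library, processes that already mapped the old file show it in /proc/PID/maps with a " (deleted)" suffix. The sketch below looks for such entries. It illustrates the effect only and is not a description of how any DNF feature is implemented; the function names are invented here, and the simple suffix check will also report deleted files that are not libraries at all.

```python
import os

DELETED_SUFFIX = " (deleted)"


def deleted_mappings(maps_text):
    """Return sorted paths of mapped files that were deleted or replaced,
    given the text of a /proc/<pid>/maps file."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if path.startswith("/") and path.endswith(DELETED_SUFFIX):
            found.add(path[: -len(DELETED_SUFFIX)])
    return sorted(found)


def scan_processes(proc_root="/proc"):
    """Map each readable PID to the stale files it still has mapped."""
    stale = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "maps")) as maps:
                paths = deleted_mappings(maps.read())
        except OSError:
            continue  # process exited or we lack permission
        if paths:
            stale[int(entry)] = paths
    return stale


if __name__ == "__main__":
    for pid, paths in sorted(scan_processes().items()):
        print(pid, ", ".join(paths))
```

Run as root after an update, a scan like this gives a rough idea of which processes would need to be restarted (or whether a reboot is simpler) before the new code is actually in use.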

Perhaps partly because it normally works just fine, it is surprising to find that dnf update is not really a supported way to update a Fedora system—even from a virtual terminal outside of the desktop. Many will probably keep on doing it, but once in a while may get bitten. As Williamson put it: "It works fine all the time until it doesn't, and then you're left with a pile of broken bits that you get to spend all afternoon fixing."

Meanwhile, though, Williamson and others were working on tracking down just what caused the spate of problems that led to the original warning. It turns out that an update to the systemd-udev package would restart the service, which would result in the video adapter(s) being plugged in again, which caused X to crash, but only for devices with multiple graphics adapters (for example, hybrid graphics in laptops). As detailed in Williamson's blog post, the problem will be fixed at both ends, which is clearly to the good.

As far as rebooting after every update goes, though, one guesses that many longtime users will make their own decisions on when to do that, while newer users will likely just do what GNOME suggests. Some will still end up with "broken bits" occasionally, but won't have to reboot as frequently—with an update stream as constant as Fedora's, that tradeoff may well be worth it.