Package managers all the way down

Package managers are at the core of Linux distributions, but they are currently engulfed in a wave of changes and it's not clear how things will end up. Kristoffer Grönlund started his 2017 linux.conf.au talk on the subject by putting up a slide saying that "everything is terrible". There are a number of frustrations that result from the current state of package management, but that frustration may well lead to better things in the future.

Grönlund started by asking a simple question: what is a package manager? There are, in fact, two types of package managers out there, and the intersection between them is leading to some interesting problems.

When most people think of package managers, they are thinking of a distribution package manager like zypper, DNF, or APT. These tools date back to the early days of Linux, when there were few boundaries between users, developers, and administrators; whoever we were, we had to do everything ourselves. Distribution package managers were construction kits that helped us to put our distributions together. They managed dependencies — both build and runtime dependencies, which are different things. They helped users install their software, administrators to keep systems up to date, and distributors to manage licenses.

There is another type of package manager out there: the language package manager. These tools are usually tied to a specific programming language; examples include npm for JavaScript and Bundler for Ruby. They help non-Linux developers get the benefits of a package manager; their main role is to find and download dependencies so that users can build their software. Language package managers are useful, but they stress the distribution model in a number of ways.

The new dependency hell

Grönlund has been working on packaging Hawk, a Ruby-on-Rails application. It is unusual to create distribution packages for web applications, he said, but it will become more common as such applications proliferate. Getting Hawk into an RPM package turns out to be challenging. Rails has about 70 dependencies that have to be installed; Hawk itself has 25 direct dependencies. Each of those has to be packaged as its own RPM file. That is not too bad a job and, as a benefit, all of those dependencies can be shared with other openSUSE packages. But things get worse as soon as an update happens. Usually Rails breaks, so he has to go through the dependencies to figure out which one was updated incompatibly. Or Hawk breaks and has to be fixed to work with a new Rails version. It's a pain, but it's still manageable.

The next version of Hawk, though, is moving to a more active JavaScript user interface. "Ruby dependency hell has nothing on JavaScript dependency hell," he said. A "hello world" application based on one JavaScript framework has 759 JavaScript dependencies; this framework is described as "a lightweight alternative to Angular2". There is no way he is going to package all 759 dependencies for this thing; the current distribution package-management approach just isn't going to work here.

Grönlund then changed the subject to the Rust language which, he said, is useful in many settings where only C or C++ would work before. Rust has its own package manager called "cargo"; its packages are called "crates", and there is a repository at crates.io. Cargo is "at the apex of usability" for language package managers, he said, adding that developing code in Rust is "a painless and beautiful process".

Rust has avoided dependency hell in an interesting way, he said. Imagine a dependency graph like the one shown on the right (taken from his slides). The application A depends on libraries B and C. But B depends on D 1.0, while C needs D 2.0. This is an arrangement that would be nearly impossible to achieve in the C world, but it's easy in the Rust environment. The compiler automatically incorporates the version number into symbol names, causing the two dependencies to look different from each other; the application, unaware, uses both versions of D at the same time. This makes it easy for developers to add a new dependency to a project.
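The effect can be sketched in a few lines of Rust. This is a generic illustration, not code from the talk: the two "versions" of D are stood in for by two modules, since Cargo would normally compile each version as a separate crate and the compiler would keep their symbols distinct.

```rust
// Simulating the diamond from the talk: application A depends on B and C,
// B wants D 1.0, C wants D 2.0. In a real project Cargo compiles the two
// versions of D as separate crates with version-distinguished symbols;
// here two modules play that role.

mod d_v1 {
    // "D 1.0" as seen by library B
    pub fn version() -> String { String::from("D 1.0") }
}

mod d_v2 {
    // "D 2.0" as seen by library C
    pub fn version() -> String { String::from("D 2.0") }
}

mod b {
    // Library B depends on D 1.0
    pub fn which_d() -> String { crate::d_v1::version() }
}

mod c {
    // Library C depends on D 2.0
    pub fn which_d() -> String { crate::d_v2::version() }
}

fn main() {
    // Application A uses B and C, and therefore both versions of D,
    // without ever noticing the duplication.
    println!("B sees {}", b::which_d());
    println!("C sees {}", c::which_d());
}
```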

Another useful tool is "rustup", which manages multiple versions of the compiler. The Rust compiler is released on a six-week cycle. Since Rust is a new and developing language, each release tends to include interesting new features. As a result, applications can end up being dependent on specific versions of the Rust compiler. Rustup handles those dependencies, ensuring that the proper compiler versions are available when building an application.

All of this is great for developers, he said, but it is making life difficult for distributors. It's not a problem if you don't care about relying on the Rust infrastructure and Mozilla's servers, and if you don't mind not being able to build your program without an Internet connection. If, instead, you need exact control over the versions of the software you are using, perhaps to track the associated licenses, the tools become harder to work with.

He found the process of packaging these applications frustrating, like swimming up a waterfall, and it led him to wonder: why are we trying to manage packages from one package manager with a different package manager? Should we be trying to build our software using such complicated algorithms? But the fact is that this kind of dependency management is increasingly expected as a part of what a programming language provides. The distributor package-manager role of tracking dependencies is not really needed anymore. Perhaps we don't need distributions anymore.

The way forward

One way to cope would be to complain and say that "kids these days are doing things the wrong way". He does that sometimes, but it won't get us anywhere in the long run. The only way to deal with change is to accept it and roll with it. Packaging libraries as we do today simply isn't going to work anymore. We need to explore what that means and how we should do things instead.

One thing that needs to be done is to realize that packaging and package management are not the same thing. The acts of building and distributing software need to be separated; tying them together is making it hard to progress on either side. Perhaps one step in that direction would be to focus on the creation of a protocol for package management rather than just building tools. We could have a metadata format that doesn't intermix the details of a package with how to build it.
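One way to picture that separation is as two distinct records: one describing what a package *is*, another describing how to *produce* it. The sketch below is hypothetical; the field names and the Hawk version number are invented for illustration, not part of any proposed format.

```rust
// A hypothetical split between descriptive package metadata and the build
// recipe. A package-management protocol could exchange PackageMeta records
// alone; only the builder would ever need the BuildRecipe.

#[derive(Debug)]
struct PackageMeta {
    name: String,
    version: String,
    license: String,
    dependencies: Vec<(String, String)>, // (name, version requirement)
}

#[derive(Debug)]
struct BuildRecipe {
    source_url: String,
    build_steps: Vec<String>,
}

fn describe(meta: &PackageMeta) -> String {
    format!("{} {} ({})", meta.name, meta.version, meta.license)
}

fn main() {
    // Invented example values, loosely modeled on the Hawk discussion above.
    let meta = PackageMeta {
        name: String::from("hawk"),
        version: String::from("2.0.0"),
        license: String::from("GPL-2.0"),
        dependencies: vec![(String::from("rails"), String::from("~> 5.0"))],
    };
    let recipe = BuildRecipe {
        source_url: String::from("https://example.org/hawk.tar.gz"),
        build_steps: vec![String::from("bundle install")],
    };
    println!("{}", describe(&meta));
    println!("{:?}", recipe);
}
```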

Another thing that will have to happen is the separation of the management of the base system from the management of applications. They don't necessarily have to be packaged separately or use different tools, but we need to recognize that they are not the same thing. It is, he said, a "quirk of history" that the two got mixed up in Linux; we don't have to be prisoners of our history.

But, then, what is the role of distributions in this new world? None of this change is a threat to distributions, he said, as long as they figure out how to solve these issues. Distributors will still handle the overall look and feel, the selection of applications, support, security patches, and so on. Somebody will still have to provide those services. Distributors' lives may well become easier if they don't have to provide the entire universe for developers. Applications should be separated from the base system; perhaps we will see topic-specific distributions with no base at all. Implementing this separation calls for a protocol for app stores, to allow interaction between small app stores and distributions.

There are a lot of open questions, of course. How does one track software licenses when there are thousands of dependencies to deal with? How are security patches managed when applications carry their own copies of libraries, and perhaps multiple copies at that? It might be tempting to just dismiss the whole thing, saying that the older way was better, but that way is not going to remain viable for much longer. His experience with Rust made that clear; developing in that environment is just too nice.

Thinking about solving these problems is exciting, he said; the future is open. But we are still faced with the problem of inventing the future when we don't know what it will look like. We are going to have to play around with various ideas; that happens to be something that open source is particularly good at. There will be lots of people who disagree and set out to solve the problems in their own way, thus exploring the space. When the best solutions emerge, we are good at adopting them.

One particular issue that has to be addressed is managing state. One of the tenets of functional programming is to avoid managing state. Distributions, instead, have a great deal of state, and it is proving increasingly hard to manage. Time becomes a factor in everything, and that is hard to reason about. When you do the same thing twice, you may get two different results. Distributions have this problem all the time. Our software is full of weird quirks designed to mitigate the problems associated with managing state.
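The difference is easy to demonstrate in a few lines of Rust (a generic illustration, not from the talk): a pure function always gives the same answer for the same input, while a stateful "install" operation gives answers that depend on what happened before it.

```rust
// Pure: the same input always produces the same output.
fn pure_resolve(requested: &str) -> String {
    format!("{}-1.0", requested)
}

// Stateful: the result of install() depends on the history of earlier calls,
// which is exactly the property that makes distribution state hard to reason
// about.
struct System {
    installed: Vec<String>,
}

impl System {
    fn install(&mut self, pkg: &str) -> String {
        if self.installed.iter().any(|p| p == pkg) {
            format!("{} already present, skipped", pkg)
        } else {
            self.installed.push(pkg.to_string());
            format!("{} installed", pkg)
        }
    }
}

fn main() {
    // Same call, same result, every time.
    assert_eq!(pure_resolve("zlib"), pure_resolve("zlib"));

    let mut sys = System { installed: Vec::new() };
    // Doing the same thing twice gives two different results.
    let first = sys.install("zlib");
    let second = sys.install("zlib");
    println!("{} / {}", first, second);
}
```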

How can we fix that? One possibility is demonstrated by the Nix and Guix systems, which treat system composition as a Git-like tree of hashes. Packages are installed based on the hash of their source. There is no dependency hell; each package simply uses the exact dependencies it requires. Configuration and data files are managed in the same way. If you need support, you can provide a hash of the system, which describes its exact state. There are some issues, though; updating the base forces updating everything else, although there are ways of mitigating some of these problems. Nix is just scratching the surface of the possibilities, Grönlund said.
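The core of that idea, in a toy Rust sketch: a package's install path is derived from a hash of its inputs, so different builds can never collide and the system's state is just a set of hashes. The standard library's `DefaultHasher` stands in here for the cryptographic hash a real store such as Nix's would use, and the `/store` path layout is invented for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Derive a store path from a package's inputs: its source plus the
// identities of its dependencies. Change any input and the path changes,
// so old and new versions can coexist side by side.
fn store_path(name: &str, source: &str, deps: &[&str]) -> String {
    let mut h = DefaultHasher::new();
    source.hash(&mut h);
    for d in deps {
        d.hash(&mut h); // a change in any dependency changes the path
    }
    format!("/store/{:016x}-{}", h.finish(), name)
}

fn main() {
    let old = store_path("openssl", "src-v1", &[]);
    let new = store_path("openssl", "src-v2", &[]);
    // Different inputs, different paths: both builds can be installed
    // at once, and the pair of hashes describes the system exactly.
    assert_ne!(old, new);
    println!("{}\n{}", old, new);
}
```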

Containers are another interesting area. Projects like AppImage, snap, and Flatpak are just beginning to show progress in abstracting applications from the base system and sandboxing them. None of them has yet reached the point where it needs to be. Systemd is controversial, but it has reinvented what an init system is supposed to be and started a new discussion on how the base system should work. Systemd was adopted quickly across a wide range of distributions, showing that we can change the fundamental aspects of how the Linux system works. Qubes OS is focused on security and sandboxing, and is reimagining what an operating system can look like.

He concluded by saying that reproducible builds are an important goal. Figuring out how to make builds reproducible in this environment is going to be an interesting challenge. There are many challenges, but it is time to face them. Alan Kay once said that the future does not need to be incremental. Sometimes we need to think about where we want to be in ten years, then build that future.

Q&A

A member of the audience asked about multi-language dependencies. Language-specific package managers do not handle these dependencies well, while it's "bread and butter" for distributions. Grönlund agreed that this was an important issue, one that is going to start to hit people when distribution package managers go away. The exact-dependency approach used by Nix may point toward the solution there.

Another question had to do with "cloud rot"; how do you manage a situation where you depend on something that might not be there next Tuesday? This can even happen as a sort of deliberate sabotage; the FSF deleted the last GPLv2 versions of GCC and binutils from its site. How can this be tractable in the future when having things work depends on more and more people? Grönlund said he had no real answer for that problem, and that developers need to realize that the Internet is not permanent. Companies can fail, and if you rely on their infrastructure you're going to have issues. Some of these lessons have to be learned the hard way. Distributions might be able to help by caching things, but he doesn't have a real answer.

The video of this talk is available for readers wanting to see the whole thing.

[Your editor would like to thank linux.conf.au and the Linux Foundation for assisting with his travel to the event.]

