Another weekend, another big mailing list thread

This weekend, those interested in Debian development have been having a discussion on the debian-devel mailing list about "What can Debian do to provide complex applications to its users?". I'm commenting on that in my blog rather than the mailing list, since this got a bit too long to be usefully done in an email.

directhex's recent blog post "Packaging is hard. Packager-friendly is harder." is also relevant.

The problem

To start with, I don't think the email that started this discussion poses the right question. The problem is not really about complex applications; we already have those in Debian. See, for example, LibreOffice. The discussion is really about how Debian should deal with the way some types of applications are developed upstream these days. They're not all complex, and they're not all big, but as usual, things only get interesting when n is big.

A particularly clear example is the whole nodejs ecosystem, but it's not limited to that and it's not limited to web applications. This is also not the first time this topic has come up, but we've never come to any good conclusion.

My understanding of the problem is as follows:

A current trend in software development is to use programming languages, often interpreted high-level languages, combined with heavy use of third-party libraries and a language-specific package manager. The package manager installs libraries for the developer to use, and sometimes also for the sysadmin installing the software for production. This bypasses the Linux distributions entirely. The benefit is that it has allowed ecosystems to form around specific programming languages, where developers face very little friction in using libraries written in that language, speeding up development cycles a lot.

When I was young(er) the world was horrible

In comparison, in the old days, which for me means the 1990s, and before Debian took over my computing life, the cycle was something like this:

I would be writing an application, and would need to use a library to make some part of my application easier to write. To use that library, I would download the source code archive of the latest release, and laboriously decipher and follow the build and installation instructions, fix any problems, rinse, repeat. After getting the library installed, I would get back to developing my application. Often the installation of the dependency would take hours, so not a thing to be undertaken lightly.

Debian made some things better

With Debian, and apt, and having access to hundreds upon hundreds of libraries packaged for Debian, this became a much easier process. But only for the things packaged for Debian.

For those developing and publishing libraries, Debian didn't make the process any easier. They would still have to publish a source code archive, but also hope that it would eventually be included in Debian. And updates to libraries in the Debian stable release would not get into the hands of users until the next Debian stable release. This is a lot of friction. For C libraries, that friction has traditionally been tolerable. The effort of making the library in the first place is considerable, so any friction added by Debian is small by comparison.

The world has changed around Debian

In the modern world, developing a new library is much easier, and so also the friction caused by Debian is much more of a hindrance. My understanding is that things now happen more like this:

I'm developing an application. I realise I could use a library. I run the language-specific package manager (pip, cpan, gem, npm, cargo, etc), it downloads the library, installs it in my home directory or my application source tree, and in less than the time it takes to have a sip of tea, I can get back to developing my application.

This has a lot less friction than the Debian route. The attraction to application programmers is clear. For library authors, the process is also much streamlined. Writing a library, especially in a high-level language, is fairly easy, and publishing it for others to use is quick and simple. This can lead to a virtuous cycle where I write a useful little library, you use it and tell me about a bug or a missing feature, I add it, publish the new version, you use it, and we're both happy as can be. Where this might have taken weeks or months in the old days, it can now happen in minutes.

The big question: why Debian?

In this brave new world, why would anyone bother with Debian anymore? Or any traditional Linux distribution, since this isn't particularly specific to Debian. (But I mention Debian specifically, since it's what I know best.)

A number of things have been mentioned or alluded to in the discussion mentioned above, but I think it's good for the discussion to be explicit about them. As a computer user, software developer, system administrator, and software freedom enthusiast, I see the following reasons to continue to use Debian:

The freeness of software included in Debian has been vetted. I have a strong guarantee that software included in Debian is free software. This goes beyond the licence of that particular piece of software, and includes practical considerations, such as that the software can actually be built using free tooling, and that I have access to that tooling, because the tooling, too, is included in Debian. There was a time when Debian debated (with itself) whether it was OK to include a binary that needed to be built using a proprietary C compiler. We decided that it isn't, or not in the main package archive. These days we have the question of whether "minimised Javascript" is OK to be included in Debian, if it can't be produced using tools packaged in Debian. My understanding is that we have already decided that it's not, but the discussion continues. To me, this seems equivalent to the above case.

I have a strong guarantee that software in a stable Debian release won't change underneath me in incompatible ways, except in special circumstances. This means that if I'm writing my application and targeting Debian stable, the library API won't change, at least not until the next Debian stable release. Likewise for every other bit of software I use. Having things continue to work without having to worry is a good thing. Note that a side effect of the low friction of library development in current ecosystems is that library APIs sometimes change. This would mean my application would need to change to adapt to the API change. That's friction for my work.

I have a strong guarantee that a dependency won't just disappear. Debian has a large mirror network of its package archive, and there are easy tools to run my own mirror, if I want to. While running my own mirror is possible for other package management systems, each one adds to the friction. The nodejs NPM ecosystem seems to be especially vulnerable to this. More than once packages have gone missing, causing other projects, which depend on the missing packages, to start failing. The way the Debian project is organised, it is almost impossible for this to happen in Debian. Not only are package removals carefully co-ordinated, packages that are depended on by other packages aren't removed.

I have a strong guarantee that a Debian package I get from a Debian mirror is the official package from Debian: either the actual package uploaded by a Debian developer or a binary package built by a trusted Debian build server. This is because Debian uses cryptographic signatures of the package lists and I have a trust path to the Debian signing key. At least some of the language-specific package managers fail to have such a trust path. This means that I have no guarantee that the library package I download today is the same code that was uploaded by the library author. Note that https does not help here. It protects the transfer from the package manager's web server to me, but makes absolutely no guarantees about the validity of the package. There have been enough cases of a package repository having been attacked that this matters to me. Debian's signatures protect against malicious changes on mirror hosts.

I have a reasonably strong guarantee that any problem I find can be fixed, by me or someone else. This is not a strong guarantee, because Debian can't do anything about insanely complicated code, for example, but at least I can rely on being able to rebuild the software. That's a basic requirement for fixing a bug.

I have a reasonably strong guarantee that, after upgrading to the next Debian stable release, my stuff continues to work. Upgrades may always break, but at least Debian tests them and treats it as a bug if an upgrade doesn't work, or loses user data.
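The point above about signed package lists can be illustrated with a minimal sketch. It assumes the list of expected checksums has already been authenticated against a trusted signing key (that OpenPGP step is not shown), and it uses a simplified, hypothetical index format of one "HEXDIGEST  FILENAME" pair per line rather than Debian's real Release and Packages files. The function names are made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def parse_index(index_text: str) -> dict:
    """Parse lines of 'HEXDIGEST  FILENAME' into {filename: digest}.

    Assumption: the index text itself has already been verified against
    a signature made with a key the user trusts.
    """
    expected = {}
    for line in index_text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        expected[name] = digest
    return expected


def package_matches_index(name: str, data: bytes, index: dict) -> bool:
    """True only if the package is listed and its content hash matches."""
    return index.get(name) == sha256_hex(data)


if __name__ == "__main__":
    original = b"contents of a hypothetical package"
    index = parse_index(sha256_hex(original) + "  example_1.0-1_all.deb\n")

    # The untouched file passes; a file altered on a mirror does not.
    print(package_matches_index("example_1.0-1_all.deb", original, index))
    print(package_matches_index("example_1.0-1_all.deb", original + b"!", index))
```

The transport does not matter in this scheme: a mirror can serve the file over any channel, because a modified file no longer matches the checksum in the authenticated list.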

These are the reasons why I think Debian and the way it packages and distributes software is still important and relevant. (You may disagree. I'm OK with that.)

What about non-Linux free operating systems?

I don't have much personal experience with non-Linux systems, so I've only talked about Linux here. I don't think the BSD systems, for example, are actually all that different from Linux distributions. Feel free to substitute "free operating system" for "Linux" throughout.

What is it Debian tries to do, anyway?

The previous section is one level of abstraction too low. It's important, but it's beneficial to take a further step back and consider what it is Debian actually tries to achieve. Why does Debian exist?

The primary goal of Debian is to enable its users to use their computers using only free software. The freedom aspect is fundamentally important and a principle that Debian is not willing to compromise on. The primary approach to achieve this goal is to produce a "distribution" of free software, to make installing a free software operating system and applications, and maintaining such a computer, a feasible thing for our users.

This leads to secondary goals, such as:

Making it easy to install Debian on a computer. (For values of easy that should be compared to toggling boot sector bytes manually.) We've achieved this, though of course things can always be improved.

Making it easy to install applications on a computer with Debian. (Again, compared to the olden days, when that meant configuring and compiling everything from scratch, with no guidance.) We've achieved this, too.

A system with Debian installed is reasonably secure, and easy to keep reasonably secure. This means Debian will provide security support for software it distributes, and has ways in which to install security fixes. We've achieved this, though this, too, can always be improved.

A system with Debian installed should keep working for extended periods of time. This is important to make using Debian feasible. If it takes too much effort to have a computer running Debian, it's not feasible for many people to do that, and then Debian fails its primary goal. This is why Debian has stable releases with years of security support. We've achieved this.

The disconnect

On the one hand, we have Debian, which pretty much has achieved what I declare to be its primary goal. On the other hand, a lot of developers now expect much less friction than what Debian offers. This disconnect is the cause, I believe, of the debian-devel discussion, and of variants of that discussion all over the open source landscape.

These discussions often go one of two ways, depending on which community is talking.

In the distribution and more old-school communities, the low-friction approach of language-specific package managers is often considered to be a horror, and an abandonment of all the good things that the Linux world has achieved. "Young saplings, who do they think they are, all agile and bendy and with no principles at all, get off our carefully cultivated lawn."

In the low-friction communities, Linux distributions are something only old, stodgy, boring people care about. "Distributions are dead, they only get in the way, nobody bothers with them anymore."

This disconnect will require effort by both sides to close the gap.

On the one hand, so much new software is being written by people using the low-friction approach, that Linux distributions may fail to attract new users and especially new developers, and this will hurt them and their users.

On the other hand, the low-friction people may be sawing the tree branch they're sitting on. If distributions suffer, the base on which low-friction development relies will wither away, and we'll be left with running low-friction free software on proprietary platforms.

Things for low-friction proponents to improve

Here's a few things I've noticed that go wrong in the various communities oriented towards the low-friction approach.

Not enough care is given to copyright licences. This is a boring topic, but it's the legal basis that all of free software and open source is based on. If copyright licences are violated, or copyrights are not respected, or copyrights or licences are not expressed well enough, or incompatible licences are mixed, the result is very easily not actually either free software or open source. It's boring, but be sufficiently pedantic here. It's not even all that difficult.

Do provide actual source. It seems quite a number of Javascript projects only distribute "minimised" versions of code. That's not actually source code, any more than, say, Java byte code is, even if a de-compiler can make it kind of editable. If source isn't available, it's not free software or open source.

Please try to be careful with API changes. What used to work should still work with a new version of a library. If you need to make an API change that breaks compatibility, find a way to still support those who rely on the old API, using whatever mechanisms are available to you. Ideally, support the old API for a long time, years. Two weeks is really not enough.

Do be careful with your dependencies. Locking down dependencies on a specific version makes things difficult for distributions, because they often can only provide one or a very small number of versions of any one package. Likewise, avoid embedding dependencies in your own source tree, because that explodes the amount of work distributions have to do to patch security holes. (No, distributions can't rely on tens of thousands of upstreams to each do the patching correctly and promptly.)
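To make the dependency point concrete, here is a small sketch of why an exact pin is harder on a distribution than a version range. It assumes plain dotted numeric versions and made-up version numbers; it does not implement Debian's version comparison rules or the syntax of any particular package manager, and the function names are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted numeric version such as '1.4.2' into (1, 4, 2)."""
    return tuple(int(part) for part in text.split("."))


def satisfies_pin(available: str, pinned: str) -> bool:
    """An exact pin accepts one version and nothing else."""
    return parse_version(available) == parse_version(pinned)


def satisfies_range(available: str, minimum: str, below: str) -> bool:
    """A range accepts any version with minimum <= version < below."""
    return parse_version(minimum) <= parse_version(available) < parse_version(below)


if __name__ == "__main__":
    # Assumption: the distribution ships exactly one version of the library.
    packaged = "1.4.2"

    # An application pinned to 1.4.0 cannot use the packaged library...
    print(satisfies_pin(packaged, "1.4.0"))

    # ...but one that declares a compatible range can.
    print(satisfies_range(packaged, "1.4.0", "2.0.0"))
```

With a range, the single version a distribution provides, including one that carries a security fix, can satisfy many applications at once.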

Things for Debian to improve

There are many sources of friction that come from Debian itself. Some of them are unavoidable: if upstream projects don't take care of copyright licence hygiene, for example, then Debian will impose that on them and that can't be helped. Other things are more avoidable, however. Here's a list off the top of my head:

A lot of stuff in Debian happens over email, which might happen using a web application, if it were not for historical reasons. For example, the Debian bug tracking system (bugs.debian.org) requires using email, and given delays caused by spam filtering, this can cause delays of more than fifteen minutes. This is a source of friction that could be avoided.

Likewise, Debian voting happens over email, which can cause friction from delays.

Debian lets its package maintainers use any version control system, any packaging helper tooling, and any packaging workflow they want. This means that every package is, to some extent, a new territory for someone other than its primary maintainers. Even when the same tools are used, they can be used in a variety of different ways. Consistency should reduce friction.

There's too little infrastructure to do things like collecting copyright information into debian/copyright. This really shouldn't be a manual task.

Debian packaging uses arcane file formats, loosely based on email headers. More standard formats might make things easier, and reduce friction.

There's not enough automated testing, or it's too hard to use, making it too hard to know if a new package will work, or if a modified package breaks anything that used to work.

Overall, making a Debian package tends to require too much manual work. Packaging helpers like dh certainly help, but not enough. I don't have a concrete suggestion how to reduce it, but it seems like an area Debian should work on.

Maybe consider supporting installing multiple versions of a package, even if only for, say, Javascript libraries. Possibly with a caveat that only specific versions will be security supported, and a way to alert the sysadmin if vulnerable packages are installed. Dunno, this is a difficult one.

Maybe consider providing something where the source package gets automatically updated to every new upstream release (or commit), with binary packages built from that, and those automatically tested. This might be a separate section of the archive, and packages would be included into the normal part of the archive only by manual decision.
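As an aside on the file format point in the list above: the resemblance to email headers is close enough that Python's standard email parser can read a simple stanza. This is only a sketch of that resemblance. The package name and field values are invented, and real tooling handles details (comments, multiple stanzas, field-specific syntax) that this does not.

```python
from email.parser import Parser

# A hypothetical stanza in the style of a Debian control file.
STANZA = """\
Package: hello-example
Version: 1.0-1
Depends: libc6 (>= 2.34), libexample1
Description: hypothetical example package
 This longer description continues on lines
 that start with a space.
"""


def parse_stanza(text: str) -> dict:
    """Parse one 'Field: value' stanza into a dictionary.

    Continuation lines (those starting with whitespace) are kept as part
    of the preceding field's value, as with folded email headers.
    """
    message = Parser().parsestr(text)
    return dict(message.items())


def split_dependencies(depends: str) -> list:
    """Split a comma-separated Depends value into individual entries."""
    return [entry.strip() for entry in depends.split(",") if entry.strip()]


if __name__ == "__main__":
    fields = parse_stanza(STANZA)
    print(fields["Package"])
    print(split_dependencies(fields["Depends"]))
```

Note that the email parser only gets as far as field names and raw values. Everything inside a value, such as the version constraint in a Depends entry, is a further layer of syntax specific to Debian.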

There's more, but mostly not relevant to this discussion, I think. For example, Debian is a big project, and the mere size is a cause of friction.

Comments?

I don't allow comments on my blog, and I don't want to debate this in private. If you have comments on anything I've said above, please post to the debian-devel mailing list. Thanks.

Baits

To ensure I get some responses, I will leave this bait here:

Anyone who's been programming less than 12332 days is a young whipper-snapper and shouldn't be taken seriously.

Depending on the latest commit of a library is too slow. The proper thing to do for really fast development is to rely on the version in the unsaved editor buffer of the library developer.

You shouldn't have read any of this. I'm clearly a troll.