The problem

Package and managers

Packages are collections of software: data files, binaries, executables, source archives, etc. These are published, resolved, downloaded, and installed with package managers.

Unfortunately, in 2017, almost every package manager has the the ability, the default, or even the common practice of creating nondeterminism in the form of version ranges.

Bundler gem 'octo-json-parser', '>=1.8.0' pip octo-json-parser>=1.8.0 Cabal octo-json-parser >= 1.8.0 npm "octo-json-parser": ">=1.8.0" Maven <dependency>

<groupId>octo</groupId><artifactId>json-parser</artifactId><version>[1.8.0,)</version>

</dependency> Ivy <dependency org="octo" name="json-parser" revision="[1.8.0,)"/> apt Depends: octo-json-parser (>= 1.8.0) rpm requires: octo-json-parser >= 1.8.0

This problem is illustrated in this conversation between two developers:

Bob: It looks like the tests fail.

Alice: Really? I ran them yesterday and they work fine for me.

Bob: Yeah. Here’s the error…

Alice: Try X, Y, and Z.

Bob: Still doesn’t work.

Alice: Okay, well try…

(hours later)

Bob: I think it’s octo-json-parser.

Alice: What version do you have?

Bob: Oh, I have 2.0.0.

Alice: Oh, that was just released this morning.

Floating software dependencies do all of the following:

Produce different builds. Affect test results. Change which bugs are present.

They inhibit building and using older versions of the project, complicate bug reports, and make CI results unreliable. A version number or commit hash is no longer sufficient information to describe the state—the rest of the universe must be considered as well.

In the wild

This practice is discouragingly common. Consider some of the most popular Javascript (NPM) packages. Express has 83 edges in its transitive dependency graph, 50 of which have floating (not-fixed) versions. Grunt has 89 floating of 93. Gulp has 621 of 633. React has 17 of 17.

Even relatively stable Java-land is not immune to poor software dependency decisions. For years, aws-java-sdk-core depended on joda-time [2.2,). aws-java-sdk-core version 1.9.40 will evidently be compatible with whatever version of joda-time is released in 2040. It will depend on the latest version of joda-time until the end of time.

The fallacy of harmless upgrades

Having established that 1.8.0 – Infinity is a bad version range, perhaps we can be more discriminating and only take the latest 1.8.x version. We’ll use 1.8.0, 1.8.1, 1.8.2, etc.

This notion is so common that some package managers dedicate syntax to this practice.

npm "octo-json-parser": "~1.8.0" Bundler gem 'octo-json-parser' '~> 1.8.0' Ivy <dependency org="octo" name="json-parser" revision="1.8.+"/>

If everyone follows semantic versioning, it’s safe, right? Wrong.

The hypothetical story of the trailing comma bug:

Version 1.8.0 of octo-json-parser provides a function parseJson() , which is documented to parse JSON for valid input and produce an error for invalid input. A bug is reported where parseJson() considers trailing commas to be okay, like ["abc",123,] . This is a bug, since the behavior differs from the documented API. The bug is corrected, so trailing commas produce an error. Since the documented API of octo-json-parser has not changed, semantic versioning classifies this as a patch release. And so the maintainers release version 1.8.1.

Wonderful. Bugs getting patched, clients getting 1.8.x updates…the machine of software humming along.

But wait—a client of octo-json-parser has been validating user input with parseJson() and persisting the input to a database. The data is retrieved on demand and reparsed and processed with octo-json-parser. Upgrading to version 1.8.1 suddenly corrupts (makes unreadable) their months of stored data. The 1.8.0 -> 1.8.1 upgrade has failed.

Certainly some changes are safer than others, but there’s no such thing as a 100% innocuous change. Any dissenting opinion lacks imagination (and experience).

The fallacy of security

Some justify "~1.8.0" because it includes important security updates automatically. The reality is that no clever versioning scheme fixes security. In the event of a vulnerability, the exposure must be assessed, the fixed version must be deployed, and penetration and damage must be analyzed. Using version ranges for security is like bringing a Band-Aid onto an airplane in case there is a mechanical problem.

A note about external VCS

Some package managers (e.g., npm and Go) can depend on other VCS respos. These can have similar problems. If using Git, never depend on HEAD or a branch. Always use a tag or commit hash.

Solutions

Package management in 2017 shouldn’t be so messy. Changes should be limited to a very well-defined scope, which is almost always a version-controlled repository. A reliable build must be hermetic. It shouldn’t depend on some bits moving around half a world away. Violating this principle will leave your builds as a thing of wax, molded by the current bit of weather.

There are several ways to overcome version ranges.

Don’t allow version range

Some forward-thinking package managers don’t allow version ranges at all, e.g., Nix and Guix. (In fact, these two take reproducible versioning to another level by using the package hash as the version.)

Don’t use version ranges

Even if version ranges are allowed, you don’t need to use them. Naturally, this only works if your transitive dependencies play by the same rules. For Java, this is a workable strategy. For Ruby, perhaps not.

And some poorly made build tools can be non-deterministic even using fixed ranges, e.g., npm.

Use version control sources

Copy the sources for dependencies into your repo, e.g., Go vendoring. This requires you to incorporate your dependencies’ build processes into your own, including the same build tools, header files, etc. Doing so can quickly get complicated. For example, on the JVM alone, projects could be using Ant, Maven, Gradle, SBT, Make, scalac, groovyc, jythonc, Clojure, etc.

Use version control outputs

Copy the binaries for dependencies into your repo. However, this bloats the size of your repo. While DCVS is awesome, hauling around 100GB of bygone binaries in the history is not. You may need Git LFS or another strategy for dealing with large repos.

Copy repos

This is an operational solution to the problem. Create a copy of artifact repos—either plan copy or something more sophisticated like Artifactory—in a way that allows you to control when updates are made available. The downside is that this solution takes place outside version control.

Use version lock file

Bundlr has Gemfile.lock; npm has shrinkwrap; Go has Glock. These solutions generate complete version info for transitive dependencies, which is put under version control. They make updates more complicated than a simple one-line change, and complete version info now involves hundreds of lines. But it’s better than non-determinism.

Conclusion

Upgrade your dependencies. Security updates are important. Functional fixes are important. Performance improvements are important. But always upgrade intelligently and deliberately, under version control.

The important thing is to ensure that builds, deployments, tests, and versions are as reliable and reproducible as possible. If your package management solution hinders that, you should fix it. Just as you wouldn’t begin a Python script with from __future__ import * , don’t use version ranges in software dependencies. Software is hard enough—don’t make it any flakier than it already is.