You'll frequently hear references to a "software ecosystem" on various platforms, but it's relatively rare to see someone take that sort of terminology seriously. A group of evolutionary biologists, however, has now used the tools of ecosystem analysis to look at the evolution of Debian releases, examining things like package dependencies and software incompatibility.

The team went back to 1993 and compiled statistics on every major stable release, noting the number of packages in each release and comparing it to the previous version. This allowed them to track the life history of packages, watching as new ones were introduced and older ones were deprecated. Beyond gathering those statistics, the team also compiled the x86 version of the operating system and installed packages at random, which gave them a statistical measure of how often dependencies and incompatibilities occur.

Several trends were apparent in the data. For example, the modularity of the system increased exponentially up until the 3.0 release, after which there was a sharp drop. From that point on, modularity held steady with successive releases. This had a major effect on functionality, defined as the rate at which randomly chosen packages would successfully install on a Debian system; that value started rising significantly with the version 3.1 release. The authors ascribe this to the large time gap between releases that occurred around that time.
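To make the "functionality" measure a bit more concrete, here is a minimal Python sketch of how such a rate could be estimated by random installation. It is not the authors' procedure: the function names, the toy package data, and the rule that a package installs only if it and all of its dependencies are conflict-free are assumptions made for illustration.

```python
import random

def closure(pkg, depends):
    """Return pkg plus everything it transitively depends on."""
    seen, stack = set(), [pkg]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(depends.get(p, ()))
    return seen

def has_conflict(group, installed, conflicts):
    """True if a package in group conflicts with one in group or installed."""
    others = group | installed
    return any(
        frozenset((a, b)) in conflicts
        for a in group for b in others if a != b
    )

def functionality(packages, depends, conflicts, trials=200, seed=0):
    """Fraction of install attempts that succeed, over random install orders."""
    rng = random.Random(seed)
    succeeded = attempts = 0
    for _ in range(trials):
        installed = set()
        order = list(packages)
        rng.shuffle(order)
        for pkg in order:
            attempts += 1
            needed = closure(pkg, depends)
            if not has_conflict(needed, installed, conflicts):
                installed |= needed
                succeeded += 1
    return succeeded / attempts

# Hypothetical toy "release": a needs c, b needs d, and c conflicts with d.
packages = ["a", "b", "c", "d"]
depends = {"a": ["c"], "b": ["d"]}
conflicts = {frozenset(("c", "d"))}

print(functionality(packages, depends, conflicts))  # prints 0.5
```

In this toy case, whichever side of the c/d conflict gets installed first locks out the other two packages, so exactly half of the install attempts succeed regardless of order.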

Over time, software modules (clusters of packages with high interdependency) also increased in both size and number. As these trends continued, the number of software conflicts between modules went down; however, the number of conflicts within a module rose. "Therefore, there is a trade-off between reusing many pieces of existing code and the emergence of incompatibilities among software packages," the authors conclude.
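The within-module versus between-module distinction can be illustrated with a short Python sketch. As a deliberate simplification, it treats each connected component of the dependency graph as a "module"; the paper's clusters of highly interdependent packages would need a proper modularity analysis, and the package names and data below are hypothetical.

```python
def modules_from_dependencies(packages, depends):
    """Label each package with the connected component it belongs to."""
    neighbours = {p: set() for p in packages}
    for pkg, deps in depends.items():
        for dep in deps:
            neighbours.setdefault(pkg, set()).add(dep)
            neighbours.setdefault(dep, set()).add(pkg)
    module_of, next_id = {}, 0
    for start in neighbours:
        if start in module_of:
            continue
        stack = [start]
        while stack:
            p = stack.pop()
            if p not in module_of:
                module_of[p] = next_id
                stack.extend(neighbours[p])
        next_id += 1
    return module_of

def count_conflicts(conflict_pairs, module_of):
    """Return (conflicts within a module, conflicts between modules)."""
    within = sum(1 for a, b in conflict_pairs if module_of[a] == module_of[b])
    return within, len(conflict_pairs) - within

# Hypothetical data: {a, b, c} and {d, e} form two separate modules.
example_packages = ["a", "b", "c", "d", "e"]
example_depends = {"a": ["b"], "b": ["c"], "d": ["e"]}
example_conflicts = [("a", "c"), ("c", "e")]

module_of = modules_from_dependencies(example_packages, example_depends)
print(count_conflicts(example_conflicts, module_of))  # prints (1, 1)
```

Running the same count on each release's package data is one way to track how conflicts shift from between modules to within them over time.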

They also showed that it's possible to model this trade-off using standard ecological tools: dependencies between packages look like predator-prey interactions, while conflicts look like the competitive exclusion relationship seen between some species.

Overall, the key benefit of the modularity the team identified seems to be that fewer conflicts across modules means more of the software available for the operating system can install, since it's rare for a single conflict to block an entire module from installing and running. The authors suggest that we might learn something about biology by studying software, but they don't actually provide examples of how this might work; at this stage, then, it's not an especially compelling argument.

PNAS, 2011. DOI: 10.1073/pnas.1115960108