Request for feedback on spec/proposal for distributing package collections via hackage

Hi folks,

I'd like to get feedback on a spec/proposal for distributing package
collections via hackage. This is currently somewhere beyond vapourware
but certainly not a fait accompli, and hopefully it is at an appropriate
point to get feedback.

The basic idea is that:

 * package collections are useful (IMHO, one of the top two solutions
   to dependency hell, alongside nix-style package management); and
 * just as we distribute packages via hackage, we should also be able
   to easily distribute package collections.

One would then use them with tools like cabal and stack.

Distributing via hackage (both in the sense of the format/protocol and
in the sense of the central community hackage instance) seems natural,
and allows taking advantage of much of the infrastructure we already
have for packages, like:

 * existing user accounts and management infrastructure on the hackage
   website
 * allowing anyone to host collections on their own servers, just as
   they can host their own package archives currently (either as
   static file sets or with smart servers)
 * low barrier for distribution, potentially encouraging more
   collections to be created, potentially covering more use cases
 * security infrastructure (currently in alpha)
 * automatic mirroring (currently in alpha)

Two obvious examples are stackage-lts and stackage-nightly, but if we
lower the barrier for distribution then there may well be many more.
For example, the existing Linux distros put a lot of effort into
selecting and maintaining package collections, and some of these
collections could be distributed via hackage. In fact several Linux
distributions already use Hackage's "distro" feature to advertise which
versions of packages are provided by that distro. One can also imagine
special-purpose collections, and there are probably cases we've not
thought of yet.

Package collections are different things from packages, not like "meta
packages" that one gets in some package systems.
A package collection at its simplest is just a set of source package
identifiers (ie name-version pairs). Like packages, package collections
have names and versions and are immutable once distributed. The
intention is that users can configure their tool to use collection(s),
either by nailing down a specific collection version or by not
specifying a version, in which case it would default to the latest
version of the named collection. (But the specific behaviour is up to
the tool.)

Use cases:

 * versioned collections. For some collections the policy by which
   they are defined naturally uses meaningful versions.
 * daily collections. These can have a date-form version number
   imposed on them.
 * "live" or "rolling" collections. These could have a simple
   monotonically increasing version with no particular meaning
   attached. For such collections, clients might be configured to use
   the latest (by not specifying a version), but it's always possible
   to pick a specific revision.
 * special-purpose collections. Not necessarily collections aiming to
   cover a large number of common packages, but aiming to cover some
   application area, or a related stack of packages (e.g. some of the
   web frameworks).
 * negative collections. Collections of packages you may specifically
   want to avoid (e.g. deprecated by their authors, or known-broken).
   Using such collections would rely on clients that can be configured
   to treat them negatively.

Specifics:

A package collection specifies a set of source package ids (an id being
a name-version pair). It also optionally specifies a (partial) flag
assignment for any package name.

The collection does not specify how tools should treat its contents.
That is, a collection does not specify if it should be treated as a
strong or a soft constraint, inclusive or exclusive, positive or
negative. Such things are completely up to the client's policy and
configuration. Similarly for flag assignments, collections do not
specify whether tools should interpret these as strong or soft
constraints.

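To make the point about client-side policy concrete, here is a minimal
sketch in Python (used only for brevity; nothing here is part of the
proposal). It assumes a collection has already been reduced to a plain
set of (name, version) pairs. The function name `apply_collection` and
the policy names "require" and "avoid" are hypothetical, and the sketch
ignores soft constraints and flag assignments entirely.

```python
def apply_collection(candidates, collection, policy):
    """Filter candidate package ids according to a client-side policy.

    candidates, collection: sets of (name, version) pairs.
    policy: "require" keeps only candidates listed in the collection;
            "avoid" drops candidates listed in the collection.
    Both policy names are hypothetical; the proposal leaves this to tools.
    """
    if policy == "require":
        return candidates & collection
    if policy == "avoid":
        return candidates - collection
    raise ValueError(f"unknown policy: {policy}")


# Made-up data: three versions of foo are available, two are in the collection.
candidates = {("foo", "1.0"), ("foo", "1.1"), ("foo", "1.2")}
collection = {("foo", "1.0"), ("foo", "1.1")}

print(sorted(apply_collection(candidates, collection, "require")))
print(sorted(apply_collection(candidates, collection, "avoid")))
```

The same collection file gives opposite results depending only on how
the client is configured, which is why the format itself stays neutral.
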
Syntax:

Package collection names and versions follow exactly the same syntax as
package names and versions (but they live in a different namespace).
For example, "stackage-lts-2.9", or "deprecated-343" (the latter being
a "rolling" collection with a meaningless monotonically increasing
version).

A collection distributed in the archive format is just a text file with
one entry per line, such as:

    foo-1.0
    foo-1.1
    bar >= 3 && < 4
    bar +this -that

So each line can be one of:

 * a simple package id
 * a package version range, using Cabal version range syntax
 * a package name with a flag assignment, + for on, - for off

The interpretation of the above is that:

 * both foo-1.0 and foo-1.1 are in the collection (ie union not
   intersection)
 * all versions of bar between 3 and 4 are in the collection
 * the package bar has flag 'this' as True, and flag 'that' as False

Of course for some collections the policy is that only one version of
any package is included, but this is a policy question and the format
itself does not impose this constraint.

Hackage archive format:

Collection files live under a different prefix from package tarballs
(but are still considered part of the archive) and are named after the
collection id. The collection files are not compressed (but of course
http clients and servers can negotiate transport compression).

The collection files are neither included nor listed in the existing
00-index.tar.gz, but there is other JSON-format metadata that lets a
client enumerate the available collections and versions. And as with
package tarballs, a client that wants a specific collection version can
construct the url and fetch it directly.

Security:

The hackage security system that's currently in alpha testing can
easily be extended to cover collections, similarly to how it covers
package tarballs.

Misc notes:

There is no requirement that a hackage-format repo containing
collections be closed. That is, the collections may refer to packages
not in that archive.

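For anyone who wants to experiment with the one-entry-per-line format
described under "Syntax" above, here is a rough parsing sketch in
Python (again only for brevity, not a reference implementation). The
names `Collection` and `parse_collection` are hypothetical. It assumes
that the package name is separated from a range or flag assignment by
whitespace, that blank lines can be skipped, and that any entry whose
remaining tokens all start with + or - is a flag assignment. Version
ranges are kept as uninterpreted text rather than parsed as Cabal
version ranges.

```python
import re
from dataclasses import dataclass, field

# "name-version", where the version is dotted digits, e.g. foo-1.0
PACKAGE_ID = re.compile(r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)$")


@dataclass
class Collection:
    ids: set = field(default_factory=set)        # {(name, version)}
    ranges: list = field(default_factory=list)   # [(name, range text)]
    flags: dict = field(default_factory=dict)    # {name: {flag: bool}}


def parse_collection(text):
    """Parse collection file contents into ids, ranges and flag assignments."""
    collection = Collection()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        parts = line.split(None, 1)
        name = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        tokens = rest.split()
        if not tokens:
            # A single token must be a simple package id.
            match = PACKAGE_ID.match(line)
            if match is None:
                raise ValueError(f"not a package id: {line!r}")
            collection.ids.add((match["name"], match["version"]))
        elif all(len(tok) > 1 and tok[0] in "+-" for tok in tokens):
            # Flag assignment: + for on, - for off.
            assignment = collection.flags.setdefault(name, {})
            for tok in tokens:
                assignment[tok[1:]] = tok[0] == "+"
        else:
            # Anything else is treated as a version range, kept verbatim.
            collection.ranges.append((name, rest))
    return collection


sample = """\
foo-1.0
foo-1.1
bar >= 3 && < 4
bar +this -that
"""
parsed = parse_collection(sample)
print(sorted(parsed.ids))
print(parsed.ranges)
print(parsed.flags)
```

Keeping ids, ranges and flag assignments in separate fields mirrors the
three kinds of line; how a tool then combines them is left to its own
policy.
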
Non-closed collections could be useful for private hackage repos that
host a small number of private packages but also host collections that
refer both to the private packages and to public ones from the central
community hackage. The resolution of package names is done by the
clients, and some clients may be configured to union/overlay multiple
repos. On the other hand, for the central community hackage it may be
sensible to enforce a policy that the collections it distributes be
closed (ie refer only to packages distributed via hackage).

Questions:

Is this sufficiently flexible to fully cover the obvious use cases? Are
there any interesting use cases that are excluded? Anything else?

Duncan