In the next branch of Haskell Platform we’ll be adding and removing packages from the specification for the first time. The Haskell Platform steering committee will make recommendations for additions and removals based on individual proposals to add and remove packages from the list.

It is hard to come up with “notability” criteria for why a package should be added or removed. People use the Haskell Platform for many competing reasons, and they need different packages.

The goal, though, should be an almost fully automated set of criteria for determining when a package should be added, based on objective data. Those criteria, combined with strategic and other concerns, would then decide which packages are added or, sometimes, removed.

Possible Criteria for Notability

A quick list of possible criteria by which to evaluate whether a package is “blessed”:

How popular is the package in Hackage downloads?

How many packages depend on it?

Do any applications of note depend on it?

Does it meet a stated end-user need?

Do similar systems include such a library (e.g. Python)?

Is it portable?

Does it add additional C libraries?

Does it follow the package versioning system?

Is the code of good quality?

Does it have a good development history?

Is it on Hackage?

Does it provide Haddock documentation?

Does it come with examples?

Does it have a test suite?

Does it have a maintainer?

Does it in turn require new Haskell dependencies?

Does it have a simple/configure-based Cabal build?

Does it conflict/compete with existing functionality?

Does it reuse existing types?

Does it follow the hierarchical naming conventions?

Is it -Wall clean?

Does it have declared correctness or performance statements?

Is it BSD licensed?

Is it thread-safe?

A Point System

One way of determining notability for a package would be to use a points system against an agreed-upon set of such criteria.

Does anyone know of similar examples, or would like to code up some programs to experiment with these ratings?
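As a starting point for such experiments, here is a minimal sketch of a points system in Haskell. The choice of criteria (a small subset of the list above), the weights, and every name in the code are assumptions made for illustration; an agreed-upon scheme would need its own criteria and weights.

```haskell
import Data.List (nub, sortBy)
import Data.Ord (Down (..), comparing)

-- A hypothetical subset of the yes/no criteria listed above.
data Criterion
  = OnHackage
  | HasHaddocks
  | HasTestSuite
  | HasMaintainer
  | FollowsPVP
  | BSDLicensed
  | WallClean
  deriving (Eq, Show, Enum, Bounded)

-- Illustrative weights only; these numbers are assumptions, not a proposal.
weight :: Criterion -> Int
weight OnHackage     = 3
weight HasMaintainer = 3
weight HasHaddocks   = 2
weight HasTestSuite  = 2
weight FollowsPVP    = 2
weight BSDLicensed   = 1
weight WallClean     = 1

-- A package together with the criteria it satisfies.
data Package = Package
  { pkgName  :: String
  , pkgMeets :: [Criterion]
  } deriving (Show)

-- Sum the weights of the satisfied criteria, counting each criterion once.
score :: Package -> Int
score = sum . map weight . nub . pkgMeets

-- Rank packages from highest to lowest score; ties keep their input order.
rank :: [Package] -> [(String, Int)]
rank pkgs = sortBy (comparing (Down . snd)) [(pkgName p, score p) | p <- pkgs]
```

Loaded into GHCi, `rank` can be applied to a hand-written list of `Package` values to see how different weightings reorder the candidates. Non-boolean criteria such as download counts would need a different representation than this yes/no model.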

Distro Page Rank

Another source of raw data may well be a sort of “Page Rank” across Unix distros for how often a package is used. On the Arch Linux distribution, we have three levels of support for Haskell. In the core system some Haskell apps and tools are provided in binary form. In the “community” binary repo there are yet more packages. Finally, in the user-contributed repository are around 1300 other packages (~90% of Hackage).

Does your distro have popularity statistics? Could you determine the top 100 Haskell packages by vote?
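If your distro can export per-package vote counts, turning them into a candidate list is straightforward. The sketch below assumes the votes are already available as (package name, vote count) pairs and that the current platform contents are a plain list of names; the function name and input shapes are hypothetical.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Top n packages by vote count that are not already in the platform.
-- Assumed inputs: the names of the current platform packages, and
-- (package name, votes) pairs exported from a distro's statistics.
topCandidates :: Int -> [String] -> [(String, Int)] -> [(String, Int)]
topCandidates n platform votes =
  take n
    . sortBy (comparing (Down . snd))        -- most votes first
    . filter ((`notElem` platform) . fst)    -- drop packages already included
    $ votes
```

With `n` set to 100 and an empty platform list, this gives a plain top 100 by vote.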

Most Popular Packages in Arch Linux

Some users install packages with the ‘yaourt’ tool, and some of those users opt in to voting when they install. Here are the top 100 packages sorted by votes in Arch Linux, with those already in the Haskell Platform indicated:

Now, one of the other constraints on the Haskell Platform is sustainable growth. We can’t add 1000 packages tomorrow and hope to maintain quality. Instead, something like 10-20% growth per release cycle seems plausible. This would mean adding 4 to 9 new packages.

If we were to judge only on download popularity, our first 5 new packages would be:

They rank highly merely because one killer app, darcs, depends on them, and so they are widely built (they may also fail to satisfy many of the other criteria noted above).

If we ignore those packages popular for being dependencies, we get a different top 5:

Now we’re getting there. pandoc is both a library and a popular app, so we might treat it specially. gtk2hs is very popular, but not cabalised, so we might also set that aside, leaving (and I’ll ignore ghc-paths as it is used by ghc):

Which is starting to look like a plausible list. In turn, however, you can find fault with all these packages in various dimensions (utf8-string may be obsoleted by Data.Text, HaXml is LGPL licensed).

Coming up with an obvious list is non-trivial!

Finally, this is clearly only one very small data set, which should only have a small influence. If we step over and look at the Hackage download statistics, sorted by popularity, our top 5 new packages would be:

Popularity by Category

If instead we thought that having a comprehensive library set was the key goal, we might choose to include libraries by category, regardless of how popular they are in the global list. This would yield, according to Hackage,

For example.

What Is The Decision Model?

So how do we decide what goes in? One model would be:

Have people propose packages

Sort them by category need

Identify the top-ranked package in each category using a points system or page rank

Add or remove packages based on this?
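To make this model concrete, here is a rough sketch of the "top package per category" step. It assumes each proposal already carries a category and a numeric score from some points system or popularity metric; the `Proposal` type and its field names are hypothetical.

```haskell
import qualified Data.Map.Strict as Map

-- A proposed package, its category, and a score from whatever metric is chosen.
data Proposal = Proposal
  { propName     :: String
  , propCategory :: String
  , propScore    :: Int
  } deriving (Eq, Show)

-- Keep the highest-scoring proposal in each category.
-- On a tie, the proposal that appears earlier in the input wins.
bestPerCategory :: [Proposal] -> Map.Map String Proposal
bestPerCategory = Map.fromListWith better . map (\p -> (propCategory p, p))
  where
    -- fromListWith passes the newer value first and the stored value second.
    better new old = if propScore new > propScore old then new else old

-- Names of the selected packages, ordered by category name.
selected :: [Proposal] -> [String]
selected = map propName . Map.elems . bestPerCategory
```

The final step, actually adding or removing packages, is left out on purpose: that is where the strategic concerns come in, and it probably should not be automated.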

What do you think? What is a good way to decide when a package is sufficiently notable to add to the Haskell Platform?

What criteria would you use to determine when a package is blessed?