Last week I described a process for adopting a CPAN module. In this post I describe a metric for identifying candidates for adoption. These are essentially modules that have outstanding bugs, haven't been released for a while, and are used by other modules. I've generated a list of the top 1000 adoption candidates according to my current scoring metric. This is very much a work in progress.

This is a follow-up to a blog post I wrote last year. This time I include more factors and in particular the number of dependent distributions.

Each distribution is scored according to the following rubric:

Not released in the last year.

Not released in the last 3 years.

Has one or more outstanding bugs.

Has ten or more outstanding bugs.

At least one bug was reported in the last month.

Has at least one dependent distribution.

Has at least 10 dependent distributions.

Author hasn't released anything in the last 3 years.

At least one bug has been reported since the last release.

Only one user has PAUSE permissions for the distribution.

The distribution only contains one module.

At the moment the score gets a +1 for each matching rule. Here's a plot which shows the distribution of scores across all CPAN distributions. Note that the y axis is logarithmic.
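To make the "+1 per matching rule" idea concrete, here is a minimal sketch of the rubric. It is written in Python only to keep it self-contained, and it is not the code that generated the list. The `Dist` record and all of its field names are hypothetical, and approximating a year as 365 days and a month as 30 days is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Dist:
    # Hypothetical record: field names mirror the rubric, not any real data dump.
    name: str
    last_release: date
    author_last_release: date      # most recent release of anything by the author
    open_bugs: int
    newest_bug: Optional[date]     # date of the most recently reported bug, if any
    dependents: int                # directly dependent distributions
    maintainers: int               # users with PAUSE permissions
    modules: int


def adoption_score(d: Dist, today: date) -> int:
    """Add 1 for every rubric rule the distribution matches."""
    year = timedelta(days=365)     # assumption: a year is 365 days
    month = timedelta(days=30)     # assumption: a month is 30 days
    age = today - d.last_release
    has_bug_date = d.newest_bug is not None
    rules = [
        age > year,
        age > 3 * year,
        d.open_bugs >= 1,
        d.open_bugs >= 10,
        has_bug_date and today - d.newest_bug <= month,
        d.dependents >= 1,
        d.dependents >= 10,
        today - d.author_last_release > 3 * year,
        has_bug_date and d.newest_bug > d.last_release,
        d.maintainers == 1,
        d.modules == 1,
    ]
    return sum(rules)              # True counts as 1, False as 0
```

With eleven rules and equal weights, scores range from 0 to 11.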

There are various types of factors in there, but I'm thinking it might be worth splitting them into two groups:

How deserving a candidate it is.

Suitability for adoption.

This is a first step. After a lot of good input, I'm working on the next iteration of this, which will include at least some of the following:

+1 (maybe more?) if the dist's author / maintainer has marked it as up for adoption, as per brian d foy's post. Basically, look for ADOPTME among the users with PAUSE permissions on the dist.

Look at the CPAN Testers report.

Don't include wishlist items in the bug count from RT.

+1 if the module is mentioned in core documentation.

+1 if the module is a core module.

Different weights for the rules, or possibly some form of gating. If a module has a lot of dependent dists, and hasn't been released for ages, but has no outstanding bugs, then it shouldn't be on the list.

Consider the total number of dependent distributions, not just the directly dependent distributions. E.g. if A uses B uses C, then C's usage measure should be 2 (1 direct and 1 indirect).

+1 on ease of adoption if the dist has metadata identifying a GitHub repository.

Get bug counts from all sources (e.g. GitHub). At the moment this only factors in RT bug counts.
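Two of the ideas in this list, counting indirect dependents and gating on outstanding bugs, are easy to sketch. The following is illustrative Python rather than anything I've implemented. The `used_by` mapping, the rule names, and the weights are all hypothetical, and gating on zero open bugs is just one possible gate.

```python
from collections import deque


def total_dependents(dist, used_by):
    """Count direct and indirect dependents of `dist`.

    `used_by` maps a dist name to the dists that depend on it directly
    (a hypothetical structure; real dependency data would need converting).
    """
    seen = set()
    queue = deque([dist])
    while queue:
        current = queue.popleft()
        for dependent in used_by.get(current, ()):
            # Skip anything already counted, and the dist itself (cycles).
            if dependent not in seen and dependent != dist:
                seen.add(dependent)
                queue.append(dependent)
    return len(seen)


def gated_score(matched_rules, weights, open_bugs):
    """Weighted sum of the matched rules, or 0 if there are no open bugs.

    Rules missing from `weights` default to a weight of 1.
    """
    if open_bugs == 0:
        return 0
    return sum(weights.get(rule, 1) for rule in matched_rules)


# The example from the text: A uses B uses C.
used_by = {"C": ["B"], "B": ["A"]}
usage_of_c = total_dependents("C", used_by)   # 1 direct (B) + 1 indirect (A)
```

A breadth-first walk over the reversed dependency graph counts each dependent once, however many paths lead to it.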

MetaCPAN has most of the data for generating this. It could provide the score for dists, and host a version of the list which is kept up to date.

Sources of data:

PAUSE dump of distribution metadata

02packages.details.txt

06perms.txt

RT bug data

David Cantrell's CPAN Dependencies Service. Thanks to David for letting me grab the data for all dists.
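As an example of using one of these sources, here is a rough Python sketch of reading permissions data. It assumes that 06perms.txt consists of a header block, a blank line, and then `package,userid,permission` lines, so check that against the real file before relying on it. Note that the permissions are per package rather than per distribution; mapping packages to dists (e.g. via 02packages.details.txt) is left out. The function names are my own.

```python
from collections import defaultdict


def parse_perms(text):
    """Return {package: set of PAUSE user ids} from 06perms.txt-style text.

    Assumed layout: header lines, one blank line, then CSV lines of the
    form package,userid,permission.
    """
    _header, _sep, body = text.partition("\n\n")
    perms = defaultdict(set)
    for line in body.splitlines():
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue                      # skip blank or malformed lines
        package, user, _kind = parts
        perms[package].add(user.upper())
    return perms


def single_maintainer(perms):
    """Packages for which exactly one user holds any permission."""
    return sorted(pkg for pkg, users in perms.items() if len(users) == 1)


def up_for_adoption(perms):
    """Packages where the ADOPTME user has been given a permission."""
    return sorted(pkg for pkg, users in perms.items() if "ADOPTME" in users)
```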

Let me know if you've got other ideas for extending or refining this.
