Evaluating JavaScript libraries is hard. It’s fairly easy to tell if a library is popular, but it’s hard to tell if it’s any good. One useful metric to have would be the average number of questions a user of the library has while using it. We can’t measure that directly, but some recently released work allows us to get a pretty good estimate.

Thanks to Julian Shapiro, Thomas Davis and Jesse Chase we now have Libscore, a project that crawls the top one million sites on the web to determine what JavaScript libraries they use. It works by actually executing the JavaScript on each page, so it gives a very accurate picture of what libraries are being used.

The number we’d like to calculate is questions per user: WTF = questions / users. Libscore gives good numbers for the denominator. It turns out there are at least a couple of ways to get useful numbers for the numerator as well.

Number of questions on Stack Overflow

We can see how many questions are listed for the library’s tag on Stack Overflow. This gives the exact number of questions, but it doesn’t count how many users had each question – Stack Overflow discourages duplicate questions.

Number of searches on Google

By digging into the data behind Google Trends we can get a number that is proportional to the number of searches for a particular search term on Google. The theory is that most searches involving a library indicate a developer with a question.

There is an undocumented API that gives us the raw data behind the Google Trends charts. It gives search counts normalized to a scale from 0 to 100, such that 100 is the maximum in the given dataset. As long as we’re careful about comparing everything to a common reference, and scaling the numbers appropriately where needed, we can get a set of numbers that are proportional to the real search counts.
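The rescaling step can be sketched like this. The idea is that if two batches of Trends data each include the same reference term, the ratio of the reference’s values in the two batches tells us how to bring one batch onto the other’s scale. All numbers here are hypothetical, and the function name is mine, not part of any Trends API:

```python
# Google Trends normalizes each result set to 0-100, so two series
# fetched separately are not directly comparable. If both batches
# include a common reference term, the ratio of that term's values
# gives the scale factor between the batches.

def rescale(series, reference_in_batch, reference_baseline):
    """Rescale a 0-100 Trends series so the reference term's level
    matches its level in a chosen baseline batch."""
    factor = reference_baseline / reference_in_batch
    return [value * factor for value in series]

# Hypothetical values: in the baseline batch the reference term peaks
# at 100; in this batch the same term only reaches 50, so this batch's
# numbers must be doubled to be comparable.
batch_series = [10, 20, 30]
comparable = rescale(batch_series, reference_in_batch=50, reference_baseline=100)
print(comparable)  # [20.0, 40.0, 60.0]
```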



By summing up the individual datapoints, we get the area under each curve, which is proportional to the number of searches.
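Concretely, the summation is just this (the interest values below are made up for illustration):

```python
# Summing the weekly Trends datapoints approximates the area under the
# curve; since each point is proportional to real search volume, the
# sum is proportional to the total number of searches.
weekly_interest = [4, 7, 9, 6, 5]  # hypothetical rescaled Trends values
relative_searches = sum(weekly_interest)
print(relative_searches)  # 31
```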

While this fixes the problem of counting cases where multiple people have the same question, it has problems of its own. One problem is that we’ll end up undercounting certain search terms. Ember, React, Backbone, etc. are all normal words with meanings of their own. To deal with this, we can add the “js” suffix as shown above. This helps, but it means we’re undercounting searches for these libraries. It helps that this problem occurs for most of the libraries, so at least the bias is spread out somewhat evenly.

Results

I gathered the relevant data for 11 JS libraries and frameworks of various types: jQuery, jQuery UI, Backbone, React, Knockout, AngularJS, Ember, Meteor, Modernizr, Underscore and Lo-Dash. Below, Underscore and Lo-Dash are counted as one, since their counts are conflated on Libscore. The data was collected in mid-December.

Library             Libscore users   Searches (relative)   Stack Overflow questions
jQuery              634872           49558                 562442
jQuery UI           176543           2961                  30353
Modernizr           109076           216                   655
Underscore/Lo-Dash  20183            139                   3367
AngularJS           4954             1259                  69133
React               203              27                    981
Backbone            7908             297                   17006
Ember               185              113                   13119
Knockout            1982             154                   12488

The differences here are pretty dramatic. Using the Stack Overflow measure, Angular has a 6.5x higher WTF factor than Backbone, while Meteor has a 21x higher WTF factor than Angular does. Meteor has a staggering 49143x higher WTF factor than Modernizr.
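The Stack Overflow numbers above can be reproduced directly from the table. Using questions per Libscore user as the WTF factor:

```python
# Stack Overflow WTF factor = questions per Libscore user,
# using the AngularJS and Backbone rows from the table above.
data = {
    # library: (libscore_users, so_questions)
    "AngularJS": (4954, 69133),
    "Backbone": (7908, 17006),
}
wtf = {lib: questions / users for lib, (users, questions) in data.items()}
print(round(wtf["AngularJS"] / wtf["Backbone"], 1))  # 6.5
```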

What is bad about this metric?

There are a few ways in which the numbers used here may be biased in one way or the other.

Problems with the Google based metric:

Undercounting searches. The JS library names are words with meanings of their own, so we have to add the “js” suffix to count them, which misses searches made without it.

Round-off errors. Because Google Trends only produces numbers from 0 to 100, we have low precision on the numbers.

Problems with the Stack Overflow metric:

Since SO discourages duplicate questions, this will undercount the number of people having a question.

Problems with both metrics:

Undercounting sites: Sites that have been taken offline are not counted, while the searches and SO questions made during their development are counted. This produces some bias against the older libraries.

Sites that are unpublished are not counted, but the searches made during their development are. This produces some bias against the newer libraries.

Biased site sample: Only the top million sites are counted. This produces a bias against any library that is used mostly on smaller/less-popular sites.

What is good about this metric?

There are certain things that indicate that this approach is doing what it’s trying to do.

One is that the rankings produced by the Stack Overflow WTF and the Google WTF largely agree with each other. This suggests that neither measure is entirely wrong, unless both are wrong in the same way.

A low WTF factor does not mean a framework is better; it just means you will have fewer questions about it. One would expect a project’s WTF factor to correlate with the scope and ambition of the project. One would expect frameworks to have a higher WTF factor than libraries, because they affect more parts of the development. In this view, the fact that Meteor produces the most questions makes sense, given that it is the only full-stack framework in consideration: it spans the client side, the server side, and even the database. Choosing such an all-encompassing framework comes at a cost: more questions. It is good to be able to quantify just how big that cost is.