HTTP Archive: jQuery

A recent thread on GitHub for html5-boilerplate discusses whether there’s a benefit to loading jQuery from Google Hosted Libraries rather than serving it from your own server. The thread references the great article Caching and the Google AJAX Libraries by Steve Webster.

SteveW’s article concludes that loading jQuery from Google Hosted Libraries is probably NOT a good idea because of the low percentage of sites that use a single version. Instead, developers should bundle jQuery with their own scripts and host it on their own web server. SteveW got his data from the HTTP Archive, a project that I run. His article was written in November 2011, so I wanted to update the numbers in this post to help the folks on that GitHub thread. I also raise some issues that arise from creating combined scripts, especially ones that end up larger than jQuery itself.

Preamble

SteveW shows the SQL he used. I’m going to do the same. As background, when SteveW did his analysis in November 2011 each HTTP Archive crawl analyzed only ~30,000 URLs. We’re currently analyzing ~300,000 per crawl, so this is a bigger and different sample set. I’m going to be looking at the HTTP Archive crawl for Mar 1 2013, which contains 292,297 distinct URLs. The SQL shown in this blog post references these pages based on their unique pageids (pageid >= 6726015 and pageid <= 7043218), so you’ll see that condition in the queries below.

Sites Loading jQuery from Google Hosted Libraries

The first stat in SteveW’s article is the percentage of sites using the core jQuery module from Google Hosted Libraries. Here’s the updated query and results:

mysql> select count(distinct(pageid)) as count,
              (100*count(distinct(pageid))/292297) as percent
       from requests
       where pageid >= 6726015 and pageid <= 7043218
         and url like "%://ajax.googleapis.com/ajax/libs/jquery/%";
+-------+---------+
| count | percent |
+-------+---------+
| 53414 | 18.2739 |
+-------+---------+

18% of the world’s top 300K URLs load jQuery from Google Hosted Libraries, up from 13% in November 2011.

As I mentioned, the sample size is much different across these two dates: ~30K vs ~300K. To do more of an apples-to-apples comparison I restricted the Mar 1 2013 query to just the top 30K URLs (which is 28,980 unique URLs after errors, etc.):

mysql> select count(distinct(p.pageid)) as count,
              (100*count(distinct(p.pageid))/28980) as percent
       from requests as r, pages as p
       where p.pageid >= 6726015 and p.pageid <= 7043218
         and rank <= 30000
         and p.pageid=r.pageid
         and r.url LIKE "%://ajax.googleapis.com/ajax/libs/jquery/%";
+-------+---------+
| count | percent |
+-------+---------+
|  5517 | 19.0373 |
+-------+---------+

This shows an even higher percentage of sites loading jQuery core from Google Hosted Libraries: 19% vs 13% in November 2011.
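As a quick sanity check, the percent column in both results is just the count of matching pages divided by the number of pages in the sample. The sketch below redoes that arithmetic in Python with the counts reported above; the function name is made up for illustration and is not part of the HTTP Archive code.

```python
def percent_of_pages(matching_pages: int, total_pages: int) -> float:
    """Return matching_pages as a percentage of total_pages."""
    return 100 * matching_pages / total_pages


# Counts and sample sizes copied from the two query results above.
full_crawl = percent_of_pages(53414, 292297)  # all URLs in the Mar 1 2013 crawl
top_30k = percent_of_pages(5517, 28980)       # top 30K URLs only

print(f"full crawl: {full_crawl:.4f}%")
print(f"top 30K:    {top_30k:.4f}%")
```

Up to rounding, these should agree with the percent values MySQL printed.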

Most Popular Version of jQuery from Google Hosted Libraries

The main question being asked is: Is there enough critical mass from jQuery on Google Hosted Libraries to get a performance boost? The performance boost would come from cross-site caching: The user goes to site A which deposits jQuery version X.Y.Z into the browser cache. When the user goes to another site that needs jQuery X.Y.Z it’s already in the cache and the site loads more quickly. The probability of cross-site caching is greater if sites use the same version of jQuery, and is lower if there’s a large amount of version fragmentation. Here’s a look at the top 10 versions of jQuery loaded from Google Hosted Libraries (GHL) in the Mar 1 2013 crawl.

mysql> select url,
              count(distinct(pageid)) as count,
              (100*count(distinct(pageid))/292297) as percent
       from requests
       where pageid >= 6726015 and pageid <= 7043218
         and url LIKE "%://ajax.googleapis.com/ajax/libs/jquery/%"
       group by url
       order by count desc;

Table 1. Top Versions of jQuery from GHL, Mar 1 2013

jQuery version                        percentage of sites
1.4.2 (http)                          1.7%
1.7.2 (http)                          1.6%
1.7.1 (http)                          1.6%
1.3.2 (http)                          1.2%
1.7.2 (https)                         1.1%
1.8.3 (http)                          1.0%
1.7.1 (https)                         0.8%
1.8.2 (http)                          0.7%
1.6.1 (http)                          0.6%
1.5.2 (http), 1.6.2 (http) (tied)     0.5%

That looks highly fragmented. SteveW saw less fragmentation in Nov 2011:

Table 2. Top Versions of jQuery from GHL, Nov 15 2011

jQuery version     percentage of sites
1.4.2 (http)       2.7%
1.3.2 (http)       1.3%
1.6.2 (http)       0.8%
1.4.4 (http)       0.8%
1.6.1 (http)       0.7%
1.5.2 (http)       0.7%
1.6.4 (http)       0.5%
1.5.1 (http)       0.5%
1.4 (http)         0.4%
1.4.2 (https)      0.4%
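A quick way to compare the two tables is to look at the share held by the single most popular URL and the combined share of the top 10. Here’s a minimal Python sketch with the values typed in from Table 1 and Table 2; the variable and function names are just for illustration. Table 1 has a tie for tenth place, so I count a single 0.5% entry there.

```python
# Percent of sites for each of the top 10 jQuery URLs on Google Hosted
# Libraries, copied by hand from Table 1 (Mar 1 2013) and Table 2 (Nov 15 2011).
TOP10_MAR_2013 = [1.7, 1.6, 1.6, 1.2, 1.1, 1.0, 0.8, 0.7, 0.6, 0.5]
TOP10_NOV_2011 = [2.7, 1.3, 0.8, 0.8, 0.7, 0.7, 0.5, 0.5, 0.4, 0.4]


def summarize(shares):
    """Return (share of the most popular URL, combined share of all listed URLs)."""
    # Round the sum to one decimal place to hide floating-point noise.
    return max(shares), round(sum(shares), 1)


for label, shares in (("Nov 2011", TOP10_NOV_2011), ("Mar 2013", TOP10_MAR_2013)):
    top, combined = summarize(shares)
    print(f"{label}: most popular {top}%, top 10 combined {combined}%")
```

The percentages in the tables are already rounded, so the combined figures are approximate.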

Takeaways #1

Here are my takeaways from looking at jQuery served from Google Hosted Libraries compared to November 2011:

The most popular version of jQuery is 1.4.2 in both analyses. Even though the percentage dropped from 2.7% to 1.7%, it’s surprising that such an old version maintained the #1 spot. jQuery 1.4.2 was released February 19, 2010, over three years ago! The latest version, jQuery 1.9.1, doesn’t make it into the top 10 most popular versions, but it was only released on February 4, 2013. The newest version in the top 10 is jQuery 1.8.3, which is used on 1% of sites (6th most popular). It was released November 13, 2012. The upgrade rate on jQuery is slow, with many sites using versions that are multiple years old.

There is less critical mass on a single version of jQuery compared to November 2011: 2.7% vs. 1.7%. If your site uses the most popular version of jQuery the probability of users benefiting from cross-site caching is lower today than it was in November 2011.

There is more critical mass across the top 10 versions of jQuery. The top 10 versions accounted for 8.8% of sites in November 2011; that share has increased to 10.8% today.

8% of sites loading jQuery from Google Hosted Libraries add a query string to the URL. The most popular URL with a query string is /ajax/libs/jquery/1.4.2/jquery.min.js?ver=1.4.2. (While that’s not surprising, the second most popular URL is /ajax/libs/jquery/1.7.1/jquery.min.js?ver=3.5.1.) As SteveW pointed out, this greatly reduces the probability of benefiting from cross-site caching because the browser uses the entire URL as the key when looking up files in the cache. Sites should drop the query string when loading jQuery from Google Hosted Libraries (or any server for that matter).
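To make the query string point concrete, here’s a toy Python model of a cache keyed on the entire URL. It’s only a sketch: ToyCache and strip_query are names I made up, and the model ignores expiry, eviction, and everything else a real browser cache does. It just shows that the same file requested with and without ?ver=1.4.2 lands under two different keys.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return url without its query string or fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


class ToyCache:
    """Toy stand-in for a browser cache that uses the full URL as its key."""

    def __init__(self):
        self._seen = set()

    def fetch(self, url: str) -> str:
        """Report 'hit' if this exact URL was fetched before, else 'miss'."""
        if url in self._seen:
            return "hit"
        self._seen.add(url)
        return "miss"


BASE = "http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"

cache = ToyCache()
results = [
    cache.fetch(BASE),                              # site A: first request
    cache.fetch(BASE),                              # site B: identical URL
    cache.fetch(BASE + "?ver=1.4.2"),               # site C: adds a query string
    cache.fetch(strip_query(BASE + "?ver=1.4.2")),  # site D: query string dropped
]
print(results)
```

In this model only the requests whose URL exactly matches an earlier one benefit from what a previous site deposited in the cache.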

The Main Question

While these stats are interesting, they don’t answer the original question asked in the GitHub thread: Which is better for performance: Loading jQuery from Google Hosted Libraries or from your own server?

There are really three alternatives to consider:

1. Load core jQuery from Google Hosted Libraries.
2. Load core jQuery from your own server.
3. Load core jQuery bundled with your other scripts from your own server.

I don’t have statistics for #3 in the HTTP Archive because I’m searching for URLs that match some regex containing “jquery” and it’s unlikely that a website’s combined script would preserve that naming convention.

I can find statistics for #2. This tells us the number of sites that could potentially contribute to the critical mass for cross-site caching benefits if they switched from self-hosting to loading from Google Hosted Libraries. Finding sites that host their own version of jQuery is difficult. I want to restrict it to sites loading core jQuery (since that’s what they would switch to on Google Hosted Libraries). After some trial and error I came up with this long query. It basically looks for a URL containing “jquery[.min].js”, “jquery-1.x[.y][.min].js”, or “jquery-latest[.min].js”.

select count(distinct(pageid)) as count,
       (100*count(distinct(pageid))/292297) as percent
from requests
where pageid >= 6726015 and pageid <= 7043218
  and ( url like "%/jquery.js%"
     or url like "%/jquery.min.js%"
     or url like "%/jquery-1._._.js%"
     or url like "%/jquery-1._._.min.js%"
     or url like "%/jquery-1._.js%"
     or url like "%/jquery-1._.min.js%"
     or url like "%/jquery-latest.js%"
     or url like "%/jquery-latest.min.js%" )
  and mimeType like "%script%";
+--------+---------+
| count  | percent |
+--------+---------+
| 164161 | 56.1624 |
+--------+---------+
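The eight LIKE patterns in the query above can be approximated with a single regular expression. The Python sketch below is my own restatement for experimenting with sample URLs, not the code the HTTP Archive runs; the function name is made up. In LIKE, “_” matches exactly one character and “%” matches any run of characters, so each version component is limited to a single character. The comparison here is case-sensitive, which may or may not match the collation of the MySQL column.

```python
import re

# One regex covering the eight LIKE patterns:
#   /jquery[.min].js, /jquery-1.x[.y][.min].js, /jquery-latest[.min].js
# Each "_" in LIKE becomes "." (any single character) in the regex.
CORE_JQUERY = re.compile(
    r"/jquery"
    r"(-1\..(\..)?|-latest)?"   # optional "-1.x", "-1.x.y", or "-latest"
    r"(\.min)?"                 # optional ".min"
    r"\.js"
)


def looks_like_core_jquery(url: str, mime_type: str) -> bool:
    """Approximate the query's URL and mimeType conditions for one request."""
    # re.search mirrors the leading and trailing "%" in the LIKE patterns.
    return "script" in mime_type and CORE_JQUERY.search(url) is not None


samples = [
    "http://example.com/js/jquery.min.js",
    "http://example.com/js/jquery-1.7.2.min.js",
    "http://example.com/js/jquery-latest.js?ver=1",
    "http://example.com/js/jquery-ui.min.js",
]
for url in samples:
    print(url, looks_like_core_jquery(url, "application/javascript"))
```

Like the SQL, this skips plugins such as jquery-ui and would also miss any file whose version components have more than one character.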

Here are the most popular hostnames across all sites:

Table 3. Top Hostnames Serving jQuery, Mar 1 2013

hostname                 percentage of sites
ajax.googleapis.com      18.3%
code.jquery.com          1.4%
yandex.st                0.3%
ajax.aspnetcdn.com       0.2%
mat1.gtimg.com           0.2%
ak2.imgaft.com           0.1%
img1.imgsmail.ru         0.1%
www.yfum.com             0.1%
img.sedoparking.com      0.1%
www.biggerclicks.com     0.1%

Takeaways #2

56% of sites are using core jQuery. This is very impressive. This is similar to the findings from BuiltWith (compared to “Top 100,000” trends). The percentage of sites using some portion of jQuery is even higher if you take into consideration jQuery modules other than core, and websites that bundle jQuery with their own scripts and rename the resulting URL.

38% of sites are loading core jQuery from something other than Google Hosted Libraries (56% – 18%). Thus, there would be a much greater potential to benefit from cross-site caching if these websites moved to Google Hosted Libraries. Keep in mind that this query covers only core jQuery, so these websites are already loading that module as a separate resource, which means it would be easy to switch that request to another server.

Although the tail is long, Google Hosted Libraries is by far the most used source for core jQuery. If we want to increase the critical mass around requesting jQuery, Google Hosted Libraries is the clear choice.

Conclusion

This blog post contains many statistics that are useful in deciding whether to load jQuery from Google Hosted Libraries. The pros of requesting jQuery core from Google Hosted Libraries are:

potential benefit of cross-site caching

ease of switching if you’re already loading jQuery core as a standalone request

no hosting, storage, bandwidth, or maintenance costs

benefit of Google’s CDN performance

1-year cache time

The cons to loading jQuery from Google Hosted Libraries include:

an extra DNS lookup

you might use a different CDN that’s faster

can’t combine jQuery with your other scripts

There are two other more complex but potentially significant issues to think about if you’re considering bundling jQuery with your other scripts. (Thanks to Ilya Grigorik for mentioning these.)

First, combining multiple scripts together increases the likelihood that the resulting resource will need to be updated, and every update forces users to download the whole bundle again. This matters especially when bundling jQuery, since jQuery is likely to change less often than your site-specific JavaScript.

Second, unlike an HTML document, a script is not parsed incrementally. That’s why some sites, like Gmail, load their JavaScript in an iframe segmented into multiple inline script blocks, which lets the JavaScript engine parse and execute the initial blocks while the rest of the file is still downloading. A single, large combined script might reach the point where the cost of delayed parsing outweighs the savings from fewer requests, so that downloading two or more scripts would be faster. As far as I know this has not been investigated enough to determine how “large” the script must be to reach the point of negative returns.

If you’re loading core jQuery as a standalone request from your own server (as up to 38% of sites are doing), you’ll probably get an easy performance boost by switching to Google Hosted Libraries. If you’re considering creating a combined script that includes jQuery, the issues raised here may mean that’s not the optimal solution.

SteveW and I both agree: To make the best decision, website owners need to test the alternatives. Using a RUM solution like Google Analytics Site Speed, Soasta mPulse, Torbit Insight, or New Relic RUM will tell you the impact on your real users.