Check out the shiny new web UI https://www.thirdpartyweb.today/

Data on third party entities and their impact on the web.

This document is a summary of which third party scripts are most responsible for excessive JavaScript execution on the web today.


Goals

Quantify the impact of third party scripts on the web. Identify the third party scripts on the web that have the greatest performance cost. Give developers the information they need to make informed decisions about which third parties to include on their sites. Incentivize responsible third party script behavior. Make this information accessible and useful.

Methodology

HTTP Archive is an initiative that tracks how the web is built. Every month, ~4 million sites are crawled with Lighthouse on mobile. Lighthouse breaks down the total script execution time of each page and attributes the execution to a URL. Using BigQuery, this project aggregates the script execution at the origin level and assigns each origin to the responsible entity.
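The aggregation step can be illustrated in JavaScript. This is a minimal sketch, not the project's BigQuery code: it assumes each page's Lighthouse results have already been reduced to hypothetical `{ url, scripting }` records (milliseconds), and that a hypothetical `originToEntity` map stands in for the entity classification. The `'Unattributed'` label is also an illustrative choice.

```javascript
// Minimal sketch: roll per-URL script execution up to origins, then to entities.
// The { url, scripting } record shape, the originToEntity map, and the
// 'Unattributed' label are hypothetical stand-ins for the real data.
function aggregateByEntity(scriptRecords, originToEntity) {
  const totalsByEntity = new Map()
  for (const { url, scripting } of scriptRecords) {
    let origin
    try {
      origin = new URL(url).origin
    } catch (err) {
      continue // skip attributions that are not valid URLs
    }
    // Origins with no known entity are pooled under one label.
    const entity = originToEntity.get(origin) || 'Unattributed'
    totalsByEntity.set(entity, (totalsByEntity.get(entity) || 0) + scripting)
  }
  return totalsByEntity
}

const originToEntity = new Map([
  ['https://www.google-analytics.com', 'Google Analytics'],
  ['https://connect.facebook.net', 'Facebook'],
  ['https://static.xx.fbcdn.net', 'Facebook'],
])

const totals = aggregateByEntity(
  [
    { url: 'https://www.google-analytics.com/analytics.js', scripting: 80 },
    { url: 'https://connect.facebook.net/en_US/fbevents.js', scripting: 150 },
    { url: 'https://static.xx.fbcdn.net/rsrc.php/v3/abc.js', scripting: 50 },
    { url: 'https://example.com/app.js', scripting: 300 },
  ],
  originToEntity
)
console.log(totals)
```

Two origins mapping to the same entity (here both Facebook origins) are summed together, which is why the entity, rather than the origin, is the unit reported in the tables below.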

NPM Module

The entity classification data is available as an NPM module.

const { getEntity } = require('third-party-web')
const entity = getEntity('https://d36mpcpuzc4ztk.cloudfront.net/js/visitor.js')
console.log(entity)
// {
//   "name": "Freshdesk",
//   "homepage": "https://freshdesk.com/",
//   "categories": ["customer-success"],
//   "domains": ["d36mpcpuzc4ztk.cloudfront.net"]
// }

Updates

2019-02-01 dataset

Huge props to WordAds for reducing their impact from ~2.5s to ~200ms on average! A few entities are showing considerably less data this cycle (Media Math, Crazy Egg, DoubleVerify, Bootstrap CDN). Perhaps they've added new CDNs/hostnames that we haven't identified, or the basket of sites in HTTP Archive has shifted away from their usage.

2019-03-01 dataset

Almost 2,000 entities are now tracked across 3,000+ domains! Huge props to @simonhearne for making this massive increase possible. Tag Managers have now been split out into their own category since they represented such a large percentage of the "Mixed / Other" category.

2019-05-06 dataset

Google Ads clarified that www.googletagservices.com serves more ad scripts than generic tag management, and it has been reclassified accordingly. This has dropped the overall Tag Management share considerably back down to its earlier position.

2019-05-13 dataset

A shortcoming of the attribution approach has been fixed. Total usage is now reported based on the number of pages in the dataset that use the third-party, not the number of scripts. Correspondingly, all average impact times are now reported per page rather than per script. Previously, a third party could appear to have a lower impact or be more popular simply by splitting their work across multiple files.

Third parties that performed most of their work from a single script should see little to no impact from this change, but some entities have seen significant ranking movement. Hosting providers that host entire pages are, understandably, the most affected.
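A small worked example with made-up numbers shows why the per-script approach could flatter an entity: one hypothetical entity loads three files on page A and a single file on page B. The page names, file names, and timings are invented for illustration.

```javascript
// Hypothetical data: one entity's scripts observed on two pages.
// Page A loads three files from the entity; page B loads one.
const observations = [
  { page: 'a.example', script: 'one.js', ms: 200 },
  { page: 'a.example', script: 'two.js', ms: 200 },
  { page: 'a.example', script: 'three.js', ms: 200 },
  { page: 'b.example', script: 'one.js', ms: 300 },
]

const totalMs = observations.reduce((sum, o) => sum + o.ms, 0) // 900 ms

// Previous approach: usage counted per script, so the average is per script.
const perScriptAverage = totalMs / observations.length // 900 / 4 = 225 ms

// Current approach: usage counted per page that includes the entity.
const pagesUsingEntity = new Set(observations.map((o) => o.page)).size // 2
const perPageAverage = totalMs / pagesUsingEntity // 900 / 2 = 450 ms

console.log({ perScriptAverage, perPageAverage })
```

Splitting the work across files doubles the apparent usage (4 scripts versus 2 pages) and halves the apparent average impact, which is the distortion the per-page change removes.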

Some notable changes below:

| Third-Party | Previously (per-script) | Now (per-page) |
| --- | --- | --- |
| Beeketing | 137 ms | 465 ms |
| Sumo | 263 ms | 798 ms |
| Tumblr | 324 ms | 1499 ms |
| Yandex APIs | 393 ms | 1231 ms |
| Google Ads | 402 ms | 1285 ms |
| Wix | 972 ms | 5393 ms |

Data

Summary

Across the top ~4 million sites, ~2,700 origins account for ~57% of all script execution time, with the top 50 entities alone accounting for ~47%. Third-party script execution makes up the majority of script execution on the web today, and it's important to make informed choices.

How to Interpret

Each entity has a number of data points available.

- Usage (Total Number of Occurrences): how many pages include scripts from the entity's origins
- Total Impact (Total Execution Time): how many seconds were spent executing the entity's scripts across the web
- Average Impact (Average Execution Time): on average, how many milliseconds were spent executing the entity's scripts per page
- Category: what type of script this is

Third Parties by Category

This section breaks down third parties by category. The third parties in each category are ranked from first to last by their average impact per page, lowest first. Perhaps the most important comparisons lie here. You may always need an analytics provider, but at least you can pick the most well-behaved one.
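The ranking rule can be sketched in a few lines. This is an illustration only: the row shape is hypothetical, and the sample values are copied from the Analytics and Social tables below.

```javascript
// Sketch: group entities by category, then rank each group by ascending average impact.
// The { name, category, averageMs } row shape is hypothetical.
function rankByCategory(rows) {
  const byCategory = new Map()
  for (const row of rows) {
    if (!byCategory.has(row.category)) byCategory.set(row.category, [])
    byCategory.get(row.category).push(row)
  }
  const ranked = {}
  for (const [category, entries] of byCategory) {
    ranked[category] = entries
      .slice() // avoid mutating the input arrays
      .sort((a, b) => a.averageMs - b.averageMs) // lightest entity gets rank 1
      .map((e, i) => ({ rank: i + 1, name: e.name, averageMs: e.averageMs }))
  }
  return ranked
}

const rows = [
  { name: 'Hotjar', category: 'analytics', averageMs: 84 },
  { name: 'Facebook', category: 'social', averageMs: 229 },
  { name: 'Google Analytics', category: 'analytics', averageMs: 77 },
  { name: 'StatCounter', category: 'analytics', averageMs: 59 },
  { name: 'Twitter', category: 'social', averageMs: 139 },
]
console.log(rankByCategory(rows))
```

Because ranking uses the per-page average rather than usage, a widely used entity such as Google Analytics can still rank below a niche one.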

Overall Breakdown

Unsurprisingly, ads account for the largest identifiable chunk of third party script execution.

Advertising

These scripts are part of advertising networks, either serving or measuring.

Analytics

These scripts measure or track users and their actions. There's a wide range in impact here depending on what's being tracked.

| Rank | Name | Usage | Average Impact |
| --- | --- | --- | --- |
| 1 | StatCounter | 3,403 | 59 ms |
| 2 | Treasure Data | 4,833 | 62 ms |
| 3 | WordPress Site Stats | 3,732 | 64 ms |
| 4 | Roxr Software | 1,623 | 66 ms |
| 5 | Amplitude Mobile Analytics | 2,429 | 69 ms |
| 6 | Heap | 2,605 | 72 ms |
| 7 | Mixpanel | 7,513 | 75 ms |
| 8 | Google Analytics | 1,200,666 | 77 ms |
| 9 | Chartbeat | 4,492 | 79 ms |
| 10 | Quantcast | 3,787 | 80 ms |
| 11 | Hotjar | 177,468 | 84 ms |
| 12 | Parse.ly | 3,278 | 91 ms |
| 13 | Searchanise | 4,302 | 91 ms |
| 14 | Smart Insight Tracking | 1,161 | 93 ms |
| 15 | etracker | 1,643 | 94 ms |
| 16 | Snowplow | 5,845 | 95 ms |
| 17 | CallRail | 7,019 | 101 ms |
| 18 | Nielsen NetRatings SiteCensus | 11,805 | 102 ms |
| 19 | Marchex | 2,503 | 104 ms |
| 20 | Baidu Analytics | 10,739 | 106 ms |
| 21 | Crazy Egg | 4,988 | 110 ms |
| 22 | Evidon | 1,061 | 111 ms |
| 23 | ContentSquare | 1,367 | 130 ms |
| 24 | VWO | 4,724 | 158 ms |
| 25 | Trust Pilot | 15,202 | 164 ms |
| 26 | Net Reviews | 2,537 | 176 ms |
| 27 | PageSense | 1,294 | 180 ms |
| 28 | FullStory | 7,654 | 184 ms |
| 29 | Segment | 9,541 | 191 ms |
| 30 | Kampyle | 1,094 | 206 ms |
| 31 | Optimizely | 19,583 | 223 ms |
| 32 | Nosto | 1,946 | 225 ms |
| 33 | UserReport | 1,300 | 228 ms |
| 34 | BounceX | 1,386 | 233 ms |
| 35 | mPulse | 13,177 | 246 ms |
| 36 | PowerReviews | 1,043 | 261 ms |
| 37 | Marketo | 1,336 | 359 ms |
| 38 | Inspectlet | 5,605 | 361 ms |
| 39 | Histats | 13,537 | 361 ms |
| 40 | Bazaarvoice | 1,845 | 397 ms |
| 41 | Snapchat | 13,344 | 410 ms |
| 42 | Salesforce | 10,892 | 478 ms |
| 43 | Lucky Orange | 7,529 | 491 ms |
| 44 | Feefo.com | 1,686 | 502 ms |
| 45 | Gigya | 2,261 | 579 ms |
| 46 | Yandex Metrica | 292,542 | 580 ms |
| 47 | Ezoic | 1,329 | 582 ms |
| 48 | Revolver Maps | 1,144 | 609 ms |
| 49 | AB Tasty | 3,010 | 1594 ms |

Social

These scripts enable social features.

| Rank | Name | Usage | Average Impact |
| --- | --- | --- | --- |
| 1 | AddToAny | 24,490 | 87 ms |
| 2 | Pinterest | 17,218 | 89 ms |
| 3 | LinkedIn | 14,038 | 111 ms |
| 4 | VK | 13,473 | 121 ms |
| 5 | Twitter | 213,911 | 139 ms |
| 6 | Kakao | 18,109 | 158 ms |
| 7 | Instagram | 9,441 | 184 ms |
| 8 | Yandex Share | 24,181 | 184 ms |
| 9 | Facebook | 1,461,331 | 229 ms |
| 10 | ShareThis | 40,133 | 234 ms |
| 11 | SocialShopWave | 2,044 | 302 ms |
| 12 | AddThis | 118,289 | 403 ms |
| 13 | Disqus | 1,252 | 994 ms |
| 14 | LiveJournal | 3,680 | 1327 ms |
| 15 | PIXNET | 15,434 | 1508 ms |
| 16 | Tumblr | 7,972 | 2048 ms |

Video

These scripts enable video player and streaming functionality.

| Rank | Name | Usage | Average Impact |
| --- | --- | --- | --- |
| 1 | Vimeo | 10,403 | 355 ms |
| 2 | Brightcove | 6,615 | 809 ms |
| 3 | YouTube | 408,326 | 849 ms |
| 4 | Wistia | 13,083 | 928 ms |
| 5 | Twitch | 1,068 | 1896 ms |

Developer Utilities

These scripts are developer utilities (API clients, site monitoring, fraud detection, etc.).

| Rank | Name | Usage | Average Impact |
| --- | --- | --- | --- |
| 1 | Key CDN | 1,468 | 55 ms |
| 2 | LightWidget | 2,325 | 66 ms |
| 3 | Siteimprove | 1,510 | 76 ms |
| 4 | Trusted Shops | 5,639 | 79 ms |
| 5 | New Relic | 13,062 | 95 ms |
| 6 | Accessibe | 3,561 | 99 ms |
| 7 | GetSiteControl | 1,812 | 99 ms |
| 8 | Riskified | 1,074 | 106 ms |
| 9 | Affirm | 2,500 | 113 ms |
| 10 | iubenda | 12,781 | 124 ms |
| 11 | Optanon | 8,025 | 128 ms |
| 12 | Swiftype | 1,544 | 135 ms |
| 13 | Seznam | 1,654 | 135 ms |
| 14 | Bold Commerce | 13,095 | 144 ms |
| 15 | Cookiebot | 20,838 | 147 ms |
| 16 | Sift Science | 1,080 | 148 ms |
| 17 | Other Google APIs/SDKs | 580,415 | 168 ms |
| 18 | Amazon Pay | 5,751 | 169 ms |
| 19 | TrustArc | 1,370 | 180 ms |
| 20 | MaxCDN Enterprise | 2,394 | 198 ms |
| 21 | GitHub | 1,653 | 232 ms |
| 22 | Fraudlogix | 2,244 | 258 ms |
| 23 | Fastly | 6,694 | 275 ms |
| 24 | PayPal | 15,436 | 342 ms |
| 25 | Stripe | 23,538 | 388 ms |
| 26 | Cloudflare | 53,319 | 465 ms |
| 27 | Google Maps | 267,417 | 520 ms |
| 28 | AppDynamics | 1,281 | 527 ms |
| 29 | Secomapp | 2,150 | 546 ms |
| 30 | Bugsnag | 8,686 | 569 ms |
| 31 | Rambler | 7,698 | 691 ms |
| 32 | GoDaddy | 6,687 | 700 ms |
| 33 | Sentry | 9,061 | 715 ms |
| 34 | Signyfyd | 1,691 | 867 ms |
| 35 | Mapbox | 5,206 | 877 ms |
| 36 | Yandex APIs | 27,480 | 1080 ms |
| 37 | POWr | 16,407 | 1364 ms |
| 38 | Esri ArcGIS | 1,692 | 4750 ms |

Hosting Platforms

These scripts are from web hosting platforms (WordPress, Wix, Squarespace, etc.). Note that scripts in this category can sometimes be the entirety of the script on the page, so the "impact" rank might be misleading. In the case of WordPress, the data only reflects the libraries hosted and served by WordPress, not all sites using self-hosted WordPress.

Marketing

These scripts are from marketing tools that add popups/newsletters/etc.

| Rank | Name | Usage | Average Impact |
| --- | --- | --- | --- |
| 1 | Beeketing | 3,984 | 74 ms |
| 2 | RD Station | 6,545 | 76 ms |
| 3 | iZooto | 1,370 | 83 ms |
| 4 | Ve | 1,848 | 149 ms |
| 5 | Listrak | 1,073 | 154 ms |
| 6 | Hubspot | 33,125 | 181 ms |
| 7 | Yotpo | 13,629 | 202 ms |
| 8 | Mailchimp | 23,373 | 204 ms |
| 9 | OptinMonster | 7,493 | 260 ms |
| 10 | Bronto Software | 1,056 | 262 ms |
| 11 | Pardot | 1,435 | 404 ms |
| 12 | Albacross | 1,920 | 490 ms |
| 13 | Sumo | 18,438 | 686 ms |
| 14 | Bigcommerce | 10,096 | 976 ms |
| 15 | Drift | 6,565 | 1279 ms |
| 16 | Judge.me | 8,307 | 1375 ms |
| 17 | PureCars | 2,697 | 1901 ms |
| 18 | Tray Commerce | 3,173 | 2328 ms |

Customer Success

These scripts are from customer support/marketing providers that offer chat and contact solutions. These scripts are generally heavier in weight.

| Rank | Name | Usage | Average Impact |
| --- | --- | --- | --- |
| 1 | SnapEngage | 1,083 | 59 ms |
| 2 | Pure Chat | 4,661 | 70 ms |
| 3 | Foursixty | 1,411 | 81 ms |
| 4 | LivePerson | 4,302 | 112 ms |
| 5 | iPerceptions | 1,992 | 132 ms |
| 6 | Comm100 | 1,321 | 134 ms |
| 7 | Intercom | 15,656 | 245 ms |
| 8 | Help Scout | 2,183 | 258 ms |
| 9 | Tidio Live Chat | 12,655 | 383 ms |
| 10 | Tawk.to | 63,460 | 405 ms |
| 11 | LiveChat | 19,468 | 411 ms |
| 12 | ContactAtOnce | 3,247 | 491 ms |
| 13 | Jivochat | 45,110 | 553 ms |
| 14 | Olark | 6,903 | 656 ms |
| 15 | Smartsupp | 14,862 | 782 ms |
| 16 | ZenDesk | 69,488 | 892 ms |

Content & Publishing

These scripts are from content providers or publishing-specific affiliate tracking.

| Rank | Name | Usage | Average Impact |
| --- | --- | --- | --- |
| 1 | Research Online | 2,222 | 62 ms |
| 2 | Accuweather | 1,491 | 72 ms |
| 3 | Booking.com | 1,656 | 157 ms |
| 4 | Tencent | 2,257 | 163 ms |
| 5 | OpenTable | 1,563 | 165 ms |
| 6 | SnapWidget | 2,607 | 174 ms |
| 7 | Covert Pics | 1,007 | 184 ms |
| 8 | AMP | 74,549 | 308 ms |
| 9 | Medium | 1,157 | 474 ms |
| 10 | Embedly | 5,513 | 514 ms |
| 11 | Spotify | 3,225 | 602 ms |
| 12 | issuu | 1,934 | 670 ms |
| 13 | SoundCloud | 4,464 | 986 ms |
| 14 | Dailymotion | 1,838 | 1244 ms |

CDNs

These are a mixture of publicly hosted open source libraries (e.g. jQuery) served over different public CDNs and private CDN usage. This category is unique in that the origin may have no responsibility for the performance of what's being served. Note that rank here does not imply one CDN is better than another. It simply indicates that the scripts being served from that origin are lighter/heavier than the ones served by another.

Tag Management

These scripts tend to load lots of other scripts and initiate many tasks.

Mixed / Other

These are miscellaneous scripts delivered via a shared origin with no precise category or attribution. Help us out by identifying more origins!

| Rank | Name | Usage | Average Impact |
| --- | --- | --- | --- |
| 1 | ResponsiveVoice | 1,241 | 70 ms |
| 2 | Amazon Web Services | 38,265 | 161 ms |
| 3 | All Other 3rd Parties | 1,380,493 | 318 ms |
| 4 | Parking Crew | 5,147 | 326 ms |
| 5 | Heroku | 2,002 | 607 ms |
| 6 | uLogin | 2,316 | 1223 ms |

Third Parties by Total Impact

This section highlights the entities responsible for the most script execution across the web. This helps inform which improvements would have the largest total impact.

Future Work

- Introduce URL-level data for more fine-grained analysis, e.g. which libraries from Cloudflare/Google CDNs are most expensive.
- Expand the scope, i.e. include more third parties and have greater entity/category coverage.

FAQs

I don't see entity X in the list. What's up with that?

This can be for one of several reasons:

- The entity does not have references to its origins on at least 50 pages in the dataset.
- The entity's origins have not yet been identified. See How can I contribute?

What is "Total Occurrences"?

Total Occurrences is the number of pages on which the entity is included.

How is the "Average Impact" determined?

The HTTP Archive dataset includes Lighthouse reports for each URL on mobile. Lighthouse has an audit called "bootup-time" that summarizes the amount of time that each script spent on the main thread. The "Average Impact" for an entity is the total execution time of scripts whose domain matches one of the entity's domains divided by the total number of pages that included the entity.

Average Impact = Total Execution Time / Total Occurrences
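The formula, together with the 50-page minimum described at the start of this FAQ, can be sketched in JavaScript. The per-page record shape (entity name mapped to milliseconds on that page) and the `minPages` parameter are hypothetical, chosen for illustration rather than taken from the project's queries.

```javascript
// Sketch of: Average Impact = Total Execution Time / Total Occurrences.
// Each hypothetical page record maps entity name -> execution time (ms) on that page.
function averageImpact(pages, minPages = 50) {
  const stats = new Map() // entity -> { totalMs, occurrences }
  for (const page of pages) {
    for (const [entity, ms] of Object.entries(page)) {
      const s = stats.get(entity) || { totalMs: 0, occurrences: 0 }
      s.totalMs += ms
      s.occurrences += 1 // one occurrence per page, however many scripts ran
      stats.set(entity, s)
    }
  }
  const result = {}
  for (const [entity, { totalMs, occurrences }] of stats) {
    // Entities seen on fewer than minPages pages are left out.
    if (occurrences >= minPages) result[entity] = totalMs / occurrences
  }
  return result
}

// Tiny example with the threshold lowered to 2 so it is not filtered out entirely.
const pages = [
  { 'Google Analytics': 70, Facebook: 250 },
  { 'Google Analytics': 84 },
  { Facebook: 210, Hotjar: 90 },
]
console.log(averageImpact(pages, 2))
```

Here Google Analytics averages (70 + 84) / 2 = 77 ms and Facebook (250 + 210) / 2 = 230 ms, while Hotjar appears on only one page and falls below the threshold.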

How does Lighthouse determine the execution time of each script?

Lighthouse's bootup-time audit attempts to attribute all top-level main-thread tasks to a URL. A main-thread task is attributed to the first script URL found in the stack. If you're interested in helping us improve this logic, see Contributing for details.
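The attribution rule can be sketched as follows. This is a simplified illustration, not Lighthouse's implementation: the task shape (`durationMs` plus a `stack` array of frames with a `url` field), the frame ordering, and the `'Other'` bucket for tasks with no script URL are all assumptions, and only http(s) URLs are treated as script URLs.

```javascript
// Simplified sketch: attribute each top-level task to the first script URL in its stack.
// The { durationMs, stack: [{ url }] } shape and frame order are hypothetical.
function attributeTasks(tasks) {
  const timeByUrl = new Map()
  for (const task of tasks) {
    // Take the first frame whose url looks like a script URL.
    const frame = task.stack.find((f) => /^https?:\/\//.test(f.url || ''))
    const url = frame ? frame.url : 'Other' // unattributable work is pooled
    timeByUrl.set(url, (timeByUrl.get(url) || 0) + task.durationMs)
  }
  return timeByUrl
}

const byUrl = attributeTasks([
  { durationMs: 40, stack: [{ url: '' }, { url: 'https://cdn.example/lib.js' }] },
  {
    durationMs: 25,
    stack: [{ url: 'https://tags.example/tag.js' }, { url: 'https://cdn.example/lib.js' }],
  },
  { durationMs: 10, stack: [] },
])
console.log(byUrl)
```

Note how the second task is charged entirely to `tag.js` even though `lib.js` also appears in its stack; a first-URL rule like this one can shift cost between scripts that call each other.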

The data for entity X seems wrong. How can it be corrected?

Verify that the origins in data/entities.js are correct. Most issues will simply be the result of mislabelling of shared origins. If everything checks out, there is likely no further action and the data is valid. If you still believe there are errors, file an issue to discuss further.

How can I contribute?

Only about 90% of the third party script execution has been assigned to an entity. We could use your help identifying the rest! See Contributing for details.

Contributing

Thanks

A huge thanks to @simonhearne and @soulgalore for their assistance in classifying additional domains!

Updating the Entities

The domain->entity mapping can be found in data/entities.js. Adding a new entity is as simple as adding a new array item with the following form.

{
  "name": "Facebook",
  "homepage": "https://www.facebook.com",
  "categories": ["social"],
  "domains": ["*.facebook.com", "*.fbcdn.net"],
  "examples": [
    "www.facebook.com",
    "connect.facebook.net",
    "staticxx.facebook.com",
    "static.xx.fbcdn.net",
    "m.facebook.com"
  ]
}
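Before submitting an entry, it can help to check which hostnames its `domains` patterns cover. The matcher below is a minimal sketch that assumes a `*.` prefix matches the bare domain and any of its subdomains; this is an illustration, not necessarily the exact rule the project's build applies, and `matchesPattern`/`findEntity` are hypothetical helpers.

```javascript
// Hypothetical matcher: does a hostname fall under a domain pattern?
// Assumes "*.example.com" matches example.com and any subdomain of it.
function matchesPattern(hostname, pattern) {
  if (pattern.startsWith('*.')) {
    const base = pattern.slice(2)
    // The leading "." prevents "evilexample.com" from matching "*.example.com".
    return hostname === base || hostname.endsWith('.' + base)
  }
  return hostname === pattern
}

// Return the first entity with a pattern matching the URL's hostname, if any.
function findEntity(entities, url) {
  const { hostname } = new URL(url)
  return entities.find((e) => e.domains.some((p) => matchesPattern(hostname, p)))
}

const entities = [{ name: 'Facebook', domains: ['*.facebook.com', '*.fbcdn.net'] }]

console.log(findEntity(entities, 'https://m.facebook.com/').name) // Facebook
console.log(findEntity(entities, 'https://static.xx.fbcdn.net/a.js').name) // Facebook
console.log(findEntity(entities, 'https://example.com/')) // undefined
```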

Updating Attribution Logic

The logic for attribution to individual script URLs can be found in the Lighthouse repo. File an issue over there to discuss further.

Updating the Data

The queries used to compute the data are in the sql/ directory.

1. Edit all-observed-domains-query.sql to query the correct month's HTTP Archive run.
2. Run all-observed-domains-query.sql in BigQuery.
3. Download the results and check them in at data/YYYY-MM-01-observed-domains.json.
4. Edit bootup-time-scripting.partial.sql to query the correct month's HTTP Archive run.
5. Run origin-query.generated.sql in BigQuery.
6. Download the results and check them in at data/YYYY-MM-01-origin-scripting.json.
7. Run yarn build to regenerate the latest canonical domain mapping.
8. Create a new table named in the format YYYY_MM_01 in the lighthouse-infrastructure.third_party_web BigQuery dataset, loaded from the CSV in dist/domain-map.csv, with three columns: domain, canonicalDomain, and category.
9. Edit bootup-time-scripting.partial.sql to join on the table you just created.
10. Run yarn build to regenerate the queries.
11. Run entity-per-page.generated.sql in BigQuery.
12. Download the results and check them in at data/YYYY-MM-01-entity-scripting.json.
13. Run web-almanac-all-observed-domains-identification.sql in BigQuery.
14. Save the results to a BigQuery table named YYYY_MM_01_all_observed_domains.
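Several of these steps depend on the same date-based naming. A small hypothetical helper (not part of the repo) can derive the month-specific file paths and BigQuery table names, following the YYYY-MM-01 and YYYY_MM_01 patterns in the steps; the returned property names are made up for illustration.

```javascript
// Hypothetical helper: derive the month-specific names used in the update steps.
function monthlyNames(year, month) {
  const mm = String(month).padStart(2, '0')
  const dashed = `${year}-${mm}-01` // pattern used for files in data/
  const underscored = `${year}_${mm}_01` // pattern used for BigQuery tables
  return {
    observedDomainsFile: `data/${dashed}-observed-domains.json`,
    originScriptingFile: `data/${dashed}-origin-scripting.json`,
    entityScriptingFile: `data/${dashed}-entity-scripting.json`,
    domainMapTable: `lighthouse-infrastructure.third_party_web.${underscored}`,
    allObservedDomainsTable: `${underscored}_all_observed_domains`,
  }
}

console.log(monthlyNames(2019, 5))
```

Zero-padding the month keeps the generated names aligned with the existing data/2019-0X-01 files and sorts them chronologically.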

Updating this README

This README is auto-generated from the templates in lib/ and the computed data. In order to update the charts, you'll need to make sure you have cairo installed locally in addition to running yarn install.

# Install `cairo` and dependencies for node-canvas
brew install pkg-config cairo pango libpng jpeg giflib
# Build the requirements in this repo
yarn build
# Regenerate the README
yarn start

Updating the website