EasyList Applied to the Web

The growth of EasyList isn’t in and of itself concerning, as long as newly added rules benefit users. If, however, EasyList’s size reflects an accumulation of expired or rarely used rules, then a great deal of computation and time is being wasted on users’ machines.

To find out which of these is the case, we applied EasyList to two sets of sites: the Alexa 5k, the list of the 5,000 most popular sites on the web, and a random sample of 5,000 sites from the Alexa 1,000,000 (ensuring no duplicate sites). Our measurement consisted of several steps:

1. Use Selenium and the DevTools Protocol to record every URL requested while a website is rendered and its scripts are executed.
2. Add automation that randomly selects three distinct same-domain URLs from the anchor tags on a page.
3. Use this automation to visit the homepage of each site and at most three child pages, recording every URL requested for images, script files, and other web resources.
4. Determine which of those URLs would be blocked by the version of EasyList fetched that day, using Brave’s optimized ad-block implementation.
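The steps above do not spell out how the site lists were assembled or how anchor links were filtered, so what follows is only a minimal sketch under stated assumptions. It assumes `ranked_domains` is the Alexa list in rank order, reads "no duplicate sites" as "the random sample neither repeats a site nor overlaps the top 5k", and treats "same-domain" as an exact hostname match. `build_crawl_list` and `pick_child_pages` are hypothetical helper names, and the Selenium/DevTools recording itself is omitted.

```python
import random
from urllib.parse import urldefrag, urljoin, urlparse


def build_crawl_list(ranked_domains, top_n=5000, sample_size=5000, rng=None):
    """Return (top sites, random sample of the remaining ranked sites).

    Assumption: ranked_domains is ordered by rank. The sample is drawn from
    the de-duplicated tail, so it never repeats a site or overlaps the top list.
    """
    rng = rng or random.Random()
    top = list(dict.fromkeys(ranked_domains[:top_n]))
    seen = set(top)
    tail = [d for d in dict.fromkeys(ranked_domains[top_n:]) if d not in seen]
    return top, rng.sample(tail, sample_size)


def pick_child_pages(page_url, hrefs, k=3, rng=None):
    """Return up to k distinct same-domain URLs chosen at random from hrefs.

    hrefs are raw href values from <a> tags on page_url; relative links are
    resolved against page_url. Same-domain means an exact hostname match here
    (an assumption; eTLD+1 matching would be a reasonable alternative).
    """
    rng = rng or random.Random()
    home_host = urlparse(page_url).hostname
    candidates = set()
    for href in hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, and similar links
        if parsed.hostname != home_host:
            continue  # off-site link
        if absolute.rstrip("/") == page_url.rstrip("/"):
            continue  # link back to the page itself (e.g. "#top")
        candidates.add(absolute)
    # Sort before sampling so a seeded RNG gives reproducible picks.
    pool = sorted(candidates)
    return rng.sample(pool, min(k, len(pool)))
```

For example, `pick_child_pages("https://example.com/", ["/about", "#top", "https://other.org/x"])` can only return `["https://example.com/about"]`, because the fragment link resolves to the homepage and the last link is off-site.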

The results presented in this post describe the above steps applied to EasyList and the Alexa listings as of Saturday, July 13th, 2018. All measurements were performed on AWS Lambda. We’ve provided the code for the Lambda function on GitHub.

Approximately 20% of the domains we requested either did not reply or replied with error codes. We attribute this to anti-crawling measures that target the well-known AWS IP address ranges we crawled from.

As a result, we successfully crawled 8,085 domains and 30,280 individual pages. We found that the vast majority of EasyList rules are never used when browsing popular websites: only 3,268 of 39,198 (~8%) network and exception rules were used during our crawls (these measurements exclude element-hiding rules).
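To make the rule-usage statistic concrete, here is a minimal sketch of how per-rule hits could be counted. It is not Brave’s ad-block implementation: it understands only two forms of EasyList network rule (`||host^` domain anchors and plain substrings), ignores `$` options, skips element-hiding rules as the measurement did, and credits each decision to the first matching rule in list order, which is an assumption. `parse_rules`, `rule_matches`, and `classify` are hypothetical names.

```python
from collections import Counter
from urllib.parse import urlparse

# Markers of element-hiding (cosmetic) rules, which are excluded here.
ELEMENT_MARKERS = ("##", "#@#", "#?#", "#$#")


def parse_rules(lines):
    """Keep network (blocking) and exception rules from raw EasyList lines."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("!", "[")):
            continue  # blank line, comment, or "[Adblock Plus x.y]" header
        if any(marker in line for marker in ELEMENT_MARKERS):
            continue  # element-hiding rule: not counted
        rules.append(line)
    return rules


def rule_matches(pattern, url):
    """Match a tiny subset of the filter syntax; options after '$' are ignored."""
    pattern = pattern.split("$", 1)[0]
    if not pattern:
        return False
    body = pattern[2:-1]
    if pattern.startswith("||") and pattern.endswith("^") and not any(c in body for c in "/|^*"):
        # '||example.com^' matches example.com and any of its subdomains.
        host = urlparse(url).hostname or ""
        return host == body or host.endswith("." + body)
    if any(c in pattern for c in "|^*"):
        return False  # unsupported anchors/wildcards: treated as non-matching
    return pattern.lower() in url.lower()  # plain substring, case-insensitive


def classify(url, rules, hits):
    """Return True if url is blocked, crediting the deciding rule in `hits`."""
    block = next((r for r in rules if not r.startswith("@@") and rule_matches(r, url)), None)
    if block is None:
        return False
    allow = next((r for r in rules if r.startswith("@@") and rule_matches(r[2:], url)), None)
    if allow is not None:
        hits[allow] += 1  # exception rule overrides the block
        return False
    hits[block] += 1
    return True
```

After calling `classify(url, rules, hits)` for every crawled URL with a shared `hits = Counter()`, `len(hits) / len(rules)` is the share of rules used at least once; for the figures above, 3,268 / 39,198 ≈ 0.083.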

We also found that EasyList rules were far from equally useful, even among the rules that were used at least once. For example, just 201 rules accounted for 90% of all blocking activity. In fact, 99.5% of rules were used 10 or fewer times across the ~30k pages we visited.
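Both concentration statistics can be derived from a mapping of rule to hit count. The sketch below assumes such a mapping (for instance, the `hits` counter from a rule-usage tally); `rules_for_share` and `fraction_used_at_most` are hypothetical names. It also assumes that "used 10 times or less" counts never-used rules as used zero times, which is our reading of the statistic rather than something the measurement states.

```python
def rules_for_share(hits, share=0.9):
    """Smallest number of rules whose combined hits reach `share` of all hits."""
    counts = sorted(hits.values(), reverse=True)  # most-used rules first
    total = sum(counts)
    if total == 0:
        return 0  # no blocking activity at all
    target = share * total
    running = 0
    for n, count in enumerate(counts, start=1):
        running += count
        if running >= target:
            return n
    return len(counts)


def fraction_used_at_most(hits, all_rules, limit=10):
    """Share of all rules, never-used ones included, with at most `limit` hits."""
    if not all_rules:
        return 0.0
    low = sum(1 for rule in all_rules if hits.get(rule, 0) <= limit)
    return low / len(all_rules)
```

Sorting counts in descending order is what makes the first function return the *smallest* set of rules: taking the heaviest hitters first reaches the target share with as few rules as possible.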

Finally, we measured which kinds of resources EasyList blocks. As the above graph shows, images were blocked most frequently, followed by scripts and iframes. This distribution matters because blocking different types of requests has different follow-on effects on the browsing experience. Blocking an image request might save the user some network use, while blocking a stylesheet or an iframe might also spare the user the additional sub-resource fetches that resource would have triggered.
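A per-type tally like the one behind the graph can be sketched as follows. It assumes each recorded request is available as a `(url, resource_type)` pair and that the blocking check is any callable taking both values, since real filter rules can depend on the request type. `blocked_by_type` and the type labels in the example are hypothetical, not taken from the original crawler.

```python
from collections import Counter


def blocked_by_type(requests, is_blocked):
    """Count blocked requests per resource type.

    requests: iterable of (url, resource_type) pairs, e.g. ("https://...", "image").
    is_blocked: callable (url, resource_type) -> bool, e.g. an ad-block engine check.
    """
    counts = Counter()
    for url, resource_type in requests:
        if is_blocked(url, resource_type):
            counts[resource_type] += 1
    return counts
```

Dividing each count by `sum(counts.values())` turns the tally into the share of blocked requests per type, and `counts.most_common()` lists the types in the order a bar chart would rank them.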