I was catching up on news while waiting for Google I/O 2016 to start (1 AM in my timezone) when an idea popped up: let’s build an ad blocker so I can browse news on my phone without the unwanted distraction!

Some brainstorming is needed here. We need to prevent WebView from loading ads, or other unwanted resources, when it loads a webpage. A little digging into the WebView documentation leads us to WebViewClient. We can use shouldInterceptRequest() to intercept each request issued by a webpage, check its URL, and decide whether to load the resource from that URL.

Now how do we tell whether resources from a URL are likely to be ads? Let’s check how popular ad blockers like uBlock Origin or AdBlock do it: they both rely on a few blacklists of things to filter. EasyList, EasyPrivacy, etc. are some well-known ones, but they are overkill for our needs: their rules include things like CSS selectors, while we only have a URL to work with here. The pgl.yoyo.org list used by uBlock Origin looks promising though: it lists the hostnames of servers considered ad servers. Now we only need to match our URL against the blacklisted hostnames!

TL;DR

A summary of what we need to do:

Get the list of ad hostnames from pgl.yoyo.org

Save the list somewhere, load it when application starts

Use WebViewClient.shouldInterceptRequest(WebView, String) to intercept requests

Check if the request URL belongs to one of the hostnames in the list and, if so, override it by returning a dummy resource instead of the actual one, which is presumably an ad

Getting list of ad hostnames

The pgl.yoyo.org site provides a few options for generating the list. Since we only care about hostnames, not IP addresses, let’s choose the plain non-HTML list: a plain list of hostnames (no HTML) with no links back to the page (we should credit it somewhere else, of course).

This will give us a list as follows:

pgl.yoyo.org.txt

    101com.com
    101order.com
    123found.com
    180hits.de
    180searchassistant.com
    1x1rank.com
    207.net
    247media.com
    ...

Load ad hostnames into memory

We can save this list to a file, include it as an asset, or include it as a raw resource in our app. In any case, we will have to do I/O to read it. Let’s pick an asset.

Loading from a file is simple. Okio is used below, but it can be replaced by java.io APIs. One thing to keep in mind is that we should do I/O on a background thread. A simple AsyncTask will do. Here we load directly into a static Set variable, which stays in memory for as long as the app process runs; that is not ideal, but let’s keep it simple here.

MyApplication.java

    public class MyApplication extends Application {
        @Override
        public void onCreate() {
            super.onCreate();
            AdBlocker.init(this);
        }
    }

AdBlocker.java

    public class AdBlocker {
        private static final String AD_HOSTS_FILE = "pgl.yoyo.org.txt";
        private static final Set<String> AD_HOSTS = new HashSet<>();

        // 'final' lets the anonymous AsyncTask capture the context on pre-Java 8 toolchains
        public static void init(final Context context) {
            new AsyncTask<Void, Void, Void>() {
                @Override
                protected Void doInBackground(Void... params) {
                    try {
                        loadFromAssets(context);
                    } catch (IOException e) {
                        // noop
                    }
                    return null;
                }
            }.execute();
        }

        @WorkerThread
        private static void loadFromAssets(Context context) throws IOException {
            InputStream stream = context.getAssets().open(AD_HOSTS_FILE);
            BufferedSource buffer = Okio.buffer(Okio.source(stream));
            String line;
            // One hostname per line
            while ((line = buffer.readUtf8Line()) != null) {
                AD_HOSTS.add(line);
            }
            buffer.close();
            stream.close();
        }
    }
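Since the post mentions that Okio can be replaced by java.io APIs, here is a minimal sketch of what that could look like. HostListReader and its read() method are hypothetical names that are not part of the original code, and the trimming and lower-casing are extra assumptions about how we want to normalize entries.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not from the original post): reads one hostname per line.
final class HostListReader {
    private HostListReader() {}

    static Set<String> read(Reader source) throws IOException {
        Set<String> hosts = new HashSet<>();
        // try-with-resources closes the reader even if a read fails
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption: normalize entries so lookups are case-insensitive
                String host = line.trim().toLowerCase(Locale.ROOT);
                if (!host.isEmpty()) {
                    hosts.add(host);
                }
            }
        }
        return hosts;
    }
}
```

On Android, the reader could be something like new InputStreamReader(context.getAssets().open(AD_HOSTS_FILE), StandardCharsets.UTF_8), and the call should still happen on a background thread.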

Intercept request

The next step is to intercept WebView’s requests to check if they should be overridden. The logic below caches previously checked results from the same session so we don’t end up rechecking the same URL.

    webView.setWebViewClient(new WebViewClient() {
        // Cache of URL -> "is ad" results for this session
        private Map<String, Boolean> loadedUrls = new HashMap<>();

        @TargetApi(Build.VERSION_CODES.HONEYCOMB)
        @Override
        public WebResourceResponse shouldInterceptRequest(WebView view, String url) {
            boolean ad;
            if (!loadedUrls.containsKey(url)) {
                ad = AdBlocker.isAd(url);
                loadedUrls.put(url, ad);
            } else {
                ad = loadedUrls.get(url);
            }
            return ad ? AdBlocker.createEmptyResource() :
                    super.shouldInterceptRequest(view, url);
        }
    });
    webView.loadUrl("http://example.com");
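One thing worth noting: the Android documentation says shouldInterceptRequest() is called on a thread other than the UI thread, so a plain HashMap cache may be accessed concurrently. Below is a rough sketch of a thread-safe variant of the same caching idea, written as plain Java so it can be tried outside Android. AdVerdictCache is a hypothetical name, and the sketch assumes Java 8 APIs (Predicate, computeIfAbsent) are available to the app.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical helper (not from the original post): remembers the
// ad/not-ad decision for each URL seen during the session.
final class AdVerdictCache {
    private final Map<String, Boolean> verdicts = new ConcurrentHashMap<>();
    private final Predicate<String> checker;

    // 'checker' is whatever decides if a URL is an ad, e.g. AdBlocker::isAd
    AdVerdictCache(Predicate<String> checker) {
        this.checker = checker;
    }

    boolean isAd(String url) {
        // Runs the checker only for URLs that have no cached verdict yet
        return verdicts.computeIfAbsent(url, checker::test);
    }
}
```

Inside the WebViewClient, the containsKey/put/get dance would then collapse to a single cache.isAd(url) call, with the cache created as new AdVerdictCache(AdBlocker::isAd).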

Match domain and override resource

The last step is to implement AdBlocker.isAd(url) and AdBlocker.createEmptyResource(). The latter should be straightforward. The interesting bit is how to match a full URL against the list of hostnames.

Let’s consider ads from Google’s DoubleClick network: it has URLs with hosts such as pubads.g.doubleclick.net, adclick.g.doubleclick.net, googleads.g.doubleclick.net. We have one single entry in our list that may match: doubleclick.net. Our strategy here is to extract the host from the URL and walk up the sub-domain chain: try to match the whole host first, then keep stripping off the leftmost sub-domain until we find a match or run out of sub-domains.

    // Checking if pubads.g.doubleclick.net is a match
    doubleclick.net != pubads.g.doubleclick.net
    doubleclick.net != g.doubleclick.net
    doubleclick.net == doubleclick.net -> block pubads.g.doubleclick.net

AdBlocker.java

    import okhttp3.HttpUrl;

    public class AdBlocker {
        private static final Set<String> AD_HOSTS = new HashSet<>();
        ...

        public static boolean isAd(String url) {
            HttpUrl httpUrl = HttpUrl.parse(url);
            return isAdHost(httpUrl != null ? httpUrl.host() : "");
        }

        // Try the full host first, then strip the leftmost label and retry
        private static boolean isAdHost(String host) {
            if (TextUtils.isEmpty(host)) {
                return false;
            }
            int index = host.indexOf(".");
            return index >= 0 && (AD_HOSTS.contains(host) ||
                    index + 1 < host.length() && isAdHost(host.substring(index + 1)));
        }

        @TargetApi(Build.VERSION_CODES.HONEYCOMB)
        public static WebResourceResponse createEmptyResource() {
            // An empty body stands in for the blocked resource
            return new WebResourceResponse("text/plain", "utf-8",
                    new ByteArrayInputStream("".getBytes()));
        }
    }
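To play with the matching logic without an Android device, the same sub-domain walk can be sketched in plain Java. This version is iterative instead of recursive and swaps OkHttp’s HttpUrl for java.net.URI, which is stricter about what it accepts, so treat it as an approximation and not a drop-in replacement. HostMatcher is a hypothetical name.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not from the original post): matches a URL's host,
// or any of its parent domains, against a set of blocked hostnames.
final class HostMatcher {
    private final Set<String> adHosts;

    HostMatcher(Set<String> adHosts) {
        this.adHosts = adHosts;
    }

    boolean isAd(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return false; // unparseable URL: let it through
        }
        if (host == null) {
            return false;
        }
        host = host.toLowerCase(Locale.ROOT);
        // Walk up the chain: a.b.c.net -> b.c.net -> c.net, stopping before the bare TLD
        while (host.indexOf('.') >= 0) {
            if (adHosts.contains(host)) {
                return true;
            }
            host = host.substring(host.indexOf('.') + 1);
        }
        return false;
    }
}
```

With a set containing doubleclick.net, a URL on pubads.g.doubleclick.net is reported as an ad after two stripping steps, while notdoubleclick.net is not, because only whole labels are compared.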

Check out a demo below:

That’s fun! Add an address bar, a progress bar, and a few standard browser buttons, and you now have an ad-free Android web browser, built by yourself! The solution is not as comprehensive as uBlock Origin or AdBlock, but it should remove enough distraction.

A complete implementation can be found on Materialistic’s GitHub repository: