There are literally hundreds of objects and APIs available for pilfering browser data:

The question is then: what subset of this metadata provides enough entropy to create that sense of uniqueness?

Let’s consider this widely used fingerprinting library:

The library has about 25 options baked in that a developer can use to build a fingerprint, along with roughly another dozen in active development. This toolset alone can likely produce a fingerprint with enough entropy to single out a specific device among tens of thousands, if not more, and the surface is rapidly broadening as browsers add new APIs.

For example, the advent of HTML5 in 2014 introduced the Canvas API, which was promptly discovered to have certain nuances that made it a boon for non-cookie-based tracking. At a high level, canvas fingerprinting works by rendering an image or text on the canvas object, then translating the pixel data into a non-visual representation — a string of characters — to create the fingerprint. Differences in the device’s hardware and software stack will influence the resulting fingerprint even when the exact same code runs. The desired effect can be achieved in just a few lines of code, and this is just one example of a “fingerprintable” data source.
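A minimal sketch of the technique looks something like this (illustrative only, not any particular library’s code; the hash is a plain FNV-1a chosen for brevity — real trackers typically use something like MurmurHash):

```javascript
// Illustrative canvas-fingerprint sketch. Assumes a browser DOM.
function hashString(str) {
  let h = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // FNV prime, kept in 32 bits
  }
  return h.toString(16);
}

function canvasFingerprint() {
  const canvas = document.createElement('canvas');
  const ctx = canvas.getContext('2d');
  ctx.textBaseline = 'top';
  ctx.font = "14px 'Arial'";
  ctx.fillStyle = '#f60';
  ctx.fillRect(125, 1, 62, 20);
  ctx.fillStyle = '#069';
  ctx.fillText('Hello, world!', 2, 15);
  // toDataURL() serializes the rendered pixels; subtle differences in
  // GPU, drivers, installed fonts, and anti-aliasing change this string
  // from device to device, even though the drawing code is identical.
  return hashString(canvas.toDataURL());
}
```

The fingerprint is stable for a given device but varies across devices — exactly the property a tracker wants from a cookieless identifier.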

Canvas fingerprinting on its own has been observed to add in the ballpark of 5+ bits of entropy, which by itself may not seem like much (2⁵ = 32), but consider that each additional bit doubles the number of devices that can be told apart. Those 5 bits can make a tremendous difference when combined with other techniques.
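The arithmetic behind that claim: entropy from independent signals adds, so the number of distinguishable “buckets” of devices multiplies. A quick sketch (the per-signal bit counts here are illustrative, not measured values):

```javascript
// Entropy (in bits) from independent signals adds; the number of
// distinguishable combinations is 2 raised to the total.
function distinguishableBuckets(bitsPerSignal) {
  const totalBits = bitsPerSignal.reduce((sum, b) => sum + b, 0);
  return 2 ** totalBits;
}

// Canvas alone (~5 bits):
distinguishableBuckets([5]); // 32 buckets
// Combined with, say, timezone (~3), screen size (~4), fonts (~7):
distinguishableBuckets([5, 3, 4, 7]); // 2^19 = 524288 buckets
```

Four modest signals are already enough to carve a population of hundreds of thousands into individual combinations.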

So how prevalent is this practice? You might have noticed that the fingerprint2.js library has over 6k stars on GitHub — and that’s just the number of developers who have publicly expressed some sort of interest in the library.

Here at Confiant, we see this specific library surface through thousands of ad impressions daily, while tens of thousands of ad impressions every day leak the presence of some sort of fingerprinting code. In fact, next time you’re on your favorite website, open Chrome Dev Tools and search all files for the keyword “fingerprint”: there’s a good chance you’ll find tracking code that’s surfaced through an ad or analytics platform. Don’t be surprised if you see a reference to a canvas object in the same code base either.

Sometimes, a single data point alone can provide an abundance of information. Here’s another popular example that we see attached to ads, or leaked through ad calls in other ways:

UAParser.js (https://github.com/faisalman/ua-parser-js) — a JavaScript library that identifies browser, engine, OS, CPU, and device type/model from the userAgent string.
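To get a feel for what a user-agent string alone gives away, here’s a toy parser — emphatically not UAParser.js itself, just a sketch of the kind of signal the real library extracts far more thoroughly:

```javascript
// Toy sketch, NOT UAParser.js: a few regexes already recover the
// coarse device facts an attacker cares about.
function naiveParseUA(ua) {
  const os =
    /Android/.test(ua)     ? 'Android' :
    /iPhone|iPad/.test(ua) ? 'iOS' :
    /Windows NT/.test(ua)  ? 'Windows' :
    /Mac OS X/.test(ua)    ? 'macOS' : 'Other';
  const mobile = /Mobi|Android/.test(ua); // most mobile UAs carry "Mobile"
  return { os, mobile };
}
```

A full parser also recovers browser name and version, rendering engine, and often the exact device model — each of which adds more bits to the fingerprint.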

Why should we care?

Tracking and privacy are complicated topics, but let’s assume for a second that legitimate advertisers, platforms, and analytics tools are out of the picture. This still leaves bad actors with a powerful tool to use and abuse in increasingly sophisticated ways.

The malvertising landscape is a high-octane game of cat and mouse where attackers need to iterate rapidly as security vendors get more adept at detection. For a bad actor, every payload reveal is a threat to the longevity of their campaign, especially if it happens in the wrong environment (e.g.: a scanner).

As a result, malvertisers are increasingly moving away from a “spray and pray” approach to triggering their payloads by leveraging some of the device fingerprinting techniques mentioned above to check if their campaign is being delivered to an individual ripe for a successful attack.

The endgame for the typical forced mobile redirect is a phishing page much like this familiar example:

Folks who fall for the trick will then need to submit their personal information through a form. The information will either be used for CPA fraud or perhaps even aggregated and sold somewhere. Another flavor of phishing landing page might look something like this:

The copy on this page happens to be device-specific, and will ultimately lead to an actual malware install.

Despite the obvious use of fingerprinting to target the landing page copy, there’s usually a bit more going on behind the scenes for the more sophisticated bad actors. Fingerprinting will usually start at the creative level where the attacker will determine if the impression is being served to a human worthy of a redirect. An example attack might take the following precautions before triggering the payload:

Is the impression being served to the right type of device? (Android / iOS / desktop)

Is it a new device worthy of targeting? (Certain browser APIs are available that wouldn’t be on older devices.)

How likely is it that the device is actually a scanner? (e.g.: the Battery API shows a power level of less than 100%)

Have we redirected this individual user before? (Detailed device fingerprint using canvas objects)

etc…
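The checklist above can be sketched as a gating function. Everything here is hypothetical — in a live attack the inputs would come from `navigator.userAgent`, `navigator.getBattery()`, a canvas hash, and so on; they’re plain parameters here so the logic is visible:

```javascript
// Hypothetical payload-gating sketch; real campaigns combine far more
// signals. Returns true only when the impression looks "worth" burning
// the payload on.
function shouldRevealPayload({ ua, batteryLevel, batteryCharging, seenFingerprints, canvasHash }) {
  // Right device class? (mobile redirects want mobile victims)
  const isMobile = /Android|iPhone/.test(ua);
  // A battery pinned at 100% and always charging hints at a VM/scanner.
  const looksLikeScanner = batteryCharging && batteryLevel === 1;
  // Already redirected this device? (keyed on a stored canvas hash)
  const alreadyRedirected = seenFingerprints.has(canvasHash);
  return isMobile && !looksLikeScanner && !alreadyRedirected;
}
```

The same fingerprint that identifies a returning shopper identifies a returning victim — which is exactly why these campaigns only fire once per device.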

If the attacker’s creative determines that the impression isn’t worth revealing the payload on, they can always show a dummy ad or fall back on in-banner video (IBV) to recoup the cost of the ad.

Where do we go from here?

Unfortunately, there’s no easy, enforceable answer short of turning off JavaScript entirely. While GDPR can help keep already-honest folks honest, many of these tracking techniques fly under the radar and store no data in the user’s browser the way cookies do. Publishers need to continue to select their demand partners wisely or risk exposing their visitors to malicious activity via rogue ads. Of course, Confiant’s real-time blocking remains a powerful mitigation tool for malvertising attacks as well.