There are a number of different reasons to track users as they use a particular web site. Some of them are more "sinister" than others. For most web applications, some form of session tracking is required to maintain the user's state. This is typically easy to do with well-configured cookies (and is outside the scope of this article). Sessions are meant to be ephemeral and will not persist for long.

On the other hand, some tracking methods attempt to track the user over a long time, and in particular attempt to make it difficult to evade the tracking. This is sometimes done for advertising purposes, but can also be done to stop certain attacks like brute forcing or to identify attackers who return to a site. In the worst case, from a privacy perspective, the tracking is done to follow a user across various web sites.

Over the years, browsers and plugins have provided a number of ways to restrict this tracking. Here are some of the more common tracking techniques and how the user can prevent (some of) them:

1 - Cookies

Cookies are meant to maintain state between different requests. A browser will send a cookie with each request once it is set for a particular site. From a privacy point of view, the expiration time and the domain of the cookie are the most important settings. Most browsers will reject cookies set on behalf of a different site, unless the user permits these cookies to be set. A proper session cookie should not use an expiration date, as it should expire as soon as the browser is closed. Most browsers offer means to review, control and delete cookies. In the past, a "Cookie2" header was proposed for session cookies, but this header has been deprecated and browsers have stopped supporting it.
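To illustrate the difference between a session cookie and a long-lived tracking cookie, here is a minimal Python sketch using the standard library's http.cookies module. The helper name build_cookie, the cookie names and the ten-year lifetime are assumptions made up for this example, not something a particular site is known to use.

```python
from http.cookies import SimpleCookie


def build_cookie(name, value, max_age=None):
    """Return a Set-Cookie header value.

    Without max_age the cookie has no expiration and is a session cookie.
    With max_age the browser keeps it for that many seconds.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    cookie[name]["httponly"] = True
    if max_age is not None:
        cookie[name]["max-age"] = max_age
    return cookie[name].OutputString()


# Session cookie: no expiration attribute at all.
session_header = build_cookie("sid", "abc123")

# Tracking-style cookie: hypothetical ten-year lifetime.
tracking_header = build_cookie("uid", "abc123", max_age=10 * 365 * 24 * 3600)

print(session_header)
print(tracking_header)
```

When reviewing cookies in the browser, a very long lifetime on a cookie that only needs to hold session state is a hint that it may be used for tracking.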

https://www.ietf.org/rfc/rfc2965.txt

http://tools.ietf.org/html/rfc6265

2 - Flash Cookies (Local Shared Objects)

Flash has its own persistence mechanism. These "flash cookies" are files that can be left on the client. They cannot be set on behalf of other sites ("Cross-Origin"), but one SWF script can expose the content of an LSO to other scripts, which can be used to implement cross-origin storage. The best way to prevent flash cookies from tracking you is to disable Flash. Managing flash cookies is tricky and typically requires special plugins.

https://helpx.adobe.com/flash-player/kb/disable-local-shared-objects-flash.html

3 - IP Address

The IP address is probably the most basic tracking mechanism for all IP-based communication, but it is not always reliable, as users' IP addresses may change at any time and multiple users often share the same IP address. You can use various VPN products or systems like Tor to prevent your IP address from being used to track you, but this usually comes with a performance hit. Some modern JavaScript extensions (WebRTC in particular) can be used to retrieve a user's internal IP address, which can be used to resolve ambiguities introduced by NAT. But WebRTC is not yet implemented in all browsers. IPv6 may provide additional methods to use the IP address to identify users, as you are less likely to run into issues with NAT.

http://ipleak.net

4 - User Agent

The User-Agent string sent by a browser is hardly ever unique by default, but spyware sometimes modifies the User-Agent to add unique values to it. Many browsers allow adjusting the User-Agent, and more recently, browsers have started to reduce the information in the User-Agent or even make it somewhat dynamic to match the expected content. Non-spyware plugins sometimes modify the User-Agent to indicate support for specific features.

5 - Browser Fingerprinting

A web browser is hardly ever one monolithic piece of software. Instead, web browsers interact with various plugins and extensions the user may have installed. Past work has shown that the combination of plugin versions and configuration options selected by the user tends to be amazingly unique, and this technique has been used to derive unique identifiers. There is not much you can do to prevent this, other than minimize the number of plugins you install (but that may be an indicator in itself).
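The idea of deriving an identifier from a combination of attributes can be sketched in a few lines of Python. This is only an illustration under assumptions: the attribute names, the sample values and the choice of a truncated SHA-256 hash are invented for the example, and real fingerprinting scripts collect far more data.

```python
import hashlib
import json


def fingerprint(attributes):
    """Derive a short identifier from a dictionary of browser attributes."""
    # Serialize with sorted keys so the same attributes always hash the same way.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attribute sets as a site might collect them via JavaScript.
visitor_a = {
    "user_agent": "ExampleBrowser/1.0",
    "plugins": ["PDF Viewer 2.1", "Flash 15.0"],
    "timezone": "UTC-5",
    "screen": "1920x1080",
}
# Same browser except for a single plugin version.
visitor_b = dict(visitor_a, plugins=["PDF Viewer 2.1", "Flash 15.1"])

print(fingerprint(visitor_a))
print(fingerprint(visitor_b))
```

A single differing plugin version is enough to produce a different identifier, which is why the combination of many attributes tends to be so distinctive.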

https://panopticlick.eff.org

6 - Local Storage

HTML 5 offers two new ways to store data on the client: Local Storage and Session Storage. Local Storage is most useful for persistent storage on the client, and therefore for user tracking. Access to Local Storage is limited to the site that stored the data. Some browsers implement debug features that allow the user to review the stored data. Session Storage is limited to a particular window and is removed as soon as the window is closed.

https://html.spec.whatwg.org/multipage/webstorage.html

7 - Cached Content

Browsers cache content based on the expiration headers provided by the server. A web application can include unique content in a page, and then use JavaScript to check whether the content is cached in order to identify a user. This technique can be implemented using images, fonts or pretty much any content. It is difficult to defend against unless you routinely (e.g. on closing the browser) delete all cached content. Some browsers allow you to not cache any content at all, but this can cause significant performance issues. Recently Google has been seen using fonts to track users, but the technique is not new. Cached JavaScript can easily be used to set unique tracking IDs.
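As a rough sketch of the cached JavaScript variant, the server could hand each new visitor a script with a unique ID baked in and ask the browser to cache it for a long time. The function name build_tracking_script, the variable name trackingId and the one-year lifetime are assumptions for this example only.

```python
import uuid

ONE_YEAR = 365 * 24 * 3600  # assumed cache lifetime in seconds


def build_tracking_script(visitor_id=None):
    """Return (headers, body) for a per-visitor JavaScript file.

    As long as the browser serves the file from its cache, every page
    that includes the script sees the same trackingId value.
    """
    visitor_id = visitor_id or uuid.uuid4().hex
    headers = {
        "Content-Type": "application/javascript",
        "Cache-Control": "private, max-age=%d" % ONE_YEAR,
    }
    body = 'var trackingId = "%s";\n' % visitor_id
    return headers, body


headers, body = build_tracking_script()
print(headers["Cache-Control"])
print(body)
```

Clearing the browser cache removes the stored script, and the next request would receive a fresh ID.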

http://robertheaton.com/2014/01/20/cookieless-user-tracking-for-douchebags/

http://fontfeed.com/archives/google-webfonts-the-spy-inside/

8 - Canvas Fingerprinting

This is a more recent technique and in essence a special form of browser fingerprinting. HTML 5 introduced a "Canvas" API that allows JavaScript to draw images in your browser. In addition, it is possible to read the image that was created. As it turns out, font configurations and other parameters are unique enough to result in slightly different images when using identical JavaScript code to draw the image. These differences can be used to derive a browser identifier. There is not much you can do to prevent this from happening. I am not aware of a browser that allows you to disable the canvas feature, and pretty much all reasonably up-to-date browsers support it in some form.

https://securehomes.esat.kuleuven.be/~gacar/persistent/index.html

9 - Carrier Injected Headers

Verizon recently started injecting specific headers into HTTP requests to identify users. As this is done "in flight", it only works for HTTP and not HTTPS. Each user is assigned a specific ID, and the ID is injected into all HTTP requests as an X-UIDH header. Verizon offers a for-pay service that a web site can use to retrieve demographic information about the user. But even by itself, the header can be used to track users, as it stays linked to the user for an extended time.
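Any web site receiving the request can read the injected header without paying for Verizon's service. A minimal sketch, assuming the request headers are available as a dictionary; the helper name extract_uidh and the sample value are placeholders and do not reflect the real format of the ID.

```python
def extract_uidh(headers):
    """Return the value of the X-UIDH request header, or None if absent.

    HTTP header names are case-insensitive, so compare in lower case.
    """
    for name, value in headers.items():
        if name.lower() == "x-uidh":
            return value
    return None


# Hypothetical request headers; "EXAMPLE-ID" is a placeholder value.
request_headers = {
    "Host": "isc.sans.edu",
    "User-Agent": "ExampleBrowser/1.0",
    "X-UIDH": "EXAMPLE-ID",
}

print(extract_uidh(request_headers))
```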

http://webpolicy.org/2014/10/24/how-verizons-advertising-header-works/

10 - Redirects

This is a bit of a variation on the "cached content" tracking. If a user is redirected using a "301" ("Permanent Redirect") code, then the browser will remember the redirect and pull up the target page right away, without visiting the original page first. So for example, if you click on a link to "isc.sans.edu", I could redirect you to "isc.sans.edu/index.html?id=sometrackingid". The next time you go to "isc.sans.edu", your browser will automatically go directly to the second URL. This technique is less reliable than some of the other techniques, as browsers differ in how they cache redirects.
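The server side of this could look roughly like the following Python sketch. The function name handle_request and the use of a random UUID as the tracking ID are assumptions for illustration; the "id" parameter follows the example URL above.

```python
import uuid
from urllib.parse import parse_qs, urlsplit


def handle_request(url):
    """Return (status, headers, tracking_id) for a requested URL.

    A request without an "id" parameter gets a 301 to a URL carrying a
    fresh ID. A browser that cached the redirect skips this step and
    arrives with the old ID already in the URL.
    """
    params = parse_qs(urlsplit(url).query)
    if "id" in params:
        return 200, {}, params["id"][0]
    tracking_id = uuid.uuid4().hex
    location = "/index.html?id=" + tracking_id
    return 301, {"Location": location}, tracking_id


# First visit: the browser is redirected and may cache the redirect.
status, headers, first_id = handle_request("/")
# Later visit: the browser goes straight to the cached target URL.
status_again, _, seen_id = handle_request(headers["Location"])

print(status, headers["Location"])
print(status_again, seen_id == first_id)
```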

https://www.elie.net/blog/security/tracking-users-that-block-cookies-with-a-http-redirect

11 - Cookie Respawning / Syncing

Some of the methods above have pretty simple countermeasures. In order to make it harder for users to evade tracking, sites often combine different methods and "respawn" cookies. This technique is sometimes referred to as "Evercookie". If the user deletes, for example, the HTTP cookie but not the Flash cookie, the Flash cookie is used to re-create the HTTP cookie on the user's next visit.
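The respawning logic itself is simple. Here is a minimal sketch, assuming the site can read each storage mechanism and represents them as a dictionary; the function name respawn and the store names are made up for this example.

```python
def respawn(stores):
    """Copy a surviving tracking ID back into every storage mechanism.

    stores maps a mechanism name to the ID found there, or None if the
    user deleted it. If no ID survived, the stores are returned unchanged.
    """
    surviving = [value for value in stores.values() if value]
    if not surviving:
        return dict(stores)
    tracking_id = surviving[0]
    return {name: tracking_id for name in stores}


# The user deleted the HTTP cookie but not the Flash cookie.
found = {"http_cookie": None, "flash_lso": "abc123", "local_storage": None}
print(respawn(found))
```

Evading this kind of tracking requires clearing all of the storage mechanisms at the same time, since any single survivor is enough to restore the others.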

https://www.cylab.cmu.edu/files/pdfs/tech_reports/CMUCyLab11001.pdf

Any methods I missed? (I am sure there have to be a couple...)

---

Johannes B. Ullrich, Ph.D.
