The Web never forgets: Persistent tracking mechanisms in the wild is the first large-scale study of three advanced web tracking mechanisms: canvas fingerprinting, evercookies, and the use of "cookie syncing" in conjunction with evercookies. Read the paper »

About

The study is a collaboration between the researchers Gunes Acar¹, Christian Eubank², Steven Englehardt², Marc Juarez¹, Arvind Narayanan², and Claudia Diaz¹.

¹ KU Leuven, ESAT/COSIC and iMinds, Leuven, Belgium {gunes.acar, marc.juarez, claudia.diaz}@esat.kuleuven.be

² Princeton University {cge,ste,arvindn}@cs.princeton.edu

Reference: G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, C. Diaz. The Web never forgets: Persistent tracking mechanisms in the wild. In Proceedings of CCS 2014, Nov. 2014. (Forthcoming)

Results

Canvas Fingerprinting

Background

Canvas fingerprinting is a browser or device fingerprinting technique first presented by Mowery and Shacham in 2012. The authors found that the Canvas API of modern browsers can be used to exploit subtle differences in how the same text is rendered, yielding a consistent fingerprint that can be obtained in a fraction of a second without the user's awareness.
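As a rough illustration of the last step of this idea, the sketch below turns canvas output into a compact identifier by hashing it. It is not taken from any of the scripts studied: the helper names `canvas_fingerprint` and `fake_data_url` are hypothetical, the "pixels" are made-up bytes rather than a real PNG, and SHA-256 is simply one convenient hash. In a browser, the input string would come from the canvas's toDataURL method.

```python
import base64
import hashlib


def canvas_fingerprint(data_url: str) -> str:
    """Hash the image bytes carried in a data: URL into a short identifier."""
    # Everything after the first comma is the base64-encoded image payload.
    _header, _, payload = data_url.partition(",")
    image_bytes = base64.b64decode(payload)
    return hashlib.sha256(image_bytes).hexdigest()[:16]


def fake_data_url(pixels: bytes) -> str:
    """Hypothetical stand-in for the string a browser's toDataURL() returns."""
    return "data:image/png;base64," + base64.b64encode(pixels).decode("ascii")


# Two made-up devices render the same text with a one-byte difference.
device_a = fake_data_url(bytes([10, 20, 30, 40]))
device_b = fake_data_url(bytes([10, 20, 31, 40]))

# Same device, same rendering: the fingerprint is stable.
print(canvas_fingerprint(device_a) == canvas_fingerprint(device_a))  # True
# A tiny rendering difference yields a different fingerprint.
print(canvas_fingerprint(device_a) == canvas_fingerprint(device_b))  # False
```

The point of the toy data is only that a tiny, consistent rendering difference is enough to separate two devices once the output is hashed.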

Results

By crawling the homepages of the top 100,000 sites, we found that more than 5.5% of the crawled sites include canvas fingerprinting scripts. Although the overwhelming majority (95%) of the scripts belong to a single provider (addthis.com), we discovered a total of 20 canvas fingerprinting provider domains, active on 5542 of the top 100,000 sites.

Figure: A collage of the images printed to canvas by various fingerprinting scripts discovered during the study. The images were intercepted using a modified browser (by instrumenting the toDataURL method). Some blank space was cropped from the images to save space.

Canvas Fingerprinting Scripts

The table below summarizes the canvas fingerprinting scripts found on the homepages of the top 100K Alexa sites.

Full list of sites using Canvas Fingerprinting »

| Fingerprinting script | Number of including sites | Text drawn into the canvas |
|---|---|---|
| ct1.addthis.com/static/r07/core130.js (and 17 others) | 5282 | Cwm fjordbank glyphs vext quiz |
| i.ligatus.com/script/fingerprint.min.js | 115 | http://valve.github.io |
| src.kitcode.net/fp2.js | 68 | http://valve.github.io |
| admicro1.vcmedia.vn/fingerprint/figp.js | 31 | http://admicro.vn/ |
| amazonaws.com/af-bdaz/bquery.js | 26 | Centillion |
| *.shorte.st/js/packed/smeadvert-intermediate-ad.js | 14 | http://valve.github.io |
| stat.ringier.cz/js/fingerprint.min.js | 4 | http://valve.github.io |
| cya2.net/js/STAT/89946.js | 3 | ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz0123456789+/ |
| images.revtrax.com/RevTrax/js/fp/fp.min.jsp | 3 | http://valve.github.io |
| pof.com | 2 | http://www.plentyoffish.com |
| *.rackcdn.com/mongoose.fp.js | 2 | http://api.gonorthleads.com |
| 9 others* | 9 | (Various) |
| TOTAL | 5559 (5542 unique¹) | |

*: Some URLs are truncated or omitted for brevity.

¹: Some sites include canvas fingerprinting scripts from more than one domain.
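The totals in the table can be checked with a few lines of arithmetic. The sketch below uses the per-script counts exactly as listed; the dictionary keys are shortened labels chosen here for readability, and the variable names are illustrative.

```python
# Per-script site counts copied from the table (labels shortened).
site_counts = {
    "addthis.com": 5282,
    "ligatus.com": 115,
    "kitcode.net": 68,
    "vcmedia.vn": 31,
    "amazonaws.com": 26,
    "shorte.st": 14,
    "ringier.cz": 4,
    "cya2.net": 3,
    "revtrax.com": 3,
    "pof.com": 2,
    "rackcdn.com": 2,
    "9 others": 9,
}

unique_sites = 5542      # unique sites, as reported in the table
crawled_sites = 100_000  # size of the crawl

total_inclusions = sum(site_counts.values())
# Inclusions counted more than once because a site uses several providers.
duplicate_inclusions = total_inclusions - unique_sites
addthis_share = site_counts["addthis.com"] / total_inclusions

print(total_inclusions)                        # 5559
print(duplicate_inclusions)                    # 17
print(f"{unique_sites / crawled_sites:.1%}")   # 5.5%
print(f"{addthis_share:.1%}")                  # 95.0%
```

The gap of 17 between the two totals reflects the footnote: some sites include fingerprinting scripts from more than one domain and are therefore counted in more than one row.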

Evercookies & Respawning

Background

Evercookies are designed to overcome the "shortcomings" of traditional tracking mechanisms. By utilizing multiple storage vectors that are less transparent to users and may be more difficult to clear, evercookies provide an extremely resilient tracking mechanism, and have been found to be used by many popular sites to circumvent deliberate user actions [1, 2, 3].
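The respawning behaviour can be pictured with a toy model. The sketch below is only an illustration under simplifying assumptions: the `Browser` class, the `tracker_visit` function, and the cookie name `uid` are all hypothetical, and real evercookies may use more storage vectors than the two modelled here.

```python
import uuid


class Browser:
    """Toy browser profile with two independent storage vectors."""

    def __init__(self):
        self.http_cookies = {}
        self.flash_cookies = {}  # Flash Local Shared Objects

    def clear_http_cookies(self):
        """What a user does when clearing cookies; Flash storage is untouched."""
        self.http_cookies.clear()


def tracker_visit(browser, name="uid"):
    """Hypothetical tracker logic: reuse any surviving ID, then rewrite it everywhere."""
    user_id = (
        browser.http_cookies.get(name)
        or browser.flash_cookies.get(name)
        or uuid.uuid4().hex  # first visit: mint a fresh ID
    )
    browser.http_cookies[name] = user_id   # respawn the HTTP cookie if it was cleared
    browser.flash_cookies[name] = user_id  # keep the backup copy in sync
    return user_id


browser = Browser()
first_id = tracker_visit(browser)
browser.clear_http_cookies()        # deliberate user action
second_id = tracker_visit(browser)  # ID comes back from the Flash copy

print(first_id == second_id)  # True
```

Clearing only the HTTP cookies does not help in this model, because the identifier survives in the second storage vector and is copied back on the next visit.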

Results

We detected respawning by Flash cookies on 10 of the 200 most popular sites and found that 33 different Flash cookies were used to respawn over 175 HTTP cookies on 107 of the top 10,000 sites. The table below shows the 10 top-ranked websites found to include respawning based on Flash cookies.

Country: The country where the website is based.

3rd*: The domains that are different from the first-party but registered for the same company in the WHOIS database.

| Global rank | Site | Country | Respawning (Flash) domain | Flash cookie name | 1st/3rd Party |
|---|---|---|---|---|---|
| 16 | sina.com.cn | China | simg.sinajs.cn | stonecc_suppercookie.sol | 3rd* |
| 17 | yandex.ru | Russia | kiks.yandex.ru | fuid01.sol | 1st |
| 27 | weibo.com | China | simg.sinajs.cn | stonecc_suppercookie.sol | 3rd* |
| 41 | hao123.com | China | ar.hao123.com | $hao123$.sol | 1st |
| 52 | sohu.com | China | tv.sohu.com | vmsuser.sol | 1st |
| 64 | ifeng.com | Hong Kong | y3.ifengimg.com | www.ifeng.com.sol | 3rd* |
| 69 | youku.com | China | irs01.net | mt_adtracker.sol | 3rd |
| 178 | 56.com | China | irs01.net | mt_adtracker.sol | 3rd |
| 196 | letv.com | China | irs01.net | mt_adtracker.sol | 3rd |
| 197 | tudou.com | China | irs01.net | mt_adtracker.sol | 3rd |

Cookie Syncing

Background

Cookie synchronization, or cookie syncing, is the practice of tracker domains passing pseudonymous IDs associated with a given user, typically stored in cookies, amongst each other. Read the blog post that explains cookie syncing and our findings with animated diagrams: The hidden perils of cookie syncing (Freedom to Tinker)
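One simple way to picture an ID being passed between trackers is to look for one domain's cookie ID inside a request URL sent to a different domain. The sketch below is a minimal illustration of that idea, not the study's detection pipeline: the domains, ID values, query parameter, and the `find_syncs` helper are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical ID cookies observed in a crawl: owning domain -> ID value.
id_cookies = {
    "tracker-a.example": "a1b2c3d4e5f6",
    "tracker-b.example": "ffee99887766",
}

# Hypothetical request URLs logged during the same crawl.
request_urls = [
    "https://tracker-b.example/sync?partner_uid=a1b2c3d4e5f6",
    "https://cdn.example/lib.js",
]


def find_syncs(id_cookies, request_urls):
    """Return (owner, receiver, id) for each ID that leaks to another domain."""
    syncs = []
    for url in request_urls:
        receiver = urlparse(url).hostname
        for owner, id_value in id_cookies.items():
            # An ID owned by one domain appearing in a URL sent to another
            # domain means the receiver can learn that ID.
            if owner != receiver and id_value in url:
                syncs.append((owner, receiver, id_value))
    return syncs


print(find_syncs(id_cookies, request_urls))
# [('tracker-a.example', 'tracker-b.example', 'a1b2c3d4e5f6')]
```

Here tracker-b.example learns the ID that tracker-a.example assigned to the user, so the two can link their records for that user.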

Results

The table below shows the number of IDs known by the top 10 parties involved in cookie syncing, both when allowing all cookies and when blocking third-party cookies.

Full list of domains involved in Cookie Syncing »

| Domain (all cookies allowed) | # IDs | Domain (no 3P cookies) | # IDs |
|---|---|---|---|
| gemius.pl | 33 | gemius.pl | 36 |
| doubleclick.net | 32 | 2o7.net | 27 |
| 2o7.net | 27 | omtrdc.net | 27 |
| rubiconproject.com | 25 | cbsi.com | 26 |
| omtrdc.net | 24 | parsely.com | 16 |
| cbsi.com | 24 | marinsm.com | 14 |
| adnxs.com | 22 | gravity.com | 14 |
| openx.net | 19 | cxense.com | 13 |
| cloudfront.net | 18 | cloudfront.net | 10 |
| rlcdn.com | 17 | doubleclick.net | 10 |

The table below compares high-level cookie syncing statistics when allowing and when blocking third-party cookies (top 3,000 Alexa domains).

| Statistic | Allow third-party cookies | Block third-party cookies |
|---|---|---|
| # IDs | 1308 | 938 |
| # ID cookies | 1482 | 953 |
| # IDs in sync | 435 | 347 |
| # ID cookies in sync | 596 | 353 |
| # (First*) Parties in sync | (407) 730 | (321) 450 |
| # IDs known per party | 1 / 2.0 / 1 / 33 | 1 / 1.8 / 1 / 36 |
| # Parties knowing an ID | 2 / 3.4 / 2 / 43 | 2 / 2.3 / 2 / 22 |

The format of the bottom two rows is minimum / mean / median / maximum.

*Here we define a first party as a site which was visited in the first-party context at any point in the crawl.
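To make the minimum / mean / median / maximum format of the last two rows concrete, the sketch below computes both summaries from a small, entirely made-up mapping of parties to the IDs they know. The domain names, ID labels, and the `summarize` helper are hypothetical and are not drawn from the crawl data.

```python
from statistics import mean, median

# Hypothetical mapping: party -> set of IDs it has learned.
ids_known = {
    "tracker-a.example": {"id1", "id2", "id3"},
    "tracker-b.example": {"id1"},
    "tracker-c.example": {"id2", "id4"},
}


def summarize(values):
    """Format a list of counts as minimum / mean / median / maximum."""
    return f"{min(values)} / {mean(values):.1f} / {median(values):g} / {max(values)}"


# Row "# IDs known per party": one count per party.
ids_per_party = [len(ids) for ids in ids_known.values()]

# Row "# Parties knowing an ID": invert the mapping, one count per ID.
parties_by_id = {}
for party, ids in ids_known.items():
    for id_value in ids:
        parties_by_id.setdefault(id_value, set()).add(party)
parties_per_id = [len(parties) for parties in parties_by_id.values()]

print(summarize(ids_per_party))   # 1 / 2.0 / 2 / 3
print(summarize(parties_per_id))  # 1 / 1.5 / 1.5 / 2
```

The two rows are two views of the same party-to-ID relation: one counts IDs per party, the other counts parties per ID.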

Data

Databases available for download

(DO = Digital Ocean, EC2 = Amazon EC2)

| Name | Size | Machine # - Location (Provider) | # of sites | Flash enabled? | Cookie setting | Data from previous crawls (Exp. #) - Data loaded | Continuous profile | Comments |
|---|---|---|---|---|---|---|---|---|
| P01_alexa10k_05012014_fresh | 114M | 1 - N. Virginia (EC2) | 10K | yes | Allow all | no | yes | fresh profile |
| P04_alexa10k_05032014_fresh | 306M | 1 - N. Virginia (EC2) | 10K | yes | Allow all | no | yes | fresh profile |
| P06_alexa3k_05062014_fresh | 84M | 1 - N. Virginia (EC2) | 3k | yes | Allow all | No | yes | |
| P08_alexa3k_05062014_fresh | 84M | 2 - N. Virginia (EC2) | 3k | yes | Allow all | No | yes | |
| P09_alexa3k_05072014_flash | 84M | 2 - N. California (EC2) | 3k | yes | Allow all | (P6) - Flash | yes | loaded Flash from P6 |
| P10_alexa3k_05072014_localStorage | 77M | 3 - N. Virginia (EC2) | 3k | yes | Allow all | (P6) - localStorage | yes | loaded localStorage from P6 |
| P11_alexa3k_05072014_HTTP_cookies | 90M | 4 - N. Virginia (EC2) | 3k | yes | Allow all | (P6) - HTTP Cookies | yes | loaded cookies.sqlite from P6 |
| P14_alexa3k_05122014_DNT | 76M | 1 - N. Virginia (EC2) | 3k | yes | Allow all | No | yes | DNT Enabled |
| P15_alexa3k_05122014_DNT | 81M | 2 - N. California (EC2) | 3k | yes | Allow all | No | yes | DNT Enabled |
| P16_alexa3k_05122014_no3Pcookies | 55M | 4 - N. Virginia (EC2) | 3k | yes | Allow 1st party | No | yes | Block third-party cookies |
| P17_alexa3k_05122014_no3Pcookies | 55M | 3 - N. Virginia (EC2) | 3k | yes | Allow 1st party | No | yes | Block third-party cookies |
| P21_alexa3k_06132014_opt-out | 60M | 5 - N. Virginia (EC2) | 3k | yes | Allow all | No | yes | Loaded Opt-out from: NAI, DAA, EDAA |
| P22_alexa3k_06132014_opt-out | 64M | 6 - N. California (EC2) | 3k | yes | Allow all | No | yes | Loaded Opt-out from: NAI, DAA, EDAA |
| L03_alexa10k_05032014_flash | 295M | 7 - New York (DO) | 10K | yes | Allow all | (P1) - Flash | no | Flash loaded from P1 |
| L04_alexa10k_05042014_flash | 295M | 7 - New York (DO) | 10K | yes | Allow all | (P1) - Flash | no | Flash loaded from P1 |
| L05_alexa10k_05042014_fresh | 289M | 8 - New York (DO) | 10K | yes | Allow all | no | no | fresh profile |
| L06_alexa100k_flash_no3Pcookies | 2.1G | 9 - Leuven (local machine) | 100K | yes | Allow 1st party | Flash, from pilot crawls | no | Flash from pilot crawls, everything else cleared, no POST data, isolated with chroot. |
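A first look at a downloaded database could start by listing its tables and their row counts. The sketch below assumes the crawl databases are SQLite files, which this page does not state explicitly, and makes no assumption about their schema. The helper `table_row_counts` is hypothetical, and the demo runs on a throwaway database with a made-up table rather than on the real data.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def table_row_counts(db_path):
    """Return {table name: row count} for every table in a SQLite file."""
    with closing(sqlite3.connect(db_path)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }


# Demo on a throwaway database; the file and table names are made up.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_crawl.sqlite")
    with closing(sqlite3.connect(demo_path)) as conn:
        conn.execute("CREATE TABLE demo_cookies (domain TEXT, value TEXT)")
        conn.executemany(
            "INSERT INTO demo_cookies VALUES (?, ?)",
            [("tracker-a.example", "a1b2"), ("tracker-b.example", "c3d4")],
        )
        conn.commit()
    counts = table_row_counts(demo_path)

print(counts)  # {'demo_cookies': 2}
```

Pointing `table_row_counts` at a real file would show which tables it contains before any analysis queries are written.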

Due to the size of the files, the data is available by request. Please feel free to email the authors with your requests. In the meantime, you can download a sample database.

Code

The code developed during the study can be found on GitHub. This includes the crawling infrastructure and modules for analysing browser profile data and crawl databases.

Press