Princeton Web Census Data Release

We are releasing the entire Princeton Web Census data containing privacy measurements of 1 million sites conducted regularly from December 2015 to June 2019.

By Steven Englehardt, Gunes Acar, Dillon Reisman, and Arvind Narayanan.

This page is part of the Princeton Web Transparency and Accountability Project.



Background

Since 2015, we have conducted a web census to study third-party online tracking. Each month, our bot visits the web’s 1 million most popular sites and records data pertaining to user privacy, including cookies, fingerprinting scripts, the effect of browser privacy tools, and the exchange of tracking data between different sites ("cookie syncing").

Our open-source measurement software, OpenWPM, has been used in dozens of other studies. In 2016 we published a paper "Online Tracking: A 1-million-site Measurement and Analysis" based on a snapshot of this data, and released that snapshot.

Now we are releasing the entire Princeton Web Census data -- about 15 terabytes -- containing privacy measurements of 1 million sites conducted each month from December 2015 to June 2018.

We plan to run one or two more crawls in the next few months (until mid-2019), and we will update this data release periodically. (Update: the November 2018 and June 2019 crawls have been added to the release.)

Access

Send an email to web-census-data@lists.cs.princeton.edu to request access to the dataset. Please tell us who you are and give a high-level description of what you plan to use the data for. (We'll approve all requests, but we'd like to get an idea of how people are using the data.)

Overview of the data

Each month, we run measurements in eight configurations at scales ranging from 10,000 sites to 1 million sites, summarized here. Please visit the dataset details page for usage information, a timeline of changes, and known issues with the data.