



Web Privacy Measurement (WPM) is the observation of websites and serves to detect, characterize and quantify privacy-impacting behaviors. Applications of WPM include the detection of price discrimination, targeted news articles and new forms of browser fingerprinting. Although originally focused solely on privacy violations, WPM now encompasses measuring security violations on the web as well.

For these studies to be truly large-scale and repeatable, creating an automated measurement platform is necessary. At least within the academic literature, measurement infrastructures in the field of WPM have been largely one-off and do not comprehensively address the engineering challenges within this realm.

OpenWPM, a flexible, stable, scalable and general web measurement platform, is our solution to this infrastructure vacuum. This tutorial shows how to get started with OpenWPM, gives an overview of its general functionality and lists some key engineering challenges which are still being solved. We hope that this tool will enable other researchers to perform WPM studies and welcome future collaboration.

Installation

OpenWPM has been developed and tested on Ubuntu 14.04/16.04. An installation script, install.sh, is included to install both the system and Python dependencies automatically. A few of the Python dependencies require specific versions, so you should install the dependencies in a virtual environment if you're installing on a shared machine. If you plan to develop OpenWPM's instrumentation extension or run tests, you will also need to install the development dependencies included in install-dev.sh.

It is likely that OpenWPM will work on platforms other than Ubuntu, but we do not officially support anything else. For pointers on alternative platform support, see the wiki.

Quick Start

Once installed, it is easy to run a quick test of OpenWPM. Check out demo.py for an example. This will use the default settings specified in automation/default_manager_params.json and automation/default_browser_params.json, with the exception of the changes specified in demo.py.
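The "defaults plus per-script changes" pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not OpenWPM's own loading code: the helper name load_params is hypothetical, and it assumes the defaults file is a flat JSON object.

```python
import json


def load_params(default_path, overrides=None):
    """Read default settings from a JSON file, then apply overrides.

    Hypothetical helper for illustration only; it is not part of OpenWPM.
    Assumes the JSON file holds a single flat object of settings.
    """
    with open(default_path) as f:
        params = json.load(f)
    # Keys present in `overrides` replace the defaults; all others are kept.
    params.update(overrides or {})
    return params
```

Under these assumptions, calling load_params on automation/default_browser_params.json with a dictionary of changed settings would return the defaults with only those settings replaced, which is the behavior demo.py relies on.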

Instrumentation and Data Access

OpenWPM provides several instrumentation modules which can be enabled independently of each other for each crawl. With the exception of response body content, all instrumentation saves to a SQLite database specified by manager_params['database_name'] in the main output directory. Response bodies are saved to content.ldb. The SQLite schema is specified by automation/schema.sql; instrumentation may specify additional tables necessary for their measurement data (see extension tables).
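Because the crawl data lands in an ordinary SQLite file, it can be inspected with Python's standard sqlite3 module. The sketch below is a minimal example under assumptions: the helper names list_tables and count_rows are hypothetical, and the database path is whatever you configured in manager_params['database_name']. No specific table names are assumed; the tables present depend on which instrumentation was enabled.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite database, sorted."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


def count_rows(db_path, table):
    """Return the number of rows in `table`.

    The table name is checked against the database's own table list first,
    because table names cannot be passed as SQL parameters.
    """
    if table not in list_tables(db_path):
        raise ValueError("unknown table: %s" % table)
    conn = sqlite3.connect(db_path)
    try:
        (count,) = conn.execute(
            'SELECT COUNT(*) FROM "%s"' % table
        ).fetchone()
    finally:
        conn.close()
    return count
```

A typical first step after a crawl would be to call list_tables on the output database to see which tables the enabled instrumentation created, then count_rows on each to confirm that data was recorded.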