The Effect of DNS on Tor’s Anonymity

The domain name system (DNS) is a fundamental part of the Internet, mapping human-readable domains to machine-readable IP addresses. When fetching a web page in a browser, a DNS request almost always precedes the actual web traffic. This is also the case when using Tor Browser, the privacy-enhanced browser developed by The Tor Project to provide millions of users with anonymity online.

A lot of research has gone into improving the Tor network, but its use of DNS has received little attention. In this research project, we set out to learn how DNS can harm the anonymity of Tor users, and how adversaries can leverage the DNS protocol to deanonymize users, as illustrated by the diagram to the right. We study (i) how exposed the DNS protocol is compared to web traffic, (ii) how Tor exit relays are configured to use DNS, (iii) how existing website fingerprinting attacks can be enhanced with DNS, and (iv) how effective these enhanced website fingerprinting attacks are at Internet-scale.

We show how an attacker can use DNS requests to mount highly precise website fingerprinting attacks: mapping DNS traffic to websites is highly accurate even with simple techniques, and correlating the observed websites with a website fingerprinting attack greatly improves precision when monitoring relatively unpopular websites. Our results show that DNS requests from Tor exit relays traverse numerous autonomous systems that the subsequent web traffic does not traverse. We also find that a set of exit relays, at times comprising 40% of Tor’s exit bandwidth, uses Google’s public DNS servers, an alarmingly high share for a single organization. We believe that Tor relay operators should take steps to ensure that the network maintains more diversity in how exit relays resolve domains.

What does our work mean for Tor users? As we outline in our blog post, we don’t believe that there is any immediate cause for concern. While our attacks work well in simulations, not many entities are in a position to mount them. Besides, they require non-trivial engineering effort to be reliable, and The Tor Project is already working on improved website fingerprinting defenses.

The main outcome of this research project is a paper that is going to be published at the Network and Distributed System Security Symposium in February 2017. In addition, we published detailed replication instructions, to make it easier to reproduce our results. All our writing is listed below.

We have developed a tool, ddptr, which stands for “DNS Delegation Path Traceroute.” The tool determines the DNS delegation path for a fully qualified domain name, and then runs UDP traceroutes to all DNS servers on the path. These traceroutes are then compared to a TCP traceroute to the web server behind the same fully qualified domain name.
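To make the notion of a delegation path more concrete, the following minimal Python sketch lists the names that a resolver walks through from the root down to a given domain. It is only an illustration and not how ddptr is implemented: the function name `candidate_zones` is hypothetical, and the sketch assumes that every label boundary is a potential zone cut. Finding the actual delegations and their name servers requires DNS queries, which are left out here.

```python
from typing import List


def candidate_zones(fqdn: str) -> List[str]:
    """Return the names from the root down to fqdn, each with a trailing dot.

    Assumption: every label boundary is treated as a possible zone cut.
    Real delegations must be confirmed with NS queries.
    """
    # Normalize: drop surrounding whitespace, a trailing dot, and case.
    labels = [label for label in fqdn.strip().rstrip(".").lower().split(".") if label]
    zones = ["."]  # The root zone is always the first step.
    # Walk from the rightmost label (the TLD) towards the full name.
    for i in range(len(labels) - 1, -1, -1):
        zones.append(".".join(labels[i:]) + ".")
    return zones


print(candidate_zones("baidu.com"))  # ['.', 'com.', 'baidu.com.']
```

For each name in this list, a tool in the spirit of ddptr would look up the responsible name servers and run a traceroute to each of them.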

Now imagine that our machine is trying to establish a TCP connection to baidu.com. How many autonomous systems will our network packets traverse? The two images to the right show an example. First, our machine has to resolve the domain before it can send packets to the IP address. The left image shows UDP traceroutes to all DNS servers in the delegation path for “baidu.com,” namely 192.58.128.30, 192.43.172.30, and 202.108.22.220. In total, these traceroutes traversed 13 different autonomous systems, illustrated by the rectangular boxes. The right image shows a TCP traceroute to “baidu.com.” The traceroute traversed at least four autonomous systems. In this simple example, we see that the DNS resolution process for baidu.com exposes our traffic to more autonomous systems than the actual TCP connection, provided we run our own DNS resolver.
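The comparison in this example boils down to a set difference: which autonomous systems appear on a DNS traceroute but not on the TCP traceroute? The sketch below shows that step under stated assumptions. The function name `dns_only_ases` is hypothetical, each traceroute is assumed to be already mapped to a list of AS numbers, and the AS numbers are placeholders from the private-use range, not measured data.

```python
from typing import Iterable, List, Set


def dns_only_ases(dns_paths: Iterable[List[int]], web_path: List[int]) -> Set[int]:
    """Return the ASes seen on any DNS traceroute but not on the web traceroute."""
    dns_ases: Set[int] = set()
    for path in dns_paths:
        dns_ases.update(path)
    return dns_ases - set(web_path)


# Placeholder AS numbers from the private-use range (64512-65534), not real data.
dns_paths = [
    [64512, 64513, 64514],  # traceroute to the first name server
    [64512, 64515],         # traceroute to the second name server
    [64512, 64513, 64516],  # traceroute to the third name server
]
web_path = [64512, 64513, 64520]  # TCP traceroute to the web server

print(sorted(dns_only_ases(dns_paths, web_path)))  # [64514, 64515, 64516]
```

The size of the returned set is the number of additional autonomous systems that can observe the DNS lookup without seeing the subsequent web traffic.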

We also publish the (mostly Python and R) scripts that we used to analyse and plot our data. The git repository also contains the LaTeX source of our paper and the project page you are looking at.

git clone https://github.com/NullHypothesis/tor-dns.git

We publish the following datasets. Each tarball contains a README.txt file that explains the respective dataset. We also want to encourage you to replicate our work and reproduce all our datasets. Our replication guide is meant to ease this task.

Exit resolver dataset

The following dataset is a collection of .pcap files that we captured on the authoritative DNS server for tor.nymity.ch. We used this dataset to identify the DNS resolvers of Tor exit relays. The tarball contains a README file that provides more details.

DNS exposure dataset

The following dataset contains the output of the tool ddptr, which we ran on a VPS operated by OVH. The tarball contains a README file that provides more details.

DNS request number dataset

The following dataset contains the number of DNS requests per five-minute interval as recorded on our exit relay. The dataset contains two files, one for a reduced exit policy, and one for an exit policy containing only ports 80 and 443.
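As a rough illustration of what counting requests per five-minute interval means, the sketch below bins request timestamps into 300-second buckets. It assumes the input is a plain list of Unix timestamps, which is an assumption made for this example only; the function name `requests_per_interval` is hypothetical and the actual file format of the dataset is not described here.

```python
from collections import Counter
from typing import Dict, Iterable

INTERVAL_SECONDS = 5 * 60  # five minutes


def requests_per_interval(
    timestamps: Iterable[float], interval: int = INTERVAL_SECONDS
) -> Dict[int, int]:
    """Map the start of each interval (in seconds) to its number of requests."""
    # Round each timestamp down to the start of its interval and count.
    counts = Counter(int(ts) // interval * interval for ts in timestamps)
    return dict(sorted(counts.items()))


# Made-up timestamps: three requests in the first interval, one in each of the next two.
print(requests_per_interval([0, 10, 299, 300, 601]))  # {0: 3, 300: 1, 600: 1}
```

Storing only per-interval counts, rather than individual requests, keeps the published data coarse.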

Internet-scale simulation dataset

The following dataset contains data for the (i) fraction of compromised streams and (ii) time until first compromise for 10,000 simulated Tor users. We generated the data with TorPS and by running traceroutes.
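The two metrics can be stated compactly in code. The sketch below computes them for a single simulated user under a simplifying assumption: each stream is represented as a pair of its start time (in seconds since the simulation began) and a flag saying whether it was compromised. This representation and the function names are hypothetical and chosen for illustration; they do not describe the format of the dataset or of TorPS output.

```python
from typing import List, Optional, Tuple

# (seconds since simulation start, was the stream compromised?)
Stream = Tuple[float, bool]


def fraction_compromised(streams: List[Stream]) -> float:
    """Return the share of streams that were compromised (0.0 if there are none)."""
    if not streams:
        return 0.0
    return sum(1 for _, compromised in streams if compromised) / len(streams)


def time_to_first_compromise(streams: List[Stream]) -> Optional[float]:
    """Return the start time of the earliest compromised stream, or None."""
    times = [start for start, compromised in streams if compromised]
    return min(times) if times else None


# Made-up streams for one simulated user.
user_streams = [(10.0, False), (20.0, True), (30.0, False), (40.0, True)]
print(fraction_compromised(user_streams))      # 0.5
print(time_to_first_compromise(user_streams))  # 20.0
```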

Popularity of Alexa’s top 10,000 domains

The following dataset contains the popularity of Alexa’s top 10,000 web sites. We obtained the data from the respective Amazon AWS API.

DNS requests for Alexa top 1,000,000 domains

The following datasets contain all DNS requests recorded by Tor Browser 5.5.4, configured not to browse over Tor, for the Alexa top 1,000,000 sites on April 15th, 2016. The data was collected using tbdnsw as part of the DefecTor toolset.

PCAPs: alexa1mx5.tar.gz (7.4 GiB)

SHA-256: 100b2081ca194571206ba02d88459982baf7b0584b3dd3246c0c0413048ddb5e

Extracted textfiles: alexa1mx5-extracted.tar.gz (590 MiB)

SHA-256: 7361a816f24b34b1f8d9f26e9fa5a403622ce3b4b401a101f4b41cf1d6705ffc

Alexa top 1,000,000 file: top-1m.csv (22 MiB)

SHA-256: 65f8d31a61164825900d50296de35bfbeaac405c9227abf5680ff61c404aa933

IPv4 addresses for CloudFlare: ips-v4 (0.2 KiB)

SHA-256: 3a69b705b18bd630e748165183a8158220b755fa9026b7db967cd9769410e606
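After downloading a file, its SHA-256 checksum can be compared against the value listed here. The following Python sketch is one way to do that using only the standard library. The helper names `sha256_of_file` and `verify` are our own for this example, and the file is read in chunks because the tarballs are several gigabytes in size.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex-encoded SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so that large tarballs do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Return True if the file's digest matches the published checksum."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example call, assuming the tarball is in the current directory:
# verify("alexa1mx5.tar.gz",
#        "100b2081ca194571206ba02d88459982baf7b0584b3dd3246c0c0413048ddb5e")
```

A return value of False means the download is incomplete or corrupted and should be repeated.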

Website fingerprinting dataset for Alexa top 9,000x100 + Alexa 909,000x1

The following datasets contain a website fingerprinting dataset with 100 samples of Alexa top 9,000 (monitored sites) and one sample each of Alexa top 909,000 (unmonitored) collected with Tor Browser 5.5.4. The data was collected using tbw as part of the DefecTor toolset. The toolset also contains tools for extracting data. We use the same format for cells and extracted features as Wang et al.

Raw logfiles: alexa9kx100+900k.tar.gz (15 GiB)

SHA-256: c137074752143f893dba8857b0be1544ba12a6c08d4b296e7f63089e365fcf19

Extracted cells, features and DNS requests: alexa9kx100+900k-dns+cells+feat.tar.gz (4.1 GiB)

SHA-256: 2719475968afda4f36694fe9f84f9c1b1915db9ca440cf05b9a8361be55b8b05

Extracted features: alexa9kx100+900k-feat.tar.gz (817 MiB)

SHA-256: 4cfb258d4d1b12698cfa4aa56114692c646ee59dc7dbb3eecdde988336c16970

Extracted features used in our paper: alexa1kx100+100k-feat.tar.gz (94 MiB)

SHA-256: b7be02065cf20537683697cd083b26c2f299bb4ae5e089a58a2ba823132e8358

Alexa top 1,000,000 file: top-1m.csv (22 MiB)

SHA-256: 65f8d31a61164825900d50296de35bfbeaac405c9227abf5680ff61c404aa933

IPv4 addresses for CloudFlare: ips-v4 (0.2 KiB)

SHA-256: 3a69b705b18bd630e748165183a8158220b755fa9026b7db967cd9769410e606

We are a team of five researchers from three universities. Feel free to copy all of us if you have any questions or remarks.

At Princeton University:

At Karlstad University:

At KTH Royal Institute of Technology:

Last update: 2016-12-19