Today’s post covers Tor hidden services and their anonymity. In the first few paragraphs I will give some basic, high-level background on the Tor network, and then I will describe a way to uncover the real location of some anonymous hidden services.

For those not familiar with it: Tor allows anyone to proxy their network traffic through the Tor peer-to-peer network, where it transits several peers before reaching its destination. (Only TCP and DNS traffic is allowed through Tor.) The destination can be either a service on the open Internet or a “hidden service” available only via the Tor network.

Routers:

Traffic transiting the Tor network is encrypted and routed through several peers in such a way that no single peer ever knows both the source and the destination of the traffic, which provides a level of anonymity to the originator. The peers that pass on traffic are called routers, and can be run by anyone who would like to contribute to the capacity of the Tor network. The Tor client, run by the person who wants to proxy traffic, chooses which routers to use for each connection.

Hidden services:

Tor allows people to run services, such as HTTP, HTTPS, SMTP, or SSH, that are available exclusively via the Tor network. These are called “hidden services” or “onion services” after the .onion pseudo-TLD used to identify them. Hidden services provide anonymity not only for the user of the service, but also for the publisher who runs it.

What not to do:

Don’t run a router and a hidden service from the same connection if you want to remain anonymous. I hate to tell people not to contribute, but routers are public by design: everybody needs to know which routers exist so that they can decide which ones to use for their connections.

In all my reading of the documentation and about the Tor network, I never ran across the advice not to run both a router and a hidden service. I have seen it mentioned since I started working on this project and began looking for it specifically. However, I think this warning should be in big bold letters, not tucked away in a small corner of a website somewhere.

Why shouldn’t I do that?

As I said in the previous section, “what not to do”, routers are public. When you run a router you publish a “descriptor” that tells other people how to talk to your router. This means you tell them your real IP address, along with some other information. If you run a router and a hidden service from the same connection, any service interruption will hit both the router and the hidden service at the same time. Someone monitoring the uptime of both could correlate the simultaneous outages and get a pretty good idea that the hidden service is at the same location as the router. The longer they watch, the more certain they can be.
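To make the idea of “correlating outages” concrete, here is a minimal sketch. It is not the code used for this project; the function name, the idea of numbering observation time slots, and the choice of Jaccard similarity as the score are all assumptions made for illustration.

```python
def outage_overlap(service_down, router_down):
    """Jaccard similarity between two sets of outage time slots.

    Each argument is a collection of integers, one per observation
    slot in which the host was seen to be down (hypothetical format).
    """
    service_down, router_down = set(service_down), set(router_down)
    union = service_down | router_down
    if not union:
        return 0.0  # no outages observed at all, so nothing to correlate
    return len(service_down & router_down) / len(union)


# Made-up data: slots in which each host was observed to be down.
service_down = {3, 4, 17, 42}
same_line = {3, 4, 17, 42, 50}   # shares every outage of the service
unrelated = {8, 9, 30}           # shares none of them

print(outage_overlap(service_down, same_line))
print(outage_overlap(service_down, unrelated))
```

A score near 1 means the two hosts almost always go down together; a single shared outage proves little, which is why a longer observation window matters.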

So what does this look like?

I was curious about how well this would work, as well as how many people were running both a router and a hidden service. To find out, I located a directory web site of hidden services and crawled it for links to hidden web sites. I then took this list and began periodically checking whether each site was up. At the same time, I started saving the published router descriptors. I then compared the uptime data gathered for each hidden HTTP service to the descriptors. I could discard any router that was down at a time when the service was up, but I couldn’t do the reverse, because transient connection problems within the Tor network can make a working hidden service look unreachable. Of the remaining candidate routers for each hidden service, I picked the one whose total uptime was closest to that observed for the hidden service.
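The matching step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the original code: uptime is modelled as a set of numbered time slots, and the names `best_matching_router`, `service_up`, and `routers_up` are hypothetical.

```python
def best_matching_router(service_up, routers_up):
    """Pick the candidate router whose total uptime is closest to the service's.

    service_up: set of time slots in which the hidden service was seen up.
    routers_up: dict mapping a router name to the set of slots in which
                its descriptors indicate it was up.
    Returns the best-matching router name, or None if no candidate remains.
    """
    # Discard any router that was down in a slot where the service was up.
    # The reverse test is deliberately not applied: a service can look down
    # because of transient Tor connection problems.
    candidates = {
        name: up for name, up in routers_up.items() if service_up <= up
    }
    if not candidates:
        return None
    # Among the survivors, choose the closest total uptime.
    return min(
        candidates,
        key=lambda name: abs(len(candidates[name]) - len(service_up)),
    )


# Made-up observations over ten time slots.
service_up = {0, 1, 2, 5, 6}
routers_up = {
    "A": {0, 1, 2, 3, 5, 6},   # covers the service, similar total uptime
    "B": set(range(10)),       # covers the service, but always up
    "C": {0, 1, 5, 6},         # down in slot 2 while the service was up
}
print(best_matching_router(service_up, routers_up))
```

Router C is eliminated outright, and A wins over B because an always-up router matches almost any service and so carries little information.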

Here are some of the more interesting results. These graphs cover about 70 days of data. The top line shows the service uptime, and the bottom line shows the uptime of the router that best matches it. Both lines have a positive bias: if the service or router was up at any time covered by a particular pixel, the pixel shows it as up.
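For readers wondering what “positive bias” means in practice, here is a small sketch of that kind of downsampling. It is an assumption about how such a graph could be drawn, not the plotting code behind these graphs, and `downsample_positive` is a hypothetical name.

```python
def downsample_positive(samples, width):
    """Collapse a list of up/down samples into `width` pixels.

    A pixel is True (up) if ANY sample that falls inside it is True,
    so short outages can disappear but short uptimes never do.
    """
    n = len(samples)
    pixels = []
    for i in range(width):
        start = i * n // width
        # Make sure every pixel covers at least one sample position.
        end = max((i + 1) * n // width, start + 1)
        pixels.append(any(samples[start:end]))
    return pixels


# Eight samples squeezed into two pixels: one brief "up" marks its pixel up.
samples = [False, False, True, False, False, False, False, False]
pixels = downsample_positive(samples, 2)
print("".join("#" if p else "." for p in pixels))
```

The practical consequence is that brief outages can vanish from the picture, so the graphs understate downtime rather than overstate it.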

All in all, out of the 360 hidden services that I monitored, about 15-30 could be said to have a fair to good correlation with a particular router. This is a little fewer than I expected, which is good. Hopefully this number will shrink in the future as people become aware of the problem.

So which services and routers are those?

Now, now, that would be telling! I don’t know and I don’t care which particular services and routers these graphs represent. My goal with this project was not to “out” any particular service or individual. Rather, it was to call attention to this problem and hopefully warn any would-be hidden service operators of the danger.

Danger Will Robinson! Danger!

With only a small amount of effort, the effectiveness of this method could be dramatically improved. The easiest and most effective improvement would be to actually poll the routers by connecting to them rather than relying on the descriptors. This would provide much higher resolution data regarding router uptime and allow for closer correlation in a shorter time. Polling of hidden services could also be improved by collecting data from multiple Tor nodes to improve resolution and mitigate transient connection problems. A more unscrupulous adversary could even intentionally cause outages with DoS attacks.