Computer scientists have devised an attack on the Tor privacy network that in certain cases allows them to deanonymize hidden service websites with 88 percent accuracy.

Such hidden services allow people to host websites without end users or anyone else knowing the true IP address of the service. The deanonymization requires the adversary to control the Tor entry point for the computer hosting the hidden service. It also requires the attacker to have previously collected unique network characteristics that can serve as a fingerprint for that particular service. Tor officials say the requirements reduce the effectiveness of the attack. Still, the new research underscores the limits to anonymity on Tor, which journalists, activists, and criminals alike rely on to evade online surveillance and monitoring.

"Our goal is to show that it is possible for a local passive adversary to deanonymize users with hidden service activities without the need to perform end-to-end traffic analysis," the researchers from the Massachusetts Institute of Technology and Qatar Computing Research Institute wrote in a research paper. "We assume that the attacker is able to monitor the traffic between the user and the Tor network. The attacker’s goal is to identify that a user is either operating or connected to a hidden service. In addition, the attacker then aims to identify the hidden service associated with the user."

The attack works by gathering the network data of a pre-determined list of hidden services in advance. By analyzing patterns in the number of packets passing between the hidden service and the entry guard it uses to access Tor, the researchers were able to obtain a unique fingerprint of each service. They were later able to use the fingerprint to identify the service even though they were unable to decrypt the traffic it was sending. In a press release, the researchers elaborated:

The researchers’ attack requires that the adversary’s computer serve as the guard on a Tor circuit. Since guards are selected at random, if an adversary connects enough computers to the Tor network, the odds are high that, at least on some occasions, one or another of them would be well-positioned to snoop. During the establishment of a circuit, computers on the Tor network have to pass a lot of data back and forth. The researchers showed that simply by looking for patterns in the number of packets passing in each direction through a guard, machine-learning algorithms could, with 99 percent accuracy, determine whether the circuit was an ordinary Web-browsing circuit, an introduction-point circuit, or a rendezvous-point circuit. Breaking Tor’s encryption wasn’t necessary. Furthermore, by using a Tor-enabled computer to connect to a range of different hidden services, they showed that a similar analysis of traffic patterns could identify those services with 88 percent accuracy. That means that an adversary who lucked into the position of guard for a computer hosting a hidden service could, with 88 percent certainty, identify the computer as the service’s host. Similarly, a spy who lucked into the position of guard for a user could, with 88 percent accuracy, tell which sites the user was accessing.
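The classification step described above can be illustrated with a toy sketch. The feature vectors and labels below are invented for illustration (the paper's actual features and classifier are more sophisticated); the point is that per-window packet counts alone, with no decryption, can separate circuit types.

```python
import math

# Hypothetical fingerprints: each circuit is summarized as counts of cells
# observed in successive time windows. All numbers are synthetic examples,
# not real Tor measurements.
TRAINING = {
    "general":      [(40, 38, 5, 4), (42, 36, 6, 5)],
    "introduction": [(3, 3, 1, 1), (4, 3, 2, 1)],
    "rendezvous":   [(20, 60, 15, 50), (22, 58, 14, 52)],
}

def distance(a, b):
    """Euclidean distance between two packet-count feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(trace):
    """Nearest-neighbor guess at the circuit type for one observed trace."""
    best_label, best_dist = None, float("inf")
    for label, examples in TRAINING.items():
        for example in examples:
            d = distance(trace, example)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label

print(classify((41, 37, 5, 4)))    # → general
print(classify((21, 59, 15, 51)))  # → rendezvous
```

A real attacker would train on many traces per service and use a stronger model, but the principle is the same: the guard sees packet counts in the clear even though the payloads are encrypted.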

The research is sure to interest governments around the world, including the US. On at least two occasions over the past few years, FBI agents have exploited software vulnerabilities, once in Adobe Flash and once in Mozilla Firefox, to identify criminal suspects. Recently unsealed court documents also show the FBI seizing a Tor-hidden child porn site and allowing it to run for weeks so agents could gather evidence on visitors.

In an e-mail, Tor project leader Roger Dingledine said the requirements of the attack greatly limit its effectiveness in real-world settings. First, he said, the adversary must control one of the entry guards a hidden service is using. Entry guards are in theory assigned randomly, so attackers would have to operate a large number of Tor nodes to have a reasonable expectation of seeing the traffic of a given hidden service. Additionally, he cited research from last year arguing that researchers routinely overstate the threat website fingerprinting poses to anonymity.
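Dingledine's point about random guard assignment can be made concrete with back-of-the-envelope arithmetic. The sketch below makes a simplifying assumption not taken from the article: that each guard selection independently lands on an adversary-run relay with probability equal to the adversary's share of guard bandwidth.

```python
def compromise_probability(adversary_fraction, selections=1):
    """Chance that at least one of a hidden service's guard selections
    falls on an adversary-run relay.

    Assumes (simplistically) that each selection is independent and lands
    on the adversary with probability `adversary_fraction`, the adversary's
    share of guard-node bandwidth. Real Tor guard rotation is slower and
    more complicated than this model.
    """
    return 1 - (1 - adversary_fraction) ** selections

# An adversary holding 1% of guard bandwidth, observing a service that
# picks a fresh guard 12 times (hypothetical numbers):
print(round(compromise_probability(0.01, selections=12), 3))  # → 0.114
```

Even under this crude model, a small adversary needs many guard rotations (or a much larger bandwidth share) before it is likely to land in the guard position for a specific service, which is the limitation Dingledine emphasizes.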

He went on to question the "classifier" algorithm that allowed the researchers to identify certain traffic as belonging to a Tor hidden service. It wouldn't be hard to thwart it, he said, by adding random padding to the data being sent.

"It's not surprising that their classifier basically stops working in the face of more padding," he wrote. "Classifiers are notoriously brittle when you change the situation on them. So the next research step is to find out if it's easy or hard to design a classifier that isn't fooled by padding."

The full text of Dingledine's e-mail is below: