Thursday, 11 August 2016 In hacking tags: USENIX security Tor Linux

tl;dr: This vulnerability is quite serious, but it doesn’t affect the Tor network any more than it affects the rest of the internet. In particular, the Tor-specific attacks mentioned in the paper will not work as described.

Recently, an excellent paper, entitled “Off-Path TCP Exploits: Global Rate Limit Considered Dangerous,” was published by Yue Cao, Zhiyun Qian, Zhongjie Wang, Tuan Dao, Srikanth V. Krishnamurthy, and Lisa M. Marvel at USENIX Security 2016.

The paper describes the 2012 modifications of RFC5961 to the specification of the Transmission Control Protocol (TCP), the latter of which is used to transport roughly 90% of our data across the internet. The modification was meant to protect against TCP “blind in-window” attacks.

When a TCP packet is sent, the sender and receiver both know a number, called the sequence number, that this packet should have. If the sequence number is not correct, various (complicated, boring) things may happen, but the important part is that neither the sender nor the receiver actually believes that this is a valid packet. Instead, they assume something went wrong somehow, or that an active attacker is attempting to inject packets into their communication stream. The term blind simply means that an attacker is unable to directly observe the packets going between the sender and receiver, but is usually instead trying to use some side-channel to determine this information. There’s another part of the TCP specification which describes windowing — which simply means (did I mention that TCP is very complicated and boring…) that the sequence number was “correct enough” — that is, that the sequence number was within the right range. Specification nerds have long argued over what “correct enough” means, because apparently they find this topic absolutely riveting.

The fix to the TCP blind in-window attack was to specify that, under certain conditions, if the TCP sequence number doesn’t match what was expected, the receiver of this messed up packet should send a “challenge” ACK to the sender. Depending on the type of messed-up-ness, the sender and receiver do one of a number of little dances with each other, in the special way that TCP is so fond of doing. When one party sends a challenge ACK, they increment a counter stored in a global variable which is shared across all TCP connections. This global variable is reset to 0 once per second, and it has a maximum value of 100, i.e. no more than 100 challenge ACKs will be sent per second (for all connections combined). If it wasn’t obvious from the title of the paper, global variables (across programming languages, frameworks, and contexts) are commonly known to be a very bad, no good, horrible idea.

The attack described in the paper is elegant. In terms of its impact, 96.6% of the Alexa top one million are running Linux kernels, and hence are likely vulnerable. The previously described global ACK counter enables various side-channels across TCP connections, meaning that a blind attacker can determine information about:

whether Alice and Bob are currently communicating over TCP , what the correct TCP sequence number is, and what the range of the valid window is.

The attacker does this by sending various crafted packets to the receiver (i.e. via a side-channel) while the sender is simultaneously sending valid packets to the receiver. The combined state of the attacker’s and the sender’s effects upon the global counter, for each of the above pieces of information, can be determined by whether the attacker receives either 99 or 100 ACKs in response:

The authors go on to claim the attack can be used to influence a Tor user’s path through the network. However, the authors seem to have a misunderstanding regarding how Tor’s path selection algorithm functions.

Their idea is summarised in the last paragraph of §7.2 of the paper (emphasis mine):

In general, we believe that a DoS attack against Tor connections can have a devastating impact on both the availability of the service as a whole and the privacy guarantees that it can provide. The default policy in Tor is that if a connection is down between two relay nodes, say a middle relay and an exit relay, the middle relay will pick a different exit relay to establish the next connection. If an attacker can dictate which connections are down (via reset attacks), then the attacker can potentially force the use of certain exit relays.

This is is technically incorrect. The way Tor’s path selection algorithm actually works — when a connection fails — is that the client forgets the path of that circuit entirely, and goes back to step #1 of the algorithm, effectively choosing an entirely new path without any memory of the path chosen before. Since the selection of the nodes in this new path (and in fact, any path) is dependent on their bandwidth weight from the consensus, the client has just as much probability to select the same exit as they did the last time. Therefore, to use this attack to “funnel” (as the authors describe) Tor users into using a particular exit node is of equal difficulty — in terms of bandwidth of the nodes you would need to run — to conducting a Sybil attack on the whole network.

Although, with a high-bandwidth exit in a sybil attack, the attacker has a high (and importantly, to the attack’s benefit, independent) probability that an exit it controls will get picked by the client. Whereas with this attack, the bandwidth weighting is likely detrimental to pulling off the attack, since the exits you’re injecting RSTs into still have independently high probabilities of being chosen again. In other words, knocking nodes out of the network doesn’t do anything to change their probability of being chosen, it merely makes them unavailable and thus only amounts to a DoS attack, not a path bias attack.

While the attack on Tor — as stated in the paper — does not work, the attack itself is impressive, and we encourage these (and other!) researchers to think of ways the attack might apply to Tor (and other networks).

Their attack does work as a general denial-of-service against not just Tor relays, but literally against anything running Linux.

The accepted Linux kernel patch solves the issue, and does so by randomising the time window that the global variable applies to.