Typical Web traffic is easy enough to spot: it uses TCP port 80. But plenty of protocols, including Skype, BitTorrent, and eMule, prefer to remain in the shadows and purposely make themselves difficult to identify. If they were easy to identify, such protocols might make a tempting target for ISPs seeking to throttle back certain kinds of traffic. However, even these "obfuscated" protocols have a hard time hiding their secrets; encrypting the traffic can't keep them hidden, nor can certain tunneling behaviors that try to disguise one sort of traffic as another.

Who wants to identify traffic that hopes to remain hidden? Vendors of traffic analysis hardware, for one, who sell their gear to ISPs and must first be able to classify traffic before doing anything useful with it.

Deep packet inspection hardware, which can look inside the payload of individual packets, can be thwarted with even light encryption. But vendors long ago figured out how to identify protocols even when they don't want to be identified, and even when the data is encrypted. There are different ways of doing this, but the most common relies on various arcane statistical measurements: how a protocol negotiates a handshake between a server and client, how it exchanges encryption keys, packet sizes, the order of packet arrivals, etc. Crafting a protocol that has no such distinct identifying characteristics is very, very hard, in part because vendors don't publicize their identification techniques; even when deep packet inspection engines are open-sourced, the truly tricky bits tend to get left out.
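To make the statistical approach concrete, here is a minimal Python sketch of one such measurement: it buckets the packet sizes seen in a flow into a histogram and picks the known protocol whose histogram looks most similar. This is not any vendor's actual method. The bucket width, the Kullback-Leibler divergence used as the similarity measure, the function names, and the "training" data are all assumptions made up for illustration.

```python
import math
from collections import Counter

BIN_WIDTH = 64   # hypothetical bucket size in bytes
NUM_BINS = 24    # sizes of 1472 bytes and up all fall in the last bin


def size_histogram(packet_sizes):
    """Return a smoothed probability distribution over packet-size buckets."""
    counts = Counter(min(size // BIN_WIDTH, NUM_BINS - 1) for size in packet_sizes)
    total = len(packet_sizes) + NUM_BINS  # add-one smoothing avoids zero probabilities
    return [(counts.get(b, 0) + 1) / total for b in range(NUM_BINS)]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; smaller means more similar."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))


def classify(packet_sizes, models):
    """Return the name of the model whose distribution is closest to the flow."""
    observed = size_histogram(packet_sizes)
    return min(models, key=lambda name: kl_divergence(observed, models[name]))


# Made-up training flows, for illustration only
models = {
    "bulk-transfer": size_histogram([1460] * 40 + [52] * 10),
    "voice-like": size_histogram([120, 135, 128, 140, 110] * 10),
}

print(classify([130, 125, 118, 142, 133, 127], models))
```

Note that nothing here reads the payload, which is why encryption alone does not defeat this kind of classifier. A real system would combine many such measurements rather than relying on packet sizes alone.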

A pair of Swedish security experts recently released a new paper to remind us just how difficult good protocol obfuscation can be. "Breaking and Improving Protocol Obfuscation" (PDF), by Erik Hjelmvik and Wolfgang John, was written for Chalmers University of Technology and Sweden's Internet Infrastructure Foundation. It shows in detail how the authors were able to routinely identify obfuscated protocols like BitTorrent, Skype, and eMule.

But Hjelmvik and John aren't out to produce better identification tools. No, their goal is to show the weaknesses in current obfuscated protocols in order to make those protocols better. If the protocols can't be identified, then ISPs can't do much to interfere with them.

"The purpose with [sic] our research is not to reinforce active filtering of P2P traffic on the Internet," they write. "Instead we want to support the concept of network neutrality by providing feedback to the creators of obfuscated protocols. As we have observed, the supposed-to-be-obfuscated protocols are not obfuscated enough to avoid statistical identification of various properties specific to the protocols."

In the paper, the authors show how an open source tool can be trained to reliably pick out encrypted, proprietary, and obfuscated protocols with more than 90 percent accuracy. (Skype proved most difficult to reliably identify.) To better hide their protocols, designers need to do more than encrypt payloads; they must pay attention to obscuring any unique flow properties as well, using tools like random padding of packets, randomized flushing of the datastream, and tricky techniques to randomize the direction of packet exchanges.
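As a rough illustration of the first of those techniques, random padding, the Python sketch below appends a random number of random bytes to each message so that packet sizes stop tracking the protocol's real message sizes. It is not taken from the paper or from any real protocol. The two-byte length prefix, the 255-byte padding cap, and the function names are assumptions, and the sketch presumes the padded message is encrypted afterward so that the length prefix itself is not visible on the wire.

```python
import os
import secrets
import struct

MAX_PAD = 255  # hypothetical upper bound on padding length, in bytes


def pad(payload: bytes) -> bytes:
    """Prefix the real length, then append a random amount of random bytes.

    The payload must be shorter than 65,536 bytes to fit the 2-byte prefix.
    """
    pad_len = secrets.randbelow(MAX_PAD + 1)
    return struct.pack("!H", len(payload)) + payload + os.urandom(pad_len)


def unpad(message: bytes) -> bytes:
    """Recover the original payload by reading the length prefix."""
    (length,) = struct.unpack("!H", message[:2])
    return message[2:2 + length]


msg = b"hello"
sizes = sorted({len(pad(msg)) for _ in range(10)})
print(sizes)                    # the same message yields many different sizes
print(unpad(pad(msg)) == msg)   # the receiver still recovers the payload
```

Padding only hides sizes. Packet timing and direction remain visible, which is why the other techniques in that list are needed alongside it.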

Call it the new protocol arms race. Hjelmvik and John are working to make tools like SPID (Statistical Protocol IDentification) into lean, mean, traffic identification machines, but they're doing so only in order to push protocol designers to do a better job of obfuscating traffic.