This feature originally ran on August 28, 2012.

When Libyan rebels finally wrested control of the country from its mercurial dictator last year, they discovered the Qaddafi regime had received an unusual gift from its allies: foreign firms had supplied technology that allowed security forces to track nearly all of the online activities of the country’s 100,000 Internet users. That technology, supplied by a subsidiary of the French IT firm Bull, used a technique called deep packet inspection (DPI) to capture e-mails, chat messages, and Web visits of Libyan citizens.

The fact that the Qaddafi regime was using deep packet inspection technology wasn’t surprising. Many governments have invested heavily in packet inspection and related technologies, which allow them to build a picture of what passes through their networks and what comes in from beyond their borders. The tools secure networks from attack—and help keep tabs on citizens.

Narus, a subsidiary of Boeing, supplies “cyber analytics” to a customer base largely made up of government agencies and network carriers. Neil Harrington, the company’s director of product management for cyber analytics, said that his company’s “enterprise” customers (agencies of the US government and large telecommunications companies) are “more interested in what’s going on inside their networks” for security reasons. But some of Narus’ other customers, like Middle Eastern governments that own their nations’ connections to the global Internet or control the companies that provide them, “are more interested in what people are doing on Facebook and Twitter.”

Surveillance perfected? Not quite, because DPI imposes its own costs. While deep packet inspection systems can be set to watch for specific patterns or triggers within network traffic, each specific condition they watch for requires more computing power—and generates far more data. So much data can be collected that the DPI systems may not be able to process it all in real time, and pulling off mass surveillance has often required nation-state budgets.

Not anymore. Thanks in part to tech developed to power giant Web search engines like Google’s—analytics and storage systems that generally get stuck with the label "big data"—"big surveillance" is now within reach even of organizations like the Olympics.

Network security camera

The tech is already helping organizations fight the ever-rising threat of hacker attacks and malware. The organizers of the London Olympic games, in an effort to prevent hackers and terrorists from using the games’ information technology for their own ends, undertook one of the most sweeping cyber-surveillance efforts ever conducted privately. In addition to the thousands of surveillance cameras that cover London, there was a massive computer security effort in the Games’ Security Operation Centers, with systems monitoring everything from network infrastructure down to point-of-sale systems and electronic door locks.

The logs from those systems generated petabytes of data before the torch was extinguished. They were processed in real time by a security information and event management (SIEM) system, which used “big data” analytics to look for patterns that might indicate a threat and triggered alarms swiftly when one was found.
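
To make the idea of pattern-based alerting concrete, here is a minimal sketch in Python. It is not how the Olympic SIEM worked; the event format, the `login_failed` event type, and the threshold values are all assumptions chosen for illustration. It raises an alert when one source produces several failed logins within a short window.

```python
from collections import defaultdict, deque

def detect_bursts(events, threshold=3, window_seconds=60):
    """Return (timestamp, source) alerts for bursts of failed logins.

    `events` is an iterable of (timestamp_seconds, source, event_type)
    tuples, assumed to be sorted by timestamp. The field layout and the
    "login_failed" label are hypothetical.
    """
    recent = defaultdict(deque)  # source -> timestamps of recent failures
    alerts = []
    for timestamp, source, event_type in events:
        if event_type != "login_failed":
            continue
        times = recent[source]
        times.append(timestamp)
        # Drop failures that have aged out of the sliding window.
        while times and timestamp - times[0] > window_seconds:
            times.popleft()
        if len(times) >= threshold:
            alerts.append((timestamp, source))
            times.clear()  # start counting afresh after an alert
    return alerts

sample_events = [
    (0, "10.0.0.5", "login_failed"),
    (10, "10.0.0.5", "login_failed"),
    (15, "10.0.0.9", "login_ok"),
    (20, "10.0.0.5", "login_failed"),
    (200, "10.0.0.7", "login_failed"),
]
alerts = detect_bursts(sample_events)
```

A production system would evaluate many such rules at once across far larger log volumes, which is where the "big data" storage and analytics come in.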

The combination of the sophisticated analytics and massive data storage in big data systems with DPI network security technology has created what Dr. Elan Amir, CEO of Bivio Networks, calls “a security camera for your network.”

"There's no question that within the next three to five years, not having a copy of your network data will be as strange as not having a firewall," Amir told me.

The capability used at London’s Games doesn’t have a billion-dollar price tag. Nearly any organization on a budget can assemble something similar, in some cases with hardware already on hand and a free initial software download. And the potential applications go far beyond benign network security. With the ability to store data over long periods, companies and governments with smaller budgets could not only track what's going on in social media, but reconstruct the communications between people over a period of months or even years, all with a single query.

“The danger here,” Electronic Frontier Foundation Technology Projects Director Peter Eckersley told Ars, “is that these technologies, which were initially developed for the purpose of finding malware, will end up being repurposed as commercial surveillance technology. You start out checking for malware, but you end up tracking people.”

Unchecked, Eckersley said, companies or rogue employees of those companies will do just that. And they could retain data indefinitely, creating a whole new level of privacy risk.

How deep packet inspection works

As we send e-mails, search the Web, and post messages and comments to blogs, we leave a digital trail. At each point where Internet communications are received and routed toward their ultimate destination, and at each server they touch, security and systems operations tools give every transactional conversation anything from a passing frisk to the equivalent of a full strip search. It all depends on the tools used and how they’re set up.

One of the key technologies that drives these tools is deep packet inspection. A capability rather than a tool itself, DPI is built into firewalls and other network devices. Deep packet inspection and packet capture technologies revolutionized network surveillance over the last decade by making it possible to grab information from network traffic in real time. DPI makes it possible for companies to put tight limits on what their employees (and, in some cases, customers) can do from within their networks. The technology can also log network traffic that matches rules set up on network security hardware: rules based on the network addresses that the traffic is going to, the type of traffic itself, or even keywords and patterns within its contents.
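
The three kinds of rules described above (destination address, traffic type, and content keywords) can be sketched in a few lines of Python. This is an illustrative model, not any vendor's rule syntax: the `Packet` and `Rule` classes, their field names, and the sample values are all hypothetical, and the packets are assumed to have been decoded already.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass
class Packet:
    dst_ip: str      # destination address of the traffic
    protocol: str    # traffic type, e.g. "http" or "smtp" (already identified)
    payload: bytes   # application data carried by the packet

@dataclass
class Rule:
    name: str
    dst_network: Optional[str] = None  # match on where the traffic is going
    protocol: Optional[str] = None     # match on the type of traffic
    keyword: Optional[bytes] = None    # match on a pattern in the contents

    def matches(self, packet: Packet) -> bool:
        """A rule matches only if every condition it sets is satisfied."""
        if self.dst_network is not None:
            network = ipaddress.ip_network(self.dst_network)
            if ipaddress.ip_address(packet.dst_ip) not in network:
                return False
        if self.protocol is not None and packet.protocol != self.protocol:
            return False
        if self.keyword is not None and self.keyword not in packet.payload:
            return False
        return True

def log_matches(packets, rules):
    """Return a (rule name, destination) log entry for every rule hit."""
    return [(rule.name, packet.dst_ip)
            for packet in packets
            for rule in rules
            if rule.matches(packet)]

rules = [
    Rule("watched-network", dst_network="203.0.113.0/24"),
    Rule("all-mail", protocol="smtp"),
    Rule("keyword-invoice", keyword=b"invoice"),
]
packets = [
    Packet("203.0.113.5", "http", b"GET / HTTP/1.1"),
    Packet("198.51.100.9", "smtp", b"Subject: invoice attached"),
]
log = log_matches(packets, rules)
```

Each packet is checked against every rule, so the work grows with the number of conditions being watched.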

“Almost everything interesting happening in networking, especially with a slant toward cyber security, has some DPI embedded in it, even if people aren’t calling it that,” said Bivio’s Amir. “It’s a technology and a discipline that captures all of the processing and network activity that’s getting done on network traffic outside of the standard networking elements of packets—the addressing and routing fields. What gets people riled up a bit is the ‘inspection’ part, because somehow inspection has negative connotations.”

To understand how DPI works, you first have to understand how data travels across networks and the Internet. Regardless of whether they’re wired or wireless, Internet-connected networks generally use Internet Protocol (IP) to handle routing data between the computers and devices attached to them. IP sends data in chunks called packets: blocks of data preceded by handling and addressing information that lets routers and other devices on the network know where the data came from and where it’s going. That addressing information is often referred to in the networking world as Layer 3 data, a reference to its definition within the Open Systems Interconnection network model.
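
As a rough illustration of that addressing information, the Python sketch below decodes the fixed 20-byte header of an IPv4 packet using only the standard library. The field layout follows the IPv4 specification; the function names and the sample addresses (taken from ranges reserved for documentation) are made up for this example, and header options and checksum validation are ignored.

```python
import socket
import struct

# Fixed IPv4 header: version/IHL, TOS, total length, ID, flags/fragment,
# TTL, protocol, checksum, source address, destination address.
IPV4_HEADER = "!BBHHHBBH4s4s"

def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the Layer 3 fields at the front of an IPv4 packet."""
    if len(packet) < 20:
        raise ValueError("packet too short for an IPv4 header")
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack(IPV4_HEADER, packet[:20])
    header_length = (version_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    return {
        "version": version_ihl >> 4,
        "header_length": header_length,
        "ttl": ttl,
        "protocol": protocol,  # 6 = TCP, 17 = UDP
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
        "payload": packet[header_length:total_length],
    }

def build_sample_packet(payload: bytes) -> bytes:
    """Build a simplified packet for demonstration (checksum left at zero)."""
    header = struct.pack(
        IPV4_HEADER, 0x45, 0, 20 + len(payload), 0, 0, 64, 6, 0,
        socket.inet_aton("192.0.2.10"), socket.inet_aton("198.51.100.7"),
    )
    return header + payload

info = parse_ipv4_header(build_sample_packet(b"hello"))
```

A router needs little more than the `destination` field. Everything in `payload` is what deeper inspection goes on to examine.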

The OSI Layers of an Internet data packet

| OSI Layer | Name | Description |
|---|---|---|
| Layer 1 | Physical | The format for the transmission of data across the networking medium, defining how data gets passed across it. WiFi (802.11) is a physical layer standard. |
| Layer 2 | Data link | Within a network segment, handles physical addressing: the media access control (MAC) addressing of devices on the network and their communication. Ethernet and Point-to-Point Protocol are data link protocols. |
| Layer 3 | Network | Handles the logical addressing and routing of data, based on software-defined addresses. Internet Protocol headers are the Layer 3 data in a packet. |
| Layer 4 | Transport | Protocol information, such as in the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP), provides for error-checking and recovery and flow control of data. |
| Layer 5 | Session | Handles communications between applications, such as remote procedure calls, inter-process communications like “named pipes,” and the SOCKS proxy protocol. |
| Layer 6 | Presentation or Syntax | Data formatting, serialization, compression and encryption services, like the Multipurpose Internet Mail Extensions (MIME) format. |
| Layer 7 | Application | The data sent for specific applications in formats such as HTTP for the request and delivery of Web content, File Transfer Protocol (FTP), IMAP and SMTP mail connections, and other application-specific formats. |

Internet routers generally just look at Layer 3 data to determine which network path a packet gets relayed down. Network firewalls look a little deeper into the data when deciding whether to let packets pass onto the networks they protect. Packet-filtering firewalls typically look at Layer 3 and Layer 4, checking which transport protocol (such as TCP or UDP) and which port number the packets use (the port is commonly associated with a specific application; port 80, for example, is usually associated with Web services).
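
A packet filter of this kind can be approximated in a short Python sketch. The allow-list below is hypothetical, the sample packets are simplified (no checksums, no TCP flags), and real firewalls also track connection state, which this example leaves out. The point is only that the decision uses the protocol number from the IP header and the destination port from the transport header, and never reads the payload.

```python
import struct

TCP, UDP = 6, 17  # IP protocol numbers

# Hypothetical policy: (transport protocol, destination port) pairs to allow.
ALLOWED = {(TCP, 80), (TCP, 443), (UDP, 53)}

def filter_packet(packet: bytes) -> bool:
    """Return True if the packet may pass, judging by Layers 3 and 4 only."""
    if len(packet) < 20:
        return False
    header_length = (packet[0] & 0x0F) * 4  # IPv4 header length in bytes
    protocol = packet[9]                    # Layer 3: which transport follows
    if header_length < 20 or protocol not in (TCP, UDP):
        return False
    if len(packet) < header_length + 4:
        return False
    # Layer 4: TCP and UDP both start with source port, then destination port.
    _src_port, dst_port = struct.unpack(
        "!HH", packet[header_length:header_length + 4])
    return (protocol, dst_port) in ALLOWED

def make_packet(protocol: int, dst_port: int, payload: bytes = b"") -> bytes:
    """Build a stripped-down IPv4 packet for demonstration purposes."""
    ip_header = struct.pack(
        "!BBHHHBBH4s4s", 0x45, 0, 24 + len(payload), 0, 0, 64, protocol, 0,
        bytes([192, 0, 2, 1]), bytes([198, 51, 100, 2]),
    )
    ports = struct.pack("!HH", 40000, dst_port)
    return ip_header + ports + payload

web_allowed = filter_packet(make_packet(TCP, 80))
telnet_allowed = filter_packet(make_packet(TCP, 23))
```

Because the filter trusts the port number, traffic that runs an unexpected application over port 80 would pass unexamined.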

Application-layer firewalls, which emerged in the 1990s, look still deeper into network traffic. These set rules for network traffic based on the specific type of application the data within the packet was for. Application firewalls were the first real “deep packet inspection” devices, checking the application protocols within the packets themselves, as well as searching for patterns or keywords in the data they contain.
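
To show what checking the application protocol means in practice, the sketch below examines a packet's payload directly. It recognizes an HTTP request by its request line rather than by port number, then searches the contents for keywords. The keyword list, the function name, and the choice to allow unrecognized traffic are assumptions made for this example, not the behavior of any particular product.

```python
import re

# An HTTP/1.x request begins with a method, a path, and a version string.
HTTP_REQUEST_LINE = re.compile(
    rb"^(GET|POST|PUT|DELETE|HEAD|OPTIONS) (\S+) HTTP/1\.[01]\r\n")

# Hypothetical content rules; a real device would load these from policy.
BLOCKED_KEYWORDS = [b"confidential", b"/admin"]

def inspect_payload(payload: bytes) -> dict:
    """Classify a payload by application protocol and scan its contents."""
    match = HTTP_REQUEST_LINE.match(payload)
    if match is None:
        # Not recognizable as HTTP; this sketch simply lets it through.
        return {"application": "unknown", "allowed": True, "hits": []}
    lowered = payload.lower()
    hits = [kw.decode() for kw in BLOCKED_KEYWORDS if kw in lowered]
    return {
        "application": "http",
        "method": match.group(1).decode(),
        "path": match.group(2).decode(),
        "allowed": not hits,
        "hits": hits,
    }

blocked = inspect_payload(
    b"GET /admin/login HTTP/1.1\r\nHost: example.com\r\n\r\n")
permitted = inspect_payload(
    b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n")
```

Unlike a port-based filter, this check reads the data itself, so it would still identify Web traffic sent over a nonstandard port.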