Issues with latency and dropped packets can kill a network's performance and cripple applications like real-time communications, scientific computing, and high-frequency trading. But the problems can be extremely difficult to diagnose, as they may not appear under test conditions, and real-time monitoring of performance can require dedicated hardware or procedures that actually cut into the usable bandwidth. A team of academic researchers has come up with what they think is a solution, one that could sample the transmission of a collection of representative packets in real time, in a manner that's inexpensive in terms of both hardware and networking resources.

The researchers, who were supported by grants from both the National Science Foundation and Cisco, presented their work at the SIGCOMM meeting on Thursday; they've placed a paper describing it online as well. The paper describes how the system—which they term a Lossy Difference Aggregator—would operate in principle, describes some simulations of its performance, and suggests how it might be implemented. Unfortunately, it appears that it would require an extension to an IEEE standard that's only recently been adopted, as well as dedicated processing hardware.

Real-time monitoring, if you ignore implementation details, is simple: assign each network packet a timestamp when it leaves a piece of hardware, then compare that to the time at which it's received. The challenge is communicating these timestamps between the two pieces of hardware. Each timestamp has to be matched with a specific packet, which can be computationally intensive, and the two devices have to transfer the data in order to make time comparisons. It's possible to cut down on the work by choosing a representative sample of packets for a given time period, but coordinating the choice of packets across hardware can be a challenge.
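To see why that bookkeeping gets expensive, here's a minimal Python sketch of the brute-force approach; the packet IDs and timestamps are purely illustrative:

```python
# Naive per-packet latency measurement: every packet's send and receive
# timestamps must be matched by ID across the two devices. The per-packet
# state, and the need to ship these tables across the network, is the cost.
send_times = {}   # packet_id -> time it left the sender
recv_times = {}   # packet_id -> time it arrived at the receiver

def record_send(packet_id, t):
    send_times[packet_id] = t

def record_recv(packet_id, t):
    recv_times[packet_id] = t

def latencies():
    # Only packets seen at both ends produce a latency sample
    return {pid: recv_times[pid] - send_times[pid]
            for pid in send_times if pid in recv_times}

record_send("a", 0.000); record_recv("a", 0.004)
record_send("b", 0.001)            # packet "b" was dropped in transit
print(latencies())                 # {'a': 0.004}
```

The Lossy Difference Aggregator described below replaces these ever-growing per-packet tables with a small fixed-size structure.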

The Lossy Difference Aggregator, as the "lossy" part of its name implies, is a way of selecting a representative subset of packets to track. As each packet comes into the router, it's assigned a hash value. That value is then used to assign the packet a position in a data structure that has an arbitrary number of columns, termed "banks," and slots within each column. (Normally, I'd consider this structure a matrix, but the word doesn't appear at all in the paper.) Each entry contains the packet's hash value and a timestamp.

So, for example, a structure limited to 1024 entries could contain a single bank with 1024 slots, or four banks with 256 slots each. The hash value is used to place the packet in a specific location in the structure. In the authors' example, a hash with three leading zeros might assign it to bank 1, while seven leading zeros would place it in bank 2. A separate function assigns it a slot within the bank. Anything that doesn't find a place in this structure is discarded.
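The insertion logic can be sketched roughly as follows in Python. Note that the hash function, the leading-zero thresholds, and the slot-selection rule here are my own illustrative choices, not the paper's:

```python
import hashlib

NUM_BANKS = 4
SLOTS_PER_BANK = 256        # 4 banks x 256 slots = 1024 entries total

# Each slot is either None or a (packet_hash, timestamp) pair
banks = [[None] * SLOTS_PER_BANK for _ in range(NUM_BANKS)]

def packet_hash(payload: bytes) -> int:
    """A 64-bit hash of the packet contents (illustrative choice)."""
    return int.from_bytes(hashlib.sha256(payload).digest()[:8], "big")

def insert(payload: bytes, timestamp: float) -> None:
    h = packet_hash(payload)
    zeros = 64 - h.bit_length()          # count of leading zero bits
    # More leading zeros -> a rarer hash -> a more selective bank.
    # The thresholds (3, 7, 11 zeros) are made up for illustration.
    if zeros >= 11:
        bank = 3
    elif zeros >= 7:
        bank = 2
    elif zeros >= 3:
        bank = 1
    else:
        bank = 0
    slot = h % SLOTS_PER_BANK            # a separate function picks the slot
    if banks[bank][slot] is None:
        banks[bank][slot] = (h, timestamp)
    # If the slot is taken, the packet simply isn't tracked: that's the
    # "lossy" part of the Lossy Difference Aggregator.
```

Running `insert` on each outgoing packet at the sender, and on each incoming packet at the receiver, leaves both ends holding structures built from the same sampled traffic.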

After a set sampling time, the sending hardware transmits this structure to the equipment that should be receiving the packets, which has been building a similar structure out of the same traffic. At that point, extracting the actual performance data should be simple: lost packets can be identified as unfilled slots, and the timestamps can be used to calculate various latency figures. Because the scheme is so simple, the authors calculate that implementing it would add only about one percent to the transistor count of even the low-end ASICs currently in use. The data structure itself would require only 72Kbits of control traffic a second.
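Once both structures are in hand, the comparison is little more than a walk over matching slots. A minimal sketch, assuming each slot holds either None or a (hash, timestamp) pair, with hash values and times invented for the example:

```python
def compare(sent, received):
    # 'sent' and 'received' are parallel bank/slot structures built
    # independently at the two ends of the link.
    latencies, lost = [], 0
    for sbank, rbank in zip(sent, received):
        for s, r in zip(sbank, rbank):
            if s is None:
                continue                 # packet wasn't sampled at the sender
            if r is None or r[0] != s[0]:
                lost += 1                # sampled, but never filled the
                                         # matching slot at the receiver
            else:
                latencies.append(r[1] - s[1])
    avg = sum(latencies) / len(latencies) if latencies else None
    return avg, lost

sent     = [[(0xA1, 0.000), None, (0xB2, 0.002)]]
received = [[(0xA1, 0.004), None, None]]
print(compare(sent, received))   # (0.004, 1)
```

With more samples, the same latency list also yields the standard deviation the authors analyze.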

Mathematically, the authors demonstrate that the system would provide a statistically accurate measure of both the latency and its standard deviation. They also created a simulator, which they used to demonstrate its accuracy. Even under really bad conditions, like a 20 percent packet loss rate, its estimates of latency would be off by only four percent—and really, if you're losing 20 percent of your packets, latency's probably the least of your concerns. Comparisons with a method of actively monitoring network performance showed that the Lossy Difference Aggregator provided more accurate latency measures.

Of course, network hardware will need to recognize this traffic as distinct from the packets it's supposed to be routing. The authors suggest adding an extension to the IEEE 1588 standard, which is used for synchronizing the clocks of network equipment. Since accurate comparisons of time stamps require clock synchronization anyway, this seems like a reasonable suggestion.

The remaining challenge involves actually putting an implementation into hardware. The authors, perhaps due to their interactions with their sponsors at Cisco, seem especially attuned to the realities of the networking hardware world. The power of embedded processors, they suggest, is starting to commoditize the networking hardware market in the same way that the power of desktop processors has transformed the PC market. Specialized real-time monitoring hardware could represent a value-added proposition for vendors. Its first likely customers—high-frequency traders and high-performance computing centers—are also among the least price-sensitive.

The authors also point out that the data generated by their method can provide value well before it's fully deployed. Putting this hardware on either side of major network bottlenecks could be extremely useful, and it might be possible to arrange the protocol so that it operates across hardware that's separated by a number of intervening devices. As the intervening hardware is replaced, the data returned will simply become finer-grained.