A scary story about the demise of the Internet has been making the rounds through the social networking echo chamber lately. It's a paper by five researchers from the University of Minnesota and one from Kansas State University (actually only a poster was published). The title is ominous: "Losing Control of the Internet: Using the Data Plane to Attack the Control Plane." Like any good thriller, it has an iconic villain but plot holes large enough to drive a Cisco Carrier Routing System through—if the CRS had wheels. But despite that, it may very well be based on a true story.

The backstory is the epic war between bellheads and netheads. The bellheads have been running telephone networks for 125 years, and are well aware that you must always keep your data and control planes separate. The data plane is the network that we users get to send our packets over. The control plane makes sure the data plane works, by running routing protocols and doing management. Of course, these functions also need to exchange data.

Bellhead standard operating procedure is to exchange this control plane data over its own connections that are completely separate from regular (data) network links. The netheads, on the other hand, firmly believe in "fate sharing," where the control messages and the data packets flow over the same connections, so if one works, the other does too, and vice versa.

There's just one little problem with that: if the data plane gets overloaded to the point that packets start to get "dropped," control plane packets, which share the same connections, also get lost. Earlier work shows that it's possible to disrupt the BGP routing protocol that ISPs depend on to make traffic find its way across the Internet, just by sending regular traffic. Granted, it takes a lot of regular traffic, but not quite so much that the traffic levels immediately look suspicious. The authors of this original paper write: "Fortunately, major peering links with significant available bandwidth are difficult to attack due to required resources." Famous last words.

But our intrepid security researchers from Minnesota and Kansas follow up this little indie release with a bigger and badder sequel that's aimed squarely at the blockbuster summer audience. They start by adding a botnet that analyzes the Internet's structure, and then relentlessly aim an amplified attack at the most vulnerable parts of the Internet's core infrastructure.

I have to give them credit for not skimping on the bandwidth; you wouldn't believe how much research still looks at 11Mbps 802.11b. "[W]e use OC-768 size links, the largest link size currently in the SONET standard." For those who failed (or avoided) optical networking 101, a single OC unit (OC-1) is worth almost 52Mbps. So an OC-768 connection weighs in at just under 40Gbps. The bots that make up the botnet are assumed to have 1Mbps upload capacity.
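For anyone who wants to check the optical arithmetic, here is a minimal sketch. It assumes only the standard SONET base rate of 51.84Mbps for OC-1; the function name is made up for illustration.

```python
# SONET line rates scale linearly: OC-n runs at n times the OC-1 rate.
OC1_MBPS = 51.84  # OC-1 line rate in Mbit/s


def oc_rate_mbps(n: int) -> float:
    """Return the line rate of an OC-n link in Mbit/s."""
    return n * OC1_MBPS


if __name__ == "__main__":
    for level in (1, 3, 48, 192, 768):
        print(f"OC-{level}: {oc_rate_mbps(level):,.2f} Mbit/s")
```

The last line printed is the OC-768 rate, which lands a little below the round 40Gbps figure that everyone quotes.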

The researchers have the botnets perform traceroutes and then use a couple of simple equations to find the links in the core of the Internet that are used between the largest number of source-destination combinations. They then send traffic from one bot to another in such a way that it passes through the target link. Because the destination of the packets is a valid and willing recipient, and each bot only sends a megabit or so, this doesn't look suspicious at first—or even second—glance. But when some 40,000 bots all do this at the same time, it overloads the link in question, which then begins to drop packets. Including BGP packets.
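The poster doesn't spell out its equations, so what follows is not the researchers' method, just a minimal sketch of the general idea: take traceroute paths between bots and count how many distinct source-destination pairs cross each link. The data layout, the router names, and the function name are all assumptions for illustration.

```python
from collections import defaultdict


def rank_links(traces):
    """Rank directed links by how many source-destination pairs use them.

    traces maps a (source, destination) pair to the list of router hops
    that a traceroute between the two reported (hypothetical input format).
    Returns a list of (link, pair_count), busiest link first.
    """
    pairs_per_link = defaultdict(set)
    for pair, hops in traces.items():
        # Each consecutive pair of hops is one directed link on the path.
        for link in zip(hops, hops[1:]):
            pairs_per_link[link].add(pair)
    ranked = [(link, len(pairs)) for link, pairs in pairs_per_link.items()]
    # Sort by descending pair count, breaking ties by link name.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Made-up traceroute results between five bots.
example_traces = {
    ("botA", "botC"): ["r1", "core1", "core2", "r3"],
    ("botB", "botC"): ["r2", "core1", "core2", "r3"],
    ("botD", "botE"): ["r4", "core1", "core2", "r5"],
    ("botA", "botB"): ["r1", "core1", "r2"],
}

if __name__ == "__main__":
    for link, count in rank_links(example_traces):
        print(link, count)
```

In this toy topology the core1-to-core2 link carries three of the four pairs, so it would be the obvious target; any two bots whose path crosses it can then exchange innocent-looking traffic over it.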

And BGP normally wants to make sure that its fellow routers are still there, so it expects to see a packet from each neighbor at least once every 90 to 180 seconds, depending on how its hold timer is configured. Without such "keepalive" packets, BGP terminates its connection and throws away all the routing information from the now-disconnected neighbor. It then has to go through a list of up to 350,000 different "prefixes" (address ranges) in its routing tables and find alternate paths for the affected ones, updating its other BGP neighbors where appropriate. These in turn have to adjust their routing decisions and update their neighbors. This can go on for a while.
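The hold-timer logic can be sketched in a few lines. This is a toy model, not a BGP implementation: the class and method names are invented, time is just a number of seconds, and the 90-second default is simply the lower of the two values mentioned above.

```python
class ToyBgpSession:
    """Toy model of one BGP neighbor session with a hold timer."""

    def __init__(self, hold_time=90.0):
        self.hold_time = hold_time  # seconds of silence tolerated
        self.last_heard = 0.0       # time the last packet arrived
        self.established = True
        self.routes = {}            # prefix -> next hop learned from neighbor

    def receive(self, now):
        """Record that a keepalive or update arrived at time `now`."""
        if self.established:
            self.last_heard = now

    def check(self, now):
        """Expire the session if the neighbor has been silent too long.

        Returns the prefixes that were thrown away (empty if still up).
        """
        if self.established and now - self.last_heard > self.hold_time:
            self.established = False
            withdrawn = sorted(self.routes)
            self.routes.clear()
            return withdrawn
        return []


if __name__ == "__main__":
    session = ToyBgpSession(hold_time=90.0)
    session.routes = {"192.0.2.0/24": "neighbor", "198.51.100.0/24": "neighbor"}
    session.receive(now=30.0)        # last keepalive that gets through
    print(session.check(now=100.0))  # 70 s of silence: session stays up
    print(session.check(now=121.0))  # 91 s of silence: routes are dropped
```

The point of the attack is that congestion makes every `receive` call go missing, so `check` eventually fires even though the neighbor is perfectly healthy.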

The authors claim that with a botnet of 250,000 bots they can pretty much stop the Internet dead in its tracks by building up more than an hour of backlogged route calculations for routers in the core of the network. I think they're right that it's possible to disrupt BGP sessions on a relatively large scale in this way, and that would be bad.

But now the plot holes. All of this work is done through simulations, but how these are performed is not explained. I'm especially interested in the model of BGP update propagation these researchers used. It seems they think every update is propagated everywhere, and updates are queued in unlimited quantities. I'm very sure the former is not true: although routers sometimes propagate updates that don't contain new information, they usually suppress those. And I'm confident that the latter isn't, either, if only because routers do all their processing in RAM, which isn't infinite.

Also, it takes 40,000 bots to saturate one 40Gbps link. Even assuming a good deal of background traffic, that still means no more than a dozen or so links can be targeted at the same time. And once the attack starts being successful, what happens to the botnet? Remember that this attack requires a high degree of coordination, and the bots have to coordinate over the very network they are disrupting. An attack like this would seem to have significant self-limiting properties.
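The link budget is easy to check on the back of an envelope. In the sketch below, the 250,000-bot botnet, the 40Gbps links, and the 1Mbps per bot are the figures discussed here; the 50 percent background load is purely an illustrative assumption, not a number from the poster.

```python
import math


def max_target_links(botnet_size, link_mbps=40_000, bot_mbps=1.0,
                     background_load=0.0):
    """Estimate how many links a botnet can saturate at the same time.

    background_load is the fraction of each link already filled by
    legitimate traffic, so the bots only need to supply the remainder.
    """
    spare_mbps = link_mbps * (1.0 - background_load)
    bots_per_link = math.ceil(spare_mbps / bot_mbps)
    return botnet_size // bots_per_link


if __name__ == "__main__":
    print(max_target_links(250_000))                       # idle links
    print(max_target_links(250_000, background_load=0.5))  # half-full links
```

With idle links the botnet covers six targets; if every target is already half full, that doubles to twelve, which is where "a dozen or so" comes from.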

Still, it would be good if we could remove the vulnerabilities that make such attacks theoretically possible in the first place. The best solution is to give control messages such as BGP updates a higher priority than normal traffic. To some degree, current routers can already do this, but it would be good if they did so out of the box under all circumstances.
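What "higher priority" means in practice can be sketched as a strict-priority queue on an outgoing link. This is a simplified illustration, not how any particular router implements it: the class name, the tiny buffer, and the eviction rule are all assumptions.

```python
from collections import deque


class PriorityLinkQueue:
    """Output queue that always favors control packets over data packets."""

    def __init__(self, capacity):
        self.capacity = capacity  # total packets the buffer can hold
        self.control = deque()
        self.data = deque()

    def enqueue(self, packet, is_control):
        """Queue a packet; return False if it had to be dropped."""
        if len(self.control) + len(self.data) >= self.capacity:
            if is_control and self.data:
                self.data.pop()   # evict the newest data packet to make room
            else:
                return False      # buffer full: drop the arriving packet
        (self.control if is_control else self.data).append(packet)
        return True

    def dequeue(self):
        """Send control packets first, data only when no control is waiting."""
        if self.control:
            return self.control.popleft()
        if self.data:
            return self.data.popleft()
        return None


if __name__ == "__main__":
    queue = PriorityLinkQueue(capacity=3)
    for i in range(5):
        queue.enqueue(f"data{i}", is_control=False)  # last two are dropped
    queue.enqueue("bgp-keepalive", is_control=True)  # still gets in
    print(queue.dequeue())  # the keepalive leaves first
```

Even with the buffer stuffed full of bot traffic, the keepalive is accepted and sent ahead of everything else, so the flood costs users bandwidth but not the BGP session.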

A pragmatic solution that the authors suggest is to disable BGP's keepalive mechanism. That way, the BGP session stays up no matter what. Of course there is a reason this mechanism exists: you really don't want to send your packets into a dead link or toward a dead router. But in many cases (certainly not always) it's possible to see when a link goes down without keepalives. In these cases, it would be possible to disable keepalives or at least make the time before a session is disconnected for lack of them a good deal longer. If only 10 percent of all ISPs disable keepalives, the attack no longer works. And we'd have a happy ending, in good Hollywood tradition.

ACM CCS '10, 2010. DOI: 10.1145/1866307.1866411.