As an incident that we reported on last week shows, the Internet routing system isn't as secure as we want it to be. But how bad is it really?

Let's start with a very short introduction to Internet routing. Routing is based on autonomous systems (ASes) exchanging prefixes (ranges of IP addresses) using the Border Gateway Protocol (BGP). Autonomous systems are first and foremost Internet Service Providers (ISPs). However, some end-user organizations swim with the big fish, usually in order to connect to two or more ISPs at the same time. The IP addresses ISPs give out to their customers are aggregated into a relatively small number of prefixes that cover large address blocks, and these prefixes are "announced" or "advertised" over BGP to other ASes. Prefixes make their way from AS to AS, so eventually the entire Internet knows where to send packets with a given destination address.

The term "Border Gateway Protocol" makes more sense in the context of 20 years ago, when the word "gateway" was used for what we today call a router. So BGP is the protocol used between border routers—the routers that sit at the edge of neighboring autonomous systems. ASes connect in a hierarchy that looks a lot like this:

The up/down relationships are between ISPs and their customers; the customer pays the ISP. The dotted lines from side to side are "peering" relationships where traffic is exchanged without money changing hands. The economic models are such that traffic flows up the hierarchy, then sideways, and finally down. Paths that go sideways, then up or down, and then sideways again only happen if someone is giving away free service, and are thus rare.

So packets from AS 6 to AS 5 may follow the path 6 - 3 - 1 - 2 - 5, where AS 6 pays AS 3, which in turn pays AS 1, with AS 5 paying AS 2. So all the ISPs get paid, even though AS 1 doesn't pay AS 2 (or the other way around). However, the path 6 - 3 - 4 - 2 - 5 is not a valid way to get from AS 6 to AS 5. In this case, AS 4 would have to pay AS 2 for this traffic, but AS 3 doesn't pay AS 4 anything, so AS 4 would be giving away service for free. On the other hand, going from AS 6 to AS 8 over the path 6 - 3 - 4 - 8 is fine, because AS 8 is AS 4's customer, so AS 8 pays AS 4 for the incoming traffic.
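
The "up, then sideways, then down" rule can be sketched in a few lines of Python. This is only an illustration: the relationship tables are my reading of the paragraph above (the figure is not reproduced here), and the names PROVIDERS, PEERS, hop_type, and is_paid_for are made up for this example.

```python
# Relationships as described in the text; treat them as assumptions.
# Maps each customer AS to the set of ISPs (providers) it pays.
PROVIDERS = {6: {3}, 3: {1}, 5: {2}, 4: {2}, 8: {4}}
# Peering links ("sideways"), where no money changes hands.
PEERS = {frozenset({1, 2}), frozenset({3, 4})}


def hop_type(src, dst):
    """Classify one AS-to-AS hop as 'up', 'down', or 'sideways'."""
    if dst in PROVIDERS.get(src, set()):
        return "up"
    if src in PROVIDERS.get(dst, set()):
        return "down"
    if frozenset({src, dst}) in PEERS:
        return "sideways"
    raise ValueError(f"no known link between AS {src} and AS {dst}")


def is_paid_for(path):
    """True if the path climbs first, goes sideways at most once, then only descends."""
    climbing = True
    for src, dst in zip(path, path[1:]):
        kind = hop_type(src, dst)
        # Going up or sideways after the climb has ended means someone works for free.
        if kind in ("up", "sideways") and not climbing:
            return False
        if kind != "up":
            climbing = False
    return True


print(is_paid_for([6, 3, 1, 2, 5]))  # up, up, sideways, down
print(is_paid_for([6, 3, 4, 2, 5]))  # up, sideways, up
print(is_paid_for([6, 3, 4, 8]))     # up, sideways, down
```

With these tables, the first and third paths pass the check and the second fails at the hop from AS 4 up to AS 2, which matches the reasoning above.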

BGP itself is blissfully unaware of all the money issues. In its default state, BGP will trust everything that it hears and happily give away service for free. To avoid this, BGP routers must be configured with filters that make sure only correct information is carried by the protocol. Additionally, prefix advertisements, which are BGP's way of inviting incoming traffic, are only sent in accordance with business relationships.

With knowledge of how each AS is interconnected with other ASes and whether a connection involves a customer/ISP relationship or peering, it's possible to know exactly how each destination throughout the Internet should be reached from any given source. In addition, it's necessary to know which range of IP addresses belongs to which AS. Accounting for rerouting after failures adds some complexity, but is not a problem.

Knowledge of the graph of the network and the AS-prefix relationships would make it possible to create filters that validate the information received over BGP and reject incorrect or falsified information, and there are routing databases where ASes can register this information. Unfortunately, many people fail to do so, and the information that is there is often incomplete. The IETF and the Regional Internet Registries that give out IP addresses and AS numbers are now working on a database and certificate infrastructure that would allow network operators to do exactly this, but we're not there yet.

Where are those servers, anyway?

Network operators simply don't know whether CNN's servers are in Atlanta or Beijing. So when BGP updates come in claiming the latter, ISPs—well, their routers—have very little choice other than to install those updates and start sending traffic in the new direction. 999 times out of 1,000, a rerouting event like this is a routine change in the network or an equally routine repair of a failure. But that last one in 1,000 is incorrect—either a mistake or because of an attack of some kind.

Back in the 1990s, an incident like the one that sent traffic to China would have forced network engineers to scramble for hours debugging and fixing the problem. But these days, such incidents are way too common for that. As a result, a battery of monitoring systems is available Internet-wide. So something like this staying under the radar is about as likely as a tank driving down Pennsylvania Avenue without drawing any attention.

This leads to an unpleasant state of cognitive dissonance. On the one hand, it's inconceivable that Internet routing is so gullible. On the other hand, it works well most of the time. Fixing the situation would be complex, expensive, and wouldn't pay off anytime soon.

(I went to my first IETF meeting in 2002, when inter-domain [read: inter-AS] routing security was high on the agenda. I remember having lunch at a pizza place in Atlanta with 20 people from Cisco, furiously drawing AS topologies on our napkins the whole time. At that time, there were two proposals on the table to make BGP more secure: S-BGP [Secure BGP] by BBN and soBGP [secure origin BGP] by Cisco. A lot of fighting about these proposals and the difficulties of fixing the problem and even making any changes at all to BGP made almost a decade pass with little to show for it.)

Don't underestimate the complexities involved in securing Internet-wide routing. What if a certificate used in S-BGP or soBGP expires? If that means that a connection is brought down, good luck downloading the updated certificate.

Routing is a critical real-time system. In such systems, the traditional security model of shutting down when security credentials can't be validated doesn't really work. When the system is working, it's important to use security mechanisms to keep attackers from disrupting it. At the same time, it's essential that these same security mechanisms don't get in the way of fixing the system when it has failed or is about to fail. Unfortunately, existing security models don't strike that balance.

The saving grace of Internet routing is that most ISPs carefully filter what their customers send them. So, if I instruct my BGP router to tell my ISP that I'm the owner of the Windows Update IP address, my ISP should know better and ignore this BGP advertisement. Since there is a business relationship between ISPs and customers, both sides have an incentive to keep up to date with any changes in prefixes that are announced.
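
As a rough sketch of what such a customer filter does, the Python below checks an announcement against the prefixes an ISP has on record for that customer. Everything in it is hypothetical: the AS numbers and address blocks come from ranges reserved for documentation, and the table and function names are invented. Accepting more-specific pieces of a registered block is a policy choice made here for illustration; real filters are often stricter.

```python
import ipaddress

# Hypothetical record of which prefixes each customer AS may announce.
CUSTOMER_PREFIXES = {
    64500: [ipaddress.ip_network("192.0.2.0/24")],
    64501: [ipaddress.ip_network("198.51.100.0/24")],
}


def accept_from_customer(customer_as, announced):
    """Accept only prefixes that fall inside a block registered for this customer."""
    net = ipaddress.ip_network(announced)
    for allowed in CUSTOMER_PREFIXES.get(customer_as, []):
        # Compare only like with like (IPv4 with IPv4, IPv6 with IPv6).
        if net.version == allowed.version and net.subnet_of(allowed):
            return True
    return False


print(accept_from_customer(64500, "192.0.2.0/24"))    # the customer's own block
print(accept_from_customer(64500, "203.0.113.0/24"))  # someone else's block
```

The second announcement is dropped because nothing on record ties that block to AS 64500, which is the kind of check that should stop a customer from claiming somebody else's addresses.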

Once incorrect information has passed the customer-ISP boundary, however, it will quickly spread over peering connections without encountering much filtering along the way. This is because, as noted above, there is currently no Internet-wide database of authoritative routing information that can be used to generate filters. The only way for ISPs to filter their peers would be to continuously exchange updated information on a one-to-one basis. Because of the continuous churn in customers and the addition of new prefixes, as well as the large numbers of peers big ISPs have, this is simply unworkable.

Reexamining the China routing snafu

So what exactly happened in China that caused 15 percent of the Internet's prefixes (not 15 percent of the traffic) to get rerouted to that nation in April? And was it an accident or something more malicious? I wasn't at the China Telecommunications Corporation office to observe the incident as it unfolded, so I can't say for sure whether this was a diabolical plan executed to perfection or a network engineer doing something really, really stupid. But I'm betting on the latter, and not just because of Hanlon's Razor.

One common BGP screw-up is leaking the entire routing table. There are currently some 341,000 prefixes making up the Internet and, in order to be able to reach them all, BGP routers need to have all of these in their routing tables. If, for some reason, a BGP router doesn't have any filters, it will simply send a copy of that full table to all the routers in neighboring ASes that it's connected to.

Leaking a full table is a mistake that happens fairly regularly, and it looks like this is what happened in China. Here's what may have gone down.

When a filter is updated, it can become nonfunctional. Usually, this is caught with a "maximum prefixes" filter of last resort—this kills a BGP session if more than a predetermined number of prefixes is received. But, even without this, such a leak usually isn't too devastating because a detour through (for instance) China means that additional ASes are traversed, and BGP prefers shorter paths over longer ones. This is possible because, for each prefix, the ASes on the way to that destination are recorded in an "AS path."
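
The "maximum prefixes" safeguard is simple enough to sketch. The Python below is a toy model, not how any particular router implements it: the BgpSession class, its fields, and the limit of three are all invented for the example.

```python
class BgpSession:
    """Toy model of a BGP session guarded by a maximum-prefix limit."""

    def __init__(self, neighbor_as, max_prefixes):
        self.neighbor_as = neighbor_as
        self.max_prefixes = max_prefixes
        self.prefixes = set()
        self.established = True

    def receive(self, prefix):
        """Record an announced prefix; tear the session down if the limit is exceeded."""
        if not self.established:
            return False
        self.prefixes.add(prefix)
        if len(self.prefixes) > self.max_prefixes:
            # Filter of last resort: drop the session and everything learned over it.
            self.established = False
            self.prefixes.clear()
            return False
        return True


session = BgpSession(neighbor_as=64500, max_prefixes=3)
for prefix in ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24", "10.0.0.0/8"]:
    session.receive(prefix)
print(session.established, len(session.prefixes))
```

After the fourth prefix arrives, the session is down and nothing learned over it remains, which is the blunt but effective behavior described above.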

However, the leak of the full table, or at least a sizable fraction of it, was exacerbated by a curious design decision by China Telecom. That decision made it look to the outside world like China Telecom had also stripped off the AS path from all the prefixes that it had leaked. So it looked to peers of China Telecom that destinations such as CNN were located in China Telecom's network, rather than that CNN was merely reachable through China Telecom.
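
To see why the missing AS path matters, here is a deliberately simplified Python sketch of path selection that looks only at AS path length. Real BGP weighs other attributes, such as local preference, before it gets to path length, so take this as an illustration of the one rule discussed here. The AS numbers are from the range reserved for documentation, and pick_route is a made-up name.

```python
def pick_route(candidates):
    """Return the candidate AS path with the fewest ASes (ties go to the first listed)."""
    return min(candidates, key=len)


# Hypothetical paths to the same prefix, as seen by one router.
legit = [64496, 64497, 64498]                # normal route; 64498 originates the prefix
leaked_full = [64511, 64499, 64497, 64498]   # detour via leaker 64511, AS path intact
leaked_stripped = [64511]                    # same detour, but the AS path was stripped

print(pick_route([legit, leaked_full]))      # the shorter, legitimate path wins
print(pick_route([legit, leaked_stripped]))  # the leak now looks like the best route
```

With the AS path intact, the detour is longer and loses. With the path stripped, the leaker appears to be the origin just one AS away, and the bogus route wins.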

This is why a relatively large number of ASes started sending their traffic to China. Stripping off the AS paths happens when information in BGP is exported into another routing protocol that is used locally, and then re-exported back into BGP. This practice is considered dangerous because of earlier incidents like the one discussed here. There is also not really any good reason to do it (there are a few not-so-good reasons), and I can't think of any way that this would happen by accident.

So just the leaking of a full or partial BGP table in itself isn't too suspicious, although networks the size of China Telecom should know better. But doing so with the AS paths removed may be construed as a reason for a moderate level of suspicion by those so inclined.

If I were more paranoid, however, I would start looking around the Internet for anomalous prefix/AS combinations that show up randomly for short periods of time. Someone who really wants to intercept traffic would be better off hosting a few servers and a BGP router in a data center somewhere in the well-connected world and then trying to see which ISPs fail to filter properly. With such an ISP located, a targeted attack could be mounted to reroute traffic for much longer than 18 minutes without being found out. Rerouting North American prefixes within America would also be much less suspicious than rerouting North American prefixes to China, not to mention providing a much higher degree of deniability.
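
What would looking for short-lived prefix/AS combinations involve? A minimal sketch, assuming you already have a feed of (timestamp, prefix, origin AS) observations from some route monitor: the Python below flags an origin that shares a prefix with another origin and was only seen briefly. The feed format, the function name, and the 30-minute threshold are all assumptions for illustration.

```python
from collections import defaultdict


def find_short_lived_origins(observations, max_duration):
    """Flag (prefix, origin) pairs seen only briefly on a prefix that has another origin.

    observations: iterable of (timestamp_seconds, prefix, origin_as) tuples.
    """
    first_seen = {}
    last_seen = {}
    for ts, prefix, origin in observations:
        key = (prefix, origin)
        first_seen[key] = min(ts, first_seen.get(key, ts))
        last_seen[key] = max(ts, last_seen.get(key, ts))

    origins_per_prefix = defaultdict(set)
    for prefix, origin in first_seen:
        origins_per_prefix[prefix].add(origin)

    suspicious = []
    for (prefix, origin), start in first_seen.items():
        duration = last_seen[(prefix, origin)] - start
        if len(origins_per_prefix[prefix]) > 1 and duration <= max_duration:
            suspicious.append((prefix, origin, duration))
    return sorted(suspicious)


# Hypothetical feed: AS 64496 originates the prefix all day, AS 64511 for ten minutes.
feed = [
    (0, "192.0.2.0/24", 64496),
    (40000, "192.0.2.0/24", 64511),
    (40600, "192.0.2.0/24", 64511),
    (86400, "192.0.2.0/24", 64496),
]
print(find_short_lived_origins(feed, max_duration=1800))
```

Only the ten-minute appearance of AS 64511 is reported here. A real monitor would also have to cope with legitimate cases, such as a prefix deliberately announced from more than one AS, so a hit is a reason to look closer, not proof of an attack.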

While we wait for some form of BGP security to become available, we should all think about what would happen if the addresses of the remote systems we communicate with are rerouted and our traffic intercepted. Encryption and cryptographic authentication such as HTTPS or a VPN protect against this. However, there is a problem with encryption: certificate authorities may not be as trustworthy as they should be. How to work around that is a story for another day.