Yesterday some Internet users would have seen issues with their Internet connectivity, experiencing slowness or finding parts of the Internet unreachable. This incident hit users in Japan particularly hard, and it caused the Internal Affairs and Communications Ministry of Japan to start an investigation into the cause of the large-scale Internet disruption that slowed or blocked access to websites and online services for dozens of Japanese companies.

In this blog post we will take a look at the root cause of these outages, who was affected and what networks were involved.

Starting at 03:22 UTC yesterday (Aug 25), followers of @BGPstream would have seen an increase in alerts involving Google. The BGPstream alerts were informing us that Google was announcing the peering LAN prefixes of a few well-known Internet exchanges. This in itself is a fairly common type of incident and typically indicates that something isn’t quite right within the network hijacking those prefixes. These alerts were therefore the first clue that something was wrong with Google’s BGP advertisements.

A closer look at our data shows not only BGP hijack incidents but also a high number of BGP leak events. A random example is this one: 171.5.0.0/17 announced by AS45629 (Jastel out of Thailand), which all of a sudden became reachable with Google as a provider for Jastel. To demonstrate this let’s look at some example paths (not an exhaustive list):

1103 286 701 15169 45629
13335 9498 5511 701 15169 45629
202140 29075 5511 701 15169 45629
52342 20299 262206 701 15169 45629

If we take a closer look at the AS paths involved, starting at the right side, we see the prefix was announced by 45629 (Jastel) as expected. Since Jastel peers with Google (15169), that’s the next AS we see. The next AS in the path is 701 (Verizon), and this is where it gets interesting, as Verizon has now started to provide transit for Jastel via Google. Verizon (701) then announced that to several of its customers, some of them very large, such as KPN (286) and Orange (5511). So by just looking at 4 example paths we can see it hit large networks in Europe, Latin America, the US, and India (9498 Airtel).

In the example above we can see how Google accidentally became a transit provider for Jastel by announcing peer prefixes to Verizon. Since Verizon would select this path to Jastel, it would have sent traffic for this network towards Google. Not only did this happen for Jastel, but for thousands of other networks as well.
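The reasoning above can be sketched in a few lines of Python. This is only an illustrative check, not the detection logic used for the alerts: the function names are made up for this example, and it assumes a plain AS path without prepending, where the origin is the last AS in the path.

```python
GOOGLE = 15169  # Google, which is not a transit provider


def parse_path(text):
    """Turn a space-separated AS path string into a list of integers."""
    return [int(asn) for asn in text.split()]


def transits_through(path, asn):
    """True if `asn` is in the path but is not the origin (the last AS)."""
    return asn in path[:-1]


def upstream_of_origin(path):
    """Return the AS directly left of the origin, or None for a one-hop path."""
    return path[-2] if len(path) >= 2 else None


# The four example paths for 171.5.0.0/17 shown earlier in this post.
paths = [
    "1103 286 701 15169 45629",
    "13335 9498 5511 701 15169 45629",
    "202140 29075 5511 701 15169 45629",
    "52342 20299 262206 701 15169 45629",
]

# Paths in which Google carries traffic for a prefix it does not originate.
leaked = [p for p in map(parse_path, paths) if transits_through(p, GOOGLE)]

for path in leaked:
    print("origin AS%d reached via upstream AS%d" % (path[-1], upstream_of_origin(path)))
```

In all four paths Google sits between Verizon (701) and the origin, which is the signature of the leak described here.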

Google is not a transit provider and traffic for 3rd party networks should never go through the Google network. Jastel has a few upstream providers, and with the addition of Google and Verizon to the path, it’s likely that only Verizon customers (which is still significant) would have chosen this path, and only those that had no other alternative or specifically preferred Verizon over shorter paths. However, this is just the start.

A word about traffic engineering

Google is one of the largest (CDN) networks in the world. It has an open peering policy and is extremely well connected with many peers. It’s also the source of a large amount of traffic with popular websites such as YouTube, Google search, Google Drive, Google Compute, etc. As a result many networks exchange a significant volume of traffic with just Google, and those with direct peering with Google will want to make sure Google picks the right peering link with them. Large networks therefore deploy traffic engineering tricks to make sure traffic flows over the correct peering links with Google. The most powerful trick in the book is to de-aggregate and announce more specifics. Because routers forward on the longest matching prefix, the more specific prefixes are always preferred, no matter the AS path length or whatever local-pref Google sets locally.

Since Google essentially leaked a full table towards Verizon, we get to peek into what Google’s peering relationships look like and how their peers traffic engineer towards Google. Analyzing this data set we find many more specific prefixes, meaning prefixes that are not normally seen in the global Internet routing table (DFZ) and are only made visible to Google for traffic engineering purposes. Let’s take a look at an example.

The prefix 114.154.133.0/24 is not normally seen on the Internet, instead it is announced as the larger aggregate 114.144.0.0/12 by AS4713 NTT OCN, the largest service provider in Japan.

During the time of the incident we see over 20,000 new OCN prefixes, all more specifics of their larger aggregate blocks (mainly their /11’s, /12’s, /13’s, /14’s and /15’s). In this case OCN announced these more specific prefixes primarily to control how traffic comes in from Google. Now that Google leaked these prefixes to Verizon as well, everyone seeing announcements for these prefixes would have sent traffic for them towards Verizon and Google, essentially changing the local traffic engineering trick into a much more global traffic engineering setup.

Verizon customers and peers that would have seen this announcement would have preferred this over any other path since more specifics always win.
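To make the "more specifics always win" point concrete, here is a minimal longest-prefix-match sketch using Python's standard ipaddress module. The two prefixes are the OCN example from this post; the AS paths attached to them and the destination addresses are illustrative assumptions, and real routers do this lookup in their forwarding table rather than with a linear scan.

```python
import ipaddress


def best_route(destination, routes):
    """Return the route with the longest prefix that contains the destination."""
    addr = ipaddress.ip_address(destination)
    candidates = [r for r in routes if addr in r["prefix"]]
    if not candidates:
        return None
    # Prefix length is compared before any BGP attribute such as AS path length.
    return max(candidates, key=lambda r: r["prefix"].prefixlen)


routes = [
    # Normal aggregate; the one-hop AS path is an assumption for illustration.
    {"prefix": ipaddress.ip_network("114.144.0.0/12"), "as_path": [4713]},
    # Leaked more specific, seen via Verizon and Google (illustrative path).
    {"prefix": ipaddress.ip_network("114.154.133.0/24"), "as_path": [701, 15169, 4713]},
]

chosen = best_route("114.154.133.10", routes)
print(chosen["prefix"], chosen["as_path"])
```

Even though the /24 has the longer AS path, it is selected for any address it covers; an address elsewhere in the /12, such as 114.150.1.1, still follows the aggregate.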

Size and impact of this incident

If we look at which networks were impacted the most, we can see that AS4713 NTT OCN, the largest service provider in Japan, was impacted most severely. Our data shows over 24,000 new more specific prefixes for OCN were visible via Google and Verizon during the time of the incident.

We also saw over 7,000 new more specifics for AS7029 (Windstream). The total number of new prefixes (mostly more specifics) is around 50,000. For those interested, the top 30 affected networks can be found below.

All of these leaks were visible between 03:22 UTC and 03:33 UTC, with some peers seeing the leaked paths until about 04:00 UTC, or in local Japan time between 12:22 PM and 1:01 PM.

Number of new prefixes via Google and Verizon, by origin network

New prefixes   ASN        ASN name
24834          AS4713     OCN - NTT Communications Corporation
7715           AS7029     Windstream Communications Inc
4650           AS8151     Uninet S.A. de C.V.
2852           AS1659     Taiwan Academic Network (TANet) Information Center
1746           AS3209     Vodafone GmbH
1315           AS2519     ARTERIA Networks Corporation
1218           AS28573    CLARO S.A.
614            AS9394     China TieTong Telecommunications Corporation
560            AS12715    Orange Espagne S.A.U.
506            AS27747    Telecentro S.A.
463            AS16814    NSS S.A.
430            AS12066    TRICOM
428            AS45510    TELCOINABOX PTY LTD
404            AS11830    Instituto Costarricense de Electricidad y Telecom.
369            AS39651    Com Hem AB
357            AS6400     Compañía Dominicana de Teléfonos, C. por A. - CODETEL
316            AS10318    CABLEVISION S.A.
280            AS5615     KPN B.V.
225            AS4181     TDS TELECOM
224            AS43205    Bulsatcom EAD
221            AS17908    Tata Communications
183            AS395105   HYTEC-7779
179            AS45194    Syscon Infoway Pvt. Ltd.
166            AS9676     SaveCom Internation Inc.
164            AS4764     Wideband Networks Pty Ltd, Transit AS
152            AS18106    Viewqwest Pte Ltd
140            AS45069    china tietong Shandong net
131            AS10481    Prima S.A.
128            AS13445    Cisco Webex LLC
126            AS13156    Cabovisao, televisao por cabovisao, sa

Closing thoughts

In total we saw over 135,000 prefixes visible via the Google - Verizon path. The widespread outages, particularly in Japan (OCN), were caused by the more specifics, which led many networks to reroute traffic toward Verizon and Google. That likely congested the path or perhaps hit some kind of ACL, resulting in the outages. Many BGPmon users would have seen an alert similar to the one below, informing them that new prefixes were being originated and were globally visible.

====================================================================
New prefix for AS14061 (Code: 60)
====================================================================
Detected new prefix: 178.62.96.0/19
Update time: 2017-08-25 03:25 (UTC)
Detected by #peers: 18
Announced by: AS14061 (Digital Ocean, Inc.)
Upstream AS: AS15169 (Google Inc.)
ASpath: 18356 38794 45796 2516 701 15169 14061

Monitoring is one simple thing operators can do to quickly detect this and take action. In this case the recommended course of action would have been to shut down the peering sessions with Google.
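As a rough illustration of what such monitoring checks for, the sketch below compares an observed announcement against a baseline of expected prefixes and upstreams. It is not how BGPmon is implemented. The baseline structure, the function name, and the upstream AS64500 (a documentation-range AS number) are all hypothetical; only the OCN aggregate and the more specific come from this post.

```python
import ipaddress

# Hypothetical baseline: what an operator expects to see for its own AS.
BASELINE = {
    4713: {
        "prefixes": [ipaddress.ip_network("114.144.0.0/12")],
        "upstreams": {64500},  # placeholder AS number, not OCN's real upstream
    }
}


def check_announcement(prefix, as_path, baseline):
    """Return the reasons an announcement looks suspicious (empty list if none)."""
    net = ipaddress.ip_network(prefix)
    origin = as_path[-1]
    expected = baseline.get(origin)
    if expected is None:
        return []  # origin AS is not monitored
    reasons = []
    if net not in expected["prefixes"]:
        if any(net.subnet_of(agg) for agg in expected["prefixes"]):
            reasons.append("new more specific")
        else:
            reasons.append("new prefix")
    upstream = as_path[-2] if len(as_path) > 1 else None
    if upstream is not None and upstream not in expected["upstreams"]:
        reasons.append("unexpected upstream AS%d" % upstream)
    return reasons


# A leaked OCN more specific arriving via Verizon and Google (illustrative path).
alerts = check_announcement("114.154.133.0/24", [701, 15169, 4713], BASELINE)
print(alerts)
```

A check like this would flag both symptoms seen during the incident: a prefix that is normally hidden inside an aggregate, and Google appearing as the upstream AS.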