Measuring Internet Reliability

Research methodology



To model such a scenario, we have applied the same model for the third year in a row, but we did not merely repeat previous calculations; this year we are expanding the research significantly. The following steps were taken to rate AS reliability:

For every AS in the world, we examined all alternate paths to Tier-1 operators with the help of an AS relationship model, the core of Qrator.Radar;

Using the IPIP geodatabase, we matched countries to every IP address of every AS;

For every AS, we calculated the share of its address space that corresponds to the relevant region. We filtered out ISPs that are present at an internet exchange point in a region where they have no significant presence. A good example is Hong Kong, where hundreds of members with zero presence in the local internet segment exchange traffic at HKIX, the biggest Asian internet exchange;

After isolating regional ASes, we analyzed the potential impact of each one’s failure on other ASes and on their respective countries;

In the end, for each country, we identified the AS that affects the largest portion of other ASes in its region. Foreign ASes were not considered.

Why model such a situation? Strictly speaking, when BGP and the world of interdomain routing were in the design stage, the creators assumed that every non-transit AS would have at least two upstream providers to guarantee fault tolerance in case one goes down. However, the reality is different: over 45% of ISPs have only one connection to an upstream transit provider. A range of unconventional relationships among transit ISPs further reduces reliability. So, have transit ISPs ever failed? The answer is yes, and it happens with some frequency. The more appropriate question is: under what conditions would a particular ISP experience service degradation? If such problems seem unlikely, it may be worth considering Murphy’s Law: “Anything that can go wrong, will.”
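
The rating steps listed above can be illustrated with a minimal Python sketch. It is not the Qrator.Radar model: it assumes a toy topology in which only customer-to-provider links exist (peering and other relationships are ignored), uses AS numbers from the range reserved for documentation, and all function names (`reaches_tier1`, `country_rating`, `single_homed_share`) are hypothetical. The denominator used for the share (all other ASes in the country) is also an assumption.

```python
from collections import deque

def reaches_tier1(asn, providers, tier1, failed):
    """Return True if `asn` can still reach any Tier-1 AS by following
    customer-to-provider links while the AS `failed` is down."""
    if asn == failed:
        return False
    seen, queue = {asn}, deque([asn])
    while queue:
        current = queue.popleft()
        if current in tier1:
            return True
        for upstream in providers.get(current, ()):
            if upstream != failed and upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)
    return False

def country_rating(country_ases, providers, tier1):
    """Return (critical_as, share): the AS whose failure cuts off the largest
    share of the other ASes in the country, and that share (0.0 to 1.0)."""
    worst_as, worst_share = None, 0.0
    for candidate in sorted(country_ases):
        others = [a for a in country_ases if a != candidate]
        if not others:
            continue
        lost = sum(not reaches_tier1(a, providers, tier1, candidate) for a in others)
        share = lost / len(others)
        if share > worst_share:
            worst_as, worst_share = candidate, share
    return worst_as, worst_share

def single_homed_share(ases, providers):
    """Share of ASes that have exactly one upstream provider."""
    single = sum(1 for a in ases if len(providers.get(a, ())) == 1)
    return single / len(ases)

# Toy data (assumed for illustration only).
TIER1 = {64500, 64501}                 # foreign Tier-1 operators
PROVIDERS = {                          # customer AS -> its upstream providers
    64510: [64500, 64501],
    64511: [64510],
    64512: [64510],
    64513: [64510, 64501],
}
COUNTRY = {64510, 64511, 64512, 64513}  # ASes attributed to one country

critical_as, share = country_rating(COUNTRY, PROVIDERS, TIER1)
print(critical_as, round(share * 100, 2))
print(single_homed_share(COUNTRY, PROVIDERS))
```

In this toy country, the two single-homed customers of AS64510 lose global reachability when it fails, while the multihomed AS64513 survives through its second upstream. That is the effect the rating is designed to capture.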

IPv4 Reliability

The United States dropped 11 positions, from 7th to 18th;

Bangladesh is out of the top 20;

Ukraine moved up 8 positions to #4;

Austria is out of the top 20;

Two countries returned to the top 20: Italy and Luxembourg, which had dropped out in 2017 and 2018, respectively.

IPv6 Reliability

Broadband Internet and PTR records

Details by Region

“The most significant change we see in Ukraine is the fall in transit prices. This allows most of the profitable internet businesses to acquire several upstream connections beyond our borders. Hurricane Electric is especially active on the market, offering ‘international transit’ without a direct contract, because they don’t remove prefixes from the exchanges; they merely announce the customer cone at local exchanges.”

“However, the big news involving Cogent comes from the United States. For two years, 2016 and 2017, we identified Cogent's AS174 as the crucial one for that market. This is no longer the case: in 2018, the CenturyLink AS209 replaced Cogent, and the change sent the United States up the list by three places, to 7th.”

“This change is natural as the global Internet continues to grow. IT infrastructure within each country grows and is upgraded to support the information economy which is ever evolving and changing. Performance drives customer experience and revenue. Locality of IT infrastructure drives performance. These are macrotechnoeconomic forces.”

This report explains how the outage of a single AS can affect the connectivity of the impacted region with the rest of the world, especially when it is the dominant ISP on the market. Internet connectivity at the network level is driven by interaction between autonomous systems (ASes). As the number of alternate routes between ASes increases, so do the fault tolerance and stability of the internet across the network. Although some paths inevitably become more important than others, establishing as many alternate routes as possible is the only viable way to ensure an adequately robust system.

The global connectivity of any AS, regardless of whether it is a minor provider or an international giant, depends on the quantity and quality of its paths to Tier-1 ISPs. Usually, Tier-1 implies an international company offering global IP transit service over connections to other Tier-1 providers. But there is no guarantee that such connectivity will be maintained. Only the market can motivate them to peer with other Tier-1s to deliver the highest quality service. Is that enough? We explore this question in the IPv6 section below. For many ISPs at all levels, losing connection to just one Tier-1 peer would likely render them unreachable in some parts of the world.

Let’s examine a case where an AS experiences significant network degradation.

Each year there are interesting movements in the reliability ranking. Last year we wrote that the overall performance of the top 20 countries had not changed from 2017. Nevertheless, this year we highlight the positive global trend towards improved reliability and overall availability.
To illustrate this, we compare the 4-year average and median changes in IPv4 reliability ratings of all 233 countries.

The number of countries that successfully limited outage to under 10% (indicating high fault tolerance) increased by 5 from last year, reaching a total of 35.

Thus, the most significant trend we observed over the period of our research is the marked improvement in reliability all over the world, in both IPv4 and IPv6.

We have been repeating for years that the mistaken assumption that IPv6 works the same way as IPv4 is the main structural problem of the IPv6 development process.

The problems with the peering wars that we outlined last year, wherein Cogent and Hurricane Electric do not peer with each other, persist not only in IPv6, but in IPv4 too. This year, we were surprised that another pair of v6 rivals, Deutsche Telekom and Verizon US, successfully established an IPv6 peering in May. You probably didn’t see it reported in the news, but this particular move is huge: two big Tier-1 ISPs stopped quarreling and finally established a peering connection in the protocol we all want to see developed much more actively than it has been so far.

Paths for all Tier-1 providers to maintain full connectivity must be established. We also calculated the number of ASes in each country that have only partial connectivity due to these peering wars. Here are the results:

IPv4, after a year, remains significantly more reliable than IPv6. Average reliability and stability numbers in 2019 are 62.924% for IPv4 and 54.53% for IPv6. There are still countries with poor availability (high partial connectivity numbers) in IPv6.

Compared to last year we saw significant improvement in all three countries, especially in relation to partial connectivity. Venezuela was at 33%, China at 65% and UAE at 25% last year.
While Venezuela and China significantly improved their connectivity by resolving severe issues of partially interconnected networks, UAE remains pretty much the same without any improvement.

Based on the question we asked last year, “Does a country’s leading ISP always influence regional reliability more than everyone else?”, we have developed an additional metric to further investigate the subject. Perhaps the most significant ISP in a region (by user base or customer base) is not necessarily the most critical for network connectivity overall.

Last year we determined that the most accurate indicator of broadband ISP reliability is based on analyzing PTR records. Generally, PTR records are used for reverse DNS lookup: using the IP address to identify the associated hostname or domain name.

This means that PTR records could enable measurement of the specific equipment within an individual provider’s IP address space. Since we already know the largest ASes for every country in the world, we could count the PTR records within the networks of those providers and determine their share of overall PTR records in the region. We should add a disclaimer here: we counted ONLY PTR records, and did not calculate the ratio of IP addresses without PTR records to IP addresses with PTR records. So, we are speaking strictly of IP addresses with PTR records present. The practice of adding PTR records is not universal; some providers do this, but some don’t.

We want to show exactly how many PTR-enabled IP addresses would go offline with an outage of each country’s primary AS, and the percentage that represents for the relevant region.

Let’s compare the 20 most reliable countries of the 2019 IPv4 rating to the PTR-enabled rating:

Clearly, an approach that considers PTR records yields very different results. In most cases not only does the primary regional AS change, but the percentage is entirely different.
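
The PTR-based metric described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the tooling used for the report: the input dictionaries, the hostnames, and the function names `ptr_query_name` and `ptr_share_by_as` are hypothetical, and the addresses and AS numbers come from ranges reserved for documentation. Only `ipaddress.ip_address(...).reverse_pointer` is a standard-library call; it builds the name that a reverse DNS lookup queries.

```python
import ipaddress
from collections import Counter

def ptr_query_name(ip):
    """Name that a reverse DNS (PTR) lookup for `ip` would query."""
    return ipaddress.ip_address(ip).reverse_pointer

def ptr_share_by_as(ptr_records, origin_as):
    """Share of a region's PTR-enabled addresses that belongs to each AS.

    ptr_records: IP address -> hostname, only for addresses that have a PTR record
    origin_as:   IP address -> AS number that originates the address
    """
    counts = Counter(origin_as[ip] for ip in ptr_records)
    total = sum(counts.values())
    return {asn: n / total for asn, n in counts.items()}

# Toy data (assumed for illustration only).
ptr_records = {
    "192.0.2.10": "host-10.example.net",
    "192.0.2.11": "host-11.example.net",
    "192.0.2.12": "host-12.example.net",
    "198.51.100.7": "cpe-7.example.org",
}
origin_as = {
    "192.0.2.10": 64510,
    "192.0.2.11": 64510,
    "192.0.2.12": 64510,
    "198.51.100.7": 64511,
}

shares = ptr_share_by_as(ptr_records, origin_as)
largest_as = max(shares, key=shares.get)
print(ptr_query_name("192.0.2.10"))
print(largest_as, shares[largest_as])
```

As in the report, addresses without a PTR record are simply absent from the input, so the resulting shares describe only PTR-enabled address space.
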
In all of the generally reliable regions (from the global availability point of view), the number of PTR-enabled IP addresses that shut down following an outage of one autonomous system is dozens of times higher. This could mean that the leading national ISP always handles end users at one point or another. Thus, we should assume that this percentage represents the part of the ISP’s user base and customer base that would go offline in the event of an outage (if switching to a second internet service provider were not possible). From this perspective, countries appear to be less reliable than they look from the transit point of view. We leave possible conclusions from this PTR-enabled rating to the reader.

As always, we start with the very special position of Cogent’s AS174. Last year we outlined the influence of Cogent across Europe, where we identified AS174 as the core AS for 5 of the top 20 countries in the IPv4 rating. This year Cogent remains present in the top 20 of the reliability rating, and we see only a couple of changes in the past 12 months. Most notably, in Belgium and Spain AS174 was replaced as the most critical AS. The primary ASN for Belgium in 2019 is Telenet’s AS6848, and for Spain, it is Vodafone’s AS12430.

Now, let’s look more closely at the two countries with the biggest moves among the most reliable: Ukraine and the United States of America.

Firstly, Ukraine dramatically improved its position. We reached out to Max Tulyev, board member of the Ukraine Internet Association, for details on what happened in his country over the past 12 months:

The primary AS for Ukraine changed from Telia’s AS1299 to UARNET’s AS3255. Mr. Tulyev explained that, as a former educational network, it is highly active for transit, especially in Western Ukraine.

Now, across the globe to the United States.

Our main question is pretty simple: what are the details behind the United States’ 11-position decline in the reliability rating?

In 2018, the U.S. was ranked 7th, with 4.02% of the country potentially losing global availability in the event of an AS209 outage. Our 2018 report gives some perspective on what changed in the United States a year ago:

The 2019 results show the United States ranked 18th, with its score worsening from 4.02% to 6.83%, a change of more than 2.5 percentage points, which would usually be enough to fall from the top 20.

We contacted the Founder of Hurricane Electric, Mike Leber, for his comment on the situation:

It is always interesting to dissect what could happen in the biggest economy in the world, even more so when we witness such an enormous fall in reliability rank. Just to remind our readers: last year we highlighted the replacement of Cogent’s AS174 by CenturyLink’s AS209 in the United States. This year, CenturyLink surrendered its position as the main AS of the country to Level3’s AS3356. This is no surprise, as the two companies have represented a single entity since the 2017 acquisition, and CenturyLink’s connectivity is now entirely tied to Level3’s. One might conclude that the overall drop in reliability is connected with the Level3/CenturyLink incident at the end of 2018, when four packets disrupted internet service in the world’s biggest economy for several hours. That event definitely affected the ability of CenturyLink/Level3 to provide transit to the biggest players in the country, some of whom switched to other transit providers or just diversified their uplinks. Nevertheless, in 2019 Level3 is the most vital provider in the U.S.; its outage could disable the global availability of nearly 7% of all the local autonomous systems that rely on its transit.

Italy returned to the top 20 in 17th place with the same Fastweb AS12874, which probably results from significant improvements in the number of paths to this particular ISP.
It was the same ISP in 2017 that dropped Italy to 21st place.

In 2019, Singapore, which stays in the top 20 of the reliability rating with only a slight change, saw a new primary ASN once again. Last year, we tried to explain the changes in the Southeast Asia regions as best we could. In 2019, the main Singaporean ASN changed from last year’s SingNet AS3758 to Starnet’s AS4657. With this change, Singapore lost 1 position, from 5th place last year to 6th place in 2019.

China remarkably jumped from 113th place in 2018 to 78th this year, with a change of about 5% within our methodology. In IPv6, China’s “partial connectivity” dropped from 65.93% last year to slightly above 20% this year. The main ASN in IPv6 changed from AS9808, belonging to China Mobile, last year to AS4134 this year. In IPv4, AS4134 has been the primary critical ASN for years.

Unfortunately, China’s segment dropped 20 places in the IPv6 reliability rating, with its score worsening from 10% last year to 23.5% this year. All this most likely means one simple thing: China Telecom is actively improving its network and remains the backbone of China’s connection to the outside internet.

With growing cybersecurity risks and continuous news of attacks on internet infrastructure, it is time for governments, private and public companies, as well as ordinary users to carefully reconsider their positions. Regional risks need to be studied carefully and honestly by analyzing real risk and reliability levels. A massive attack on a large, nationwide, mission-critical service provider, such as a DNS service, could cause real availability problems.
Do not forget that the outside world would also be blocked from the services and data located within the troubled region if access were totally lost.

Our report clearly shows that regional ISP markets subject to open competition ultimately become significantly more stable and failure-resistant with regard to internal and external risks. Without a competitive market, any AS failure could lead to network loss for a large portion of users in a country or even a broader region.