RIPE Atlas

Enter RIPE Atlas: a global, open, distributed network of probes that measure Internet connectivity and reachability. The probes themselves are distributed by the RIPE Network Coordination Centre (RIPE NCC), one of the Regional Internet Registries. It is the largest Internet measurement network ever built!

To add a bit of historical context, remember that people have used the electric telegraph to broadcast weather forecasts since 1835, and that the first weather station started collecting weather data even earlier, in 1781. In today’s connected, digital-first world, we still rely on hundreds of thousands of weather stations worldwide to check the current conditions on our smartphones and decide what to wear for a night out.

Geographical spread of RIPE Atlas probes.

Such a connected world needs an Internet equivalent of weather stations, something that constantly monitors the Internet itself. That’s where the RIPE Atlas project fits perfectly: it gives everyone the ability to measure the connectivity of any device on the Internet (all it needs is a publicly routable IP address) from many different probes.

In a nutshell, RIPE Atlas lets everyone use more than 10,000 probes worldwide, currently present in almost every country in the world, thanks to the hundreds of volunteers hosting them.

Physically, RIPE Atlas probes are small single-board computers, much like a Raspberry Pi, with an Ethernet port, a micro USB power input and special software inside. The latest model runs on a NanoPi NEO Plus2 with 512MB of RAM.

RIPE Atlas reached 10k connected probes in 2017.

It’s important to note that RIPE Atlas is a credit-based system: you spend credits to run measurements, and you earn them both for keeping your probes online (uptime credits) and whenever your probe delivers results for someone else’s measurement.

To get started, you can get free credits at RIPE events; you can also transfer credits to, and receive credits from, other RIPE Atlas members.

Because the network is volunteer-based, more than 50% of probes get abandoned over time.

To learn more about RIPE Atlas itself, I suggest watching an introductory video and browsing the official website.

Content Delivery Networks

A high-level overview of how the geographical spread of a CDN’s points of presence (PoPs) helps bring content from origin servers closer to the end user (image courtesy of Cloudflare).

I won’t get into how CDNs are implemented or how they work under the hood. Instead, I’ll point out the most important benefits they provide:

Faster loading of assets due to decreased latency. Assets (images, JavaScript, CSS files and so on) are fetched once from the origin server and, from that point on, served from the server closest to the user. This improves the end-user experience, keeps customers happy, and boosts SEO rankings thanks to faster page loads.

Cutting traffic costs. Typically, when you serve static content to your users from popular cloud storage services (e.g. AWS S3 and Google Cloud Storage), you pay for the Internet traffic generated by each download/hit. A CDN acts as a middleman: it fetches the requested content only once from the origin server, stores it, and then serves it from its cache. This is a lot cheaper for you, as your origin has much less outgoing traffic.

Caching. A CDN lets you specify different caching policies and increase your cache hit rate. Content served from the CDN’s cache does not have to be fetched from your origin server. Cache hit rates for static content requests can often reach 90% or more, which means cutting roughly 90% of the outgoing traffic (and the associated costs) from your data center.

Readiness for sudden traffic spikes. CDNs have invested a lot of time and knowledge in building large infrastructures that scale well, whether you’re being featured on Reddit and Hacker News or streaming a live UEFA Champions League final.
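To make the caching arithmetic concrete, here is a small Go sketch (Go being the language used for the research script later on). It models origin egress only: with a hit ratio h, just the fraction 1 - h of requested bytes ever reaches the origin. The 10 TB monthly volume is a hypothetical figure, and the CDN’s own delivery fees are deliberately left out.

```go
package main

import "fmt"

// originEgressGB estimates how much traffic (in GB) the origin still serves
// when the CDN answers a fraction hitRatio of requested bytes from its cache.
// Only cache misses travel back to the origin.
func originEgressGB(requestedGB, hitRatio float64) float64 {
	if hitRatio < 0 || hitRatio > 1 {
		panic("hitRatio must be between 0 and 1")
	}
	return requestedGB * (1 - hitRatio)
}

func main() {
	// Hypothetical month: 10 TB of asset traffic requested by users.
	const requestedGB = 10_000.0
	for _, ratio := range []float64{0.5, 0.9, 0.95} {
		origin := originEgressGB(requestedGB, ratio)
		fmt.Printf("hit ratio %.0f%%: origin serves %.0f GB (%.0f%% less)\n",
			ratio*100, origin, (1-origin/requestedGB)*100)
	}
}
```

The saving applies to the origin’s outgoing traffic; whether it also lowers the total bill depends on how the CDN itself charges for delivery.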

In the case of Property Finder and similar asset-heavy websites, the main objective is to serve the terabytes of assets we have to our customers as fast as possible.

To achieve all of this with the most efficient caching policy possible, we not only use a reliable, well-spread CDN but also do our best to set the correct Cache-Control headers with appropriate expirations for different content types, ignore query strings to avoid cache busting, use the immutable directive, and so on.
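The Cache-Control and query-string practices above can be sketched in Go as follows. This is not Property Finder’s actual configuration: the extension list and max-age values are assumptions, and in production the query-string rule usually lives in the CDN’s cache-key settings rather than in application code. The sketch also assumes asset URLs are fingerprinted (e.g. app.3f2a9c.js), which is what makes the immutable directive safe.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// Hypothetical policy: fingerprinted static assets never change under the same
// URL, so they can be cached for a year and marked immutable.
// The extension list and max-age values are illustrative assumptions.
var longLived = map[string]bool{
	".js": true, ".css": true, ".jpg": true, ".jpeg": true,
	".png": true, ".webp": true, ".svg": true, ".woff2": true,
}

// cacheControlFor returns a Cache-Control header value for a request path.
func cacheControlFor(p string) string {
	if longLived[strings.ToLower(path.Ext(p))] {
		return "public, max-age=31536000, immutable"
	}
	// Everything else (e.g. HTML) gets a short TTL so updates show up quickly.
	return "public, max-age=300"
}

// cacheKey drops the query string and fragment, so /a.jpg?v=1 and /a.jpg?v=2
// map to the same cached object instead of busting the cache.
func cacheKey(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	u.RawQuery = ""
	u.Fragment = ""
	return u.String(), nil
}

func main() {
	for _, p := range []string{"/assets/app.3f2a9c.js", "/img/villa.JPG", "/index.html"} {
		fmt.Printf("%-24s Cache-Control: %s\n", p, cacheControlFor(p))
	}
	key, _ := cacheKey("https://cdn.example.com/img/villa.jpg?utm_source=ad#top")
	fmt.Println("cache key:", key)
}
```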

Who is using CDNs?

Everyone!

Nowadays, the majority of Internet traffic passes through Internet Exchange Points (IXPs), where traffic is exchanged for free or for a very low fee. There, big content providers (think Netflix, Facebook, Google/YouTube) and ISPs connect and exchange traffic with the lowest possible latency and the highest possible throughput.

By reading this blog post on Medium.com, you’ve unknowingly accessed Cloudflare, one of the most popular CDNs. Your device has also probably established a connection to Fastly’s servers when you played your favorite song on Spotify.

Your favorite blogs and news portals are served using AWS CloudFront, Google Cloud or other CDNs. Like the majority of the Internet population, you’ve also generated some traffic to private CDNs by accessing Facebook, Instagram, YouTube, Netflix and so on.

You might think that only big companies use CDNs, but you’d be mistaken. Nowadays, it’s almost unimaginable to start a website, an app or an online service without thinking about the best way to serve your traffic. Upfront planning to use a CDN makes a lot of sense.

Imagine a common startupreneur scenario: you have an idea for a startup that you think will grow a lot in terms of users, scale very fast and have a lot of traffic (and sudden spikes).

Should you think about a CDN from day one? Absolutely!

You want to optimize your costs upfront and achieve the best performance at the same time. To do this, you can rely on your gut feeling, a friend’s recommendation or a Google search… or you can use scientific, statistical data with real numbers instead of just guessing.

If you’re still interested, keep on reading. Here’s where the research starts!

Creating measurements

To create your very first measurement with RIPE Atlas, you can use either the web UI or a nice JSON API.

Using the RIPE Atlas web wizard is really simple: in just a few clicks, you can create a measurement and see a summary of all the associated costs.

RIPE Atlas Web UI.

The available probes are hosted in different places: on home routers in residential areas, in racks at workplaces and offices, and inside data centers. They can also be connected over a mobile 4G connection or via a satellite uplink in a very remote location. As long as there’s an Ethernet connection, the source of the connectivity doesn’t really matter!

In total, the coverage of IPv4 and IPv6 networks is pretty much the same: below 10% of the world’s autonomous system numbers (ASNs).

IPv4 ASNs covered — 3602 (5.627%)

IPv6 ASNs covered — 1446 (8.617%)

However, in the grand scheme of things, all the major consumer ISPs and hosting companies worldwide have a sufficient number of probes hosted with them, and almost all of the world’s countries are covered: 182 (92.857%).

Research methodology

A comprehensive RIPE Atlas REST API.

However, using the web UI has its drawbacks in some scenarios.

In my case, I wanted to analyze every country in the world one by one, selecting probes and then filtering them by different tags; repeating this process manually for 182 countries would have been very cumbersome.

Fortunately, all of this can be done through a very simple REST API.

First of all, we need the key ingredient for this research: plenty of RIPE Atlas credits. Luckily, I’ve had a probe connected for more than five years, during which it collected almost 60 million credits, more than enough to conduct this research more than a dozen times. I have occasionally spent them on analyses for private use and on investigations like this one.

Second, I defined a list of CDN providers by analysing the current CDN market, favoring companies with a global presence over those with only regional availability.

Here’s a breakdown of the CDNs chosen for this research (a total of 7):

1) Akamai: one of the oldest players in the market.

2) AWS CloudFront: a global player with almost 200 PoPs across 30 countries. It also has regional edge caches that all the other PoPs connect to first, an optimization step that concentrates requests at a regional edge location.

3) Microsoft Azure: more than 130 PoPs and a very large network with different tiers.

4) Cloudflare: the most popular choice for small to medium websites, thanks to a very generous, almost limitless free tier.

5) Google Cloud CDN: uses Google’s global network in conjunction with Cloud Storage or Compute Engine instances.

6) Fastly: a popular CDN for large-scale projects (GitHub, Spotify, etc.), available in more than 30 PoPs with plans to expand further.

7) CacheFly: once a US-centric CDN, it has recently grown into a global player.

Once the list of CDN providers was defined, I wrote a simple script in Go, chosen for its simple concurrency primitives. This small script (less than 200 lines) goes through the ISO2 (ISO 3166-1 alpha-2) codes of all countries, pairs each of them with every CDN defined above, and sends a request to the RIPE Atlas API to create a three-packet ping measurement.

Cloudflare has its own, very popular DNS resolver at 1.1.1.1. For CloudFront and Google Cloud, I had to create my own distributions, while all the others were easy to test using well-known hostnames of companies that publicly use them (FIFA, etc.).

Selected target hostnames.

Using the request options, we selected up to 50 probes and used a few tags to filter out unavailable or unstable probes that would negatively affect our results set. The probe selection tags were system-ipv4-capable, system-ipv4-works and system-resolves-a-correctly, the last one ensuring that DNS resolution works correctly on the probe.

Parsing the measurements

Once the API responded to a measurement creation request, we saved the measurement ID to a results database in the form of a CSV file, which mapped every measurement ID to its country/CDN pair. We waited for some time before fetching the results, as they can sometimes take up to 15 minutes to arrive. The API calls also had to be paused periodically because of the throttling on the RIPE Atlas API side: at most 100 concurrent measurements and at most 1 million credits spent per day are allowed.

Some requests failed because RIPE Atlas is not yet present in all of the world’s countries; this was expected, and such responses were discarded, which explains some of the gray areas on the results map.

Here’s a screenshot of a single measurement result from the perspective of a web UI:

Result of a single measurement on Web UI.

We can see all the probes involved, their ASNs, packet loss as a percentage, and the round-trip time (RTT) from each probe to the target host (our metric of interest). Of course, consuming these results through the API made more sense, and that’s what we’ll focus on.

In addition to the avg field, each result in the API response contains the RTTs of the three individual pings.
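Reading those per-packet RTTs in Go could look like the sketch below. It decodes only a handful of fields from each ping result (prb_id, avg, sent, rcvd and the per-packet result list, where a timed-out packet carries an "x": "*" entry instead of "rtt"). The sample JSON is abridged and made up for illustration, and the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pingReply is one packet's outcome; a timeout has no "rtt" key, so RTT stays nil.
type pingReply struct {
	RTT *float64 `json:"rtt"`
}

// pingResult keeps only the fields of a ping result used here.
type pingResult struct {
	ProbeID int         `json:"prb_id"`
	Avg     float64     `json:"avg"` // average RTT in ms as reported by the probe
	Sent    int         `json:"sent"`
	Rcvd    int         `json:"rcvd"`
	Result  []pingReply `json:"result"`
}

// Abridged, hypothetical results: probe 2 timed out on all three packets.
var sampleResults = []byte(`[
  {"prb_id": 1, "avg": 2.5, "sent": 3, "rcvd": 3, "result": [{"rtt": 2.0}, {"rtt": 2.5}, {"rtt": 3.0}]},
  {"prb_id": 2, "avg": -1, "sent": 3, "rcvd": 0, "result": [{"x": "*"}, {"x": "*"}, {"x": "*"}]}
]`)

func parseResults(body []byte) ([]pingResult, error) {
	var results []pingResult
	err := json.Unmarshal(body, &results)
	return results, err
}

// successfulRTTs maps each probe ID to the RTTs of replies that actually arrived.
func successfulRTTs(results []pingResult) map[int][]float64 {
	out := make(map[int][]float64)
	for _, r := range results {
		for _, reply := range r.Result {
			if reply.RTT != nil {
				out[r.ProbeID] = append(out[r.ProbeID], *reply.RTT)
			}
		}
	}
	return out
}

// lossPercent is the share of sent packets that got no reply.
func lossPercent(r pingResult) float64 {
	if r.Sent == 0 {
		return 0
	}
	return float64(r.Sent-r.Rcvd) / float64(r.Sent) * 100
}

func main() {
	results, err := parseResults(sampleResults)
	if err != nil {
		panic(err)
	}
	for _, r := range results {
		fmt.Printf("probe %d: loss %.0f%%\n", r.ProbeID, lossPercent(r))
	}
	for id, rtts := range successfulRTTs(results) {
		fmt.Printf("probe %d RTTs (ms): %v\n", id, rtts)
	}
}
```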

The results were stored in a separate directory for each CDN, with one file per country inside each directory.

The results set is available in a GitHub repository: https://github.com/emirb/ripe-atlas-cdn-analysis

After collecting all the measurements from the RIPE Atlas API and storing them, I ended up with data in the following format:

iso2_code,cdn_name,rtt_ms

Overall, the whole research consumed around 50,000 credits.