A lot has happened behind the scenes to get you your result. But let’s focus on Load Balancing, and how and why Maglev was designed differently from your traditional LB.

2. Maglev vs Traditional Load Balancer

Figure 3: Traditional Load Balancer (left) vs Maglev Load Balancer (right) | Retrieved from “Maglev: A Fast and Reliable Software Network Load Balancer”, p. 1

Figure 3 showcases a traditional LB (left) and a Maglev LB (right), each sitting between a router and some arbitrary backend services.

2.1 Traditional LB

Incoming requests go through the router to the LB and are then routed to a backend service. For redundancy, a backup LB is often provided alongside the main/primary LB; it provides failover and, at times, increased throughput. In most cases a traditional LB is sufficient, but when working at Google’s scale, there are several limitations with this approach.

Limitations of the Traditional LB

The paper claims that such a design falls short of Google’s throughput requirements: scaling is limited to a single machine’s capacity. Though often deployed in pairs, i.e. a primary LB and a backup, redundancy is only 1 + 1. R.K. Johnston writes an interesting blog post on why 1 + 1 redundancy is not good enough. In essence, the two machines in a 1 + 1 pair are replicas of each other; if one goes down, the +1 replica has to handle the entire load. Even in systems with 99.99% reliability, one of the pair will be down reasonably often (errors, maintenance, etc.), and over a long enough period the chance of the primary and the +1 being out of service at the same time keeps growing. R.K. Johnston states that this happens more often than you would think (a back-of-the-envelope sketch follows the list below).

Hardware LBs in particular come with several drawbacks. Although they can be faster:

a. The design is inflexible for modifications and rapid prototyping.

b. Scaling these LBs typically requires purchase of new hardware that also needs to be deployed.

c. Costly to purchase and upgrade.
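To make the 1 + 1 redundancy point above a little more concrete, here is a toy back-of-the-envelope model (my own, not from Johnston’s post or the paper): assume each LB is independently unavailable 0.1% of the time and treat each hour as an independent trial. The probability of at least one hour in which both the primary and the +1 are down keeps growing with the time horizon, which is the essence of the argument.

```go
package main

import (
	"fmt"
	"math"
)

// overlapProbability returns the chance that, over `hours` independent
// one-hour trials, there is at least one hour in which both LBs of a
// 1+1 pair are down, given each is independently down a fraction p of
// the time. Toy model only: it ignores correlated failures and
// planned maintenance, which would make things worse in practice.
func overlapProbability(p float64, hours int) float64 {
	perHour := p * p // both down during the same hour
	return 1 - math.Pow(1-perHour, float64(hours))
}

func main() {
	const p = 0.001 // 99.9% availability per LB in this toy model
	for _, years := range []int{1, 5, 10} {
		hours := years * 365 * 24
		fmt.Printf("%2d year(s): P(overlapping outage) ≈ %.2f%%\n",
			years, 100*overlapProbability(p, hours))
	}
}
```

The numbers themselves are not the point; the trend is: the longer the pair runs, the more likely it is that both machines are eventually out of service at the same time.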

Maglev, on the other hand, finds ways to address these limitations.

3. Maglev

Maglev is the name of Google’s distributed LB, in use since 2008. A typical Maglev instance is designed to process anywhere from about 813,000 1.5-kilobyte IP packets per second to over 9,060,000 100-byte IP packets per second at a 10 Gbps line rate. Unlike some traditional LBs, Maglev is a software-based LB that runs on commodity Linux servers. This is a very important point to remember, as most of the benefits of software-based LBs apply.
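Those packet-rate figures line up with the 10 Gbps line rate once you account for per-frame wire overhead. Below is a quick back-of-the-envelope check (mine, not from the paper), assuming a standard Ethernet overhead of 38 bytes per frame; the framing assumption is my own.

```go
package main

import "fmt"

// packetsPerSecond estimates how many packets of a given size fit into a
// line rate, assuming standard Ethernet framing overhead of 38 bytes per
// frame (preamble + start delimiter, header, FCS and inter-frame gap) on
// top of the quoted packet size. The article does not spell this out; it
// is an assumption used here only to sanity-check the numbers.
func packetsPerSecond(lineRateBits float64, packetBytes int) float64 {
	const overheadBytes = 38
	wireBits := float64(packetBytes+overheadBytes) * 8
	return lineRateBits / wireBits
}

func main() {
	const lineRate = 10e9 // 10 Gbps
	fmt.Printf("1500-byte packets: ~%.0f pps\n", packetsPerSecond(lineRate, 1500)) // ~813,000
	fmt.Printf(" 100-byte packets: ~%.0f pps\n", packetsPerSecond(lineRate, 100))  // ~9,060,000
}
```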

Benefits of Maglev

Maglev runs on typical Linux servers (which Google already had in abundance), and can be deployed as a distributed LB across such servers in multiple data centers worldwide, increasing availability. With regards to availability and redundancy, Maglev offers N + 1 redundancy (see Figure 3) instead of the 1 + 1 of your traditional LB. Features can quickly be added to and deployed in the existing software. According to the paper, services can also easily be divided across multiple shards of the LB in order to achieve performance isolation.

Admittedly, the paper notes that the design of Maglev is inherently very complex despite its ability to overcome most of the limitations of traditional LBs.

3.1 Design Goals

In order to achieve maximum reliability and throughput, Maglev relies on two key design principles:

Connection Persistence: once a connection from a user is made to a particular endpoint, that connection will continue to be served by that endpoint and not be swapped out to another similar endpoint, in order to preserve quality of service. This is done through connection tracking and consistent packet hashing.

Consistent Hashing: in order to process hundreds of thousands of packets of differing sizes and structures, a 5-tuple hash (source IP, source port, destination IP, destination port and IP protocol number) is computed over each packet so that it is consistently delivered to the appropriate receiving queue, where it can ultimately be matched with an appropriate backend.
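To make the hashing idea concrete, here is a minimal sketch (my own illustration, not Maglev’s actual code): it hashes the 5-tuple with FNV-1a and maps the result onto a hypothetical set of receiving queues. Because the hash depends only on the 5-tuple, every packet of the same connection lands on the same queue. The hash function and the number of queues are assumptions for illustration only; Maglev’s real consistent-hashing scheme is covered in Part II.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// FiveTuple identifies a flow: any packet carrying the same 5-tuple hashes
// to the same value, so it lands on the same receiving queue and,
// ultimately, the same backend choice.
type FiveTuple struct {
	SrcIP, DstIP     string
	SrcPort, DstPort uint16
	Protocol         uint8 // e.g. 6 = TCP, 17 = UDP
}

// hash returns a stable hash of the 5-tuple. FNV-1a is used here purely to
// illustrate the idea; Maglev's real implementation differs.
func (t FiveTuple) hash() uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%s|%s|%d|%d|%d", t.SrcIP, t.DstIP, t.SrcPort, t.DstPort, t.Protocol)
	return h.Sum64()
}

func main() {
	const numQueues = 8 // hypothetical number of receiving queues
	pkt := FiveTuple{SrcIP: "198.51.100.7", DstIP: "203.0.113.10",
		SrcPort: 52341, DstPort: 443, Protocol: 6}

	// Every packet of this TCP connection maps to the same queue.
	fmt.Println("queue:", pkt.hash()%numQueues)
}
```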

One important area to note is how packets arrive at a Maglev. Section 1 gave an overview of what this looks like; however, one key component in the cluster is the router. Through various techniques, Google’s routers are able to intelligently select appropriate Maglevs to send incoming packets to. The following section shows how this is done.

4. Router & Maglev Interaction

Figure 4: Maglev-Router Interaction

Within every Maglev there is a controller, and within the controller is a component called the VIP announcer. The controller observes the health of the forwarder and, based on that health, uses the VIP announcer to inform the router of the VIP backends that can be accessed through this Maglev instance. The controller ultimately makes the decision on whether to continue announcing the VIPs served by the forwarder. To ensure quality of service, only Maglevs with healthy forwarders can be selected; if a forwarder is deemed unhealthy, the controller informs the router that traffic cannot be served by that Maglev instance.

For instance, suppose we have a Maglev instance called Maglev Alpha. Maglev Alpha serves the following backend services: Google Drive, GMail and Maps (based on some configuration objects). The VIP announcer announces these services’ VIPs to the router. The router in turn announces these VIPs to Google’s backbone, and the information is then propagated to the internet, making it globally accessible, e.g. for DNS.

Once a request for GMail is received at the router, the router knows that Maglev Alpha has that resource healthy and available and directs the packets to it. If GMail on Maglev Alpha went down, for instance for maintenance or other issues, Maglev Alpha’s VIP announcer would inform the router that this Maglev instance can no longer handle requests for GMail, and the router would therefore skip it for those particular requests. This is a highly simplified example, but it helps illustrate the roles of the VIP announcer and the controller.
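Putting the controller’s behaviour into a small sketch may make the loop clearer. This is only an illustration of the decision described above: the interface and function names are hypothetical, and in reality the announcements and withdrawals reach the router via a routing protocol (BGP, per the paper) rather than the toy interface shown here.

```go
package main

import (
	"fmt"
	"time"
)

// Announcer is a stand-in for the VIP announcer's link to the router.
// Hypothetical interface, for illustration only.
type Announcer interface {
	AnnounceVIPs(vips []string) // tell the router these VIPs are reachable here
	WithdrawVIPs(vips []string) // tell the router to stop sending traffic here
}

// runController sketches the controller loop: periodically check the
// forwarder's health and announce or withdraw the VIPs accordingly.
// Runs until the process is interrupted; the check period is arbitrary.
func runController(forwarderHealthy func() bool, a Announcer, vips []string) {
	announced := false
	for range time.Tick(1 * time.Second) {
		healthy := forwarderHealthy()
		switch {
		case healthy && !announced:
			a.AnnounceVIPs(vips) // e.g. Maglev Alpha announces GMail, Drive, Maps VIPs
			announced = true
		case !healthy && announced:
			a.WithdrawVIPs(vips) // router stops routing to this Maglev instance
			announced = false
		}
	}
}

// logAnnouncer just logs announcements instead of talking to a real router.
type logAnnouncer struct{}

func (logAnnouncer) AnnounceVIPs(v []string) { fmt.Println("announce:", v) }
func (logAnnouncer) WithdrawVIPs(v []string) { fmt.Println("withdraw:", v) }

func main() {
	vips := []string{"203.0.113.10", "203.0.113.11"} // hypothetical VIPs
	runController(func() bool { return true }, logAnnouncer{}, vips)
}
```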

5. Packet Forwarding

One element I have intentionally obscured in Figure 4 is the forwarder. The forwarder is responsible for the following:

a. Handle incoming packets quickly and reliably

b. House a pool of backends that it constantly monitors

Each VIP is configured with one or more backend pools. The implementation behind the forwarder is detailed and complex, and we shall address it in the next article.
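As a rough illustration of that configuration, the VIP-to-backend-pool relationship could be sketched as a small data structure. The field names and addresses below are hypothetical; they only show the shape of the mapping described above, not Maglev’s actual configuration format.

```go
package main

import "fmt"

// Backend is a single service endpoint the forwarder monitors.
type Backend struct {
	Address string
	Healthy bool
}

// BackendPool groups the endpoints serving one slice of a service.
type BackendPool struct {
	Name     string
	Backends []Backend
}

// VIPConfig maps a VIP to one or more backend pools, mirroring the
// "configuration objects" mentioned above. Illustrative only.
type VIPConfig struct {
	VIP   string
	Pools []BackendPool
}

func main() {
	gmail := VIPConfig{
		VIP: "203.0.113.10",
		Pools: []BackendPool{{
			Name: "gmail-frontend",
			Backends: []Backend{
				{Address: "10.0.0.11:443", Healthy: true},
				{Address: "10.0.0.12:443", Healthy: true},
			},
		}},
	}
	fmt.Printf("%+v\n", gmail)
}
```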

Conclusion

Maglev takes a different approach to load balancing, one that allows better redundancy and scalability through the use of software instead of hardware. We have seen where it sits within Google’s infrastructure, and how it communicates with other Google networked devices such as routers so that those routers can efficiently direct packets to available Maglev instances. This article served to show an overarching perspective. Part II goes deeper into the implementation of the forwarder and the other components that make Maglev work.

I must end by stating that the majority of the information was obtained from the Google Research paper “Maglev: A Fast and Reliable Software Network Load Balancer”, which is linked below.

As always, thanks for reading.

Twitter: @martinomburajr

Links

Google’s Research Paper

Maglev: A Fast and Reliable Software Network Load Balancer — https://research.google.com/pubs/pub44824.html

Previous Articles on Load Balancing