What is Load Balancing?

Load balancing is one of the foundational pillars of distributed system design. A load balancer distributes a set of requested operations (web requests, database writes, cache queries) effectively across a set of servers.

Here’s an example of a client application accessing a server on the internet without load balancing. In this case, the client application connects to the web server directly.

There are two main problems with this model for websites servicing a very high number of requests:

- Single Point of Failure: If something happens to the web server, the entire service becomes unavailable for a certain period of time. This is unacceptable for the majority of online retailers and service providers.
- Overloaded Servers: The number of requests that your web server can handle is usually capped. There is only so much RAM and CPU you can add to a single server, so as your business grows, you’ll soon saturate your server’s ability to handle requests. The only way to service the increasing number of requests is to add extra servers and have a load balancer distribute the requests across your cluster of servers.

The picture below shows how adding a load balancer in front of your web servers can help alleviate the above two issues. Now you can add any number of web servers behind your load balancer and ensure that even if one of the servers goes offline, your system as a whole is still servicing requests. Moreover, because you can now spread the requests across multiple servers, per-request latency goes down, since no single server is bottlenecked on RAM/disk/CPU anymore.

Where are load balancers typically placed?

The next obvious question is what types of workloads you can load balance. That is, where can you place load balancers to achieve high scalability?

Load balancers are typically placed between:

- The client application/user and the web server
- The web server and the application/job servers
- The application servers and the cache servers
- The cache servers and the database servers

Note that introducing load balancers at each of these four software layers may or may not be necessary depending on your system requirements. Load balancing at each layer increases availability, performance, and fault tolerance, but it also introduces more complexity into the system. More complexity usually translates to more cost and maintenance overhead in the long run. There is no free lunch; that is something to always keep in mind while designing any distributed system.

What are the different types of load balancers?

Load balancing can be achieved in three ways:

- By using software load balancers in clients that request data from a list of servers
- By using software load balancers in the services layer
- By using hardware load balancers in the services layer

Software Load Balancers in Clients

This is probably the cheapest way to implement load balancing. In this case, all the load balancing logic resides in the client application. On startup, the client application (e.g., a mobile phone app) is provided with a list of web servers / application servers it can communicate with. The client app picks the first one in the list and requests data from that server. If a failure is detected persistently (after a configurable number of retries), it marks the first server as unavailable and picks another server from the list to request data from.
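As a concrete illustration, here is a minimal Python sketch of this client-side scheme. The `ClientSideBalancer` class, the server names, and the `send` callable are all hypothetical, not part of any real library:

```python
class ClientSideBalancer:
    """Sketch of client-side load balancing: the client holds the server
    list, prefers the first healthy server, and fails over after
    repeated errors."""

    def __init__(self, servers, max_retries=3):
        self.servers = list(servers)   # server list provided at startup
        self.unavailable = set()       # servers marked as down
        self.max_retries = max_retries

    def pick_server(self):
        # Prefer the first server not yet marked unavailable.
        for server in self.servers:
            if server not in self.unavailable:
                return server
        raise RuntimeError("no servers available")

    def request(self, send):
        # `send` is a callable taking a server name; it raises on failure.
        server = self.pick_server()
        for _ in range(self.max_retries):
            try:
                return send(server)
            except ConnectionError:
                continue  # retry the same server a few times
        # Persistent failure: mark the server down and fail over.
        self.unavailable.add(server)
        return self.request(send)
```

A client would call `request()` with its own network-send function; the balancer only decides *which* server to talk to.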

Software Load Balancers in Services

Software load balancers are pieces of software that receive a set of requests and route them according to a set of rules. Unlike hardware load balancers, software load balancers do not require any specific type of hardware; they can be installed on any Windows or Linux machine. You have the option of either using an off-the-shelf software load balancer like HAProxy or writing your own custom software for load balancing specific workload types. For example, when designing the authentication platform for Microsoft Office 365, we wrote a custom load balancer to load balance Active Directory queries.
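For instance, with an off-the-shelf balancer like HAProxy, round-robin routing across a pool of web servers can be expressed in a few lines of configuration. This is a minimal sketch; the backend names and addresses are made up:

```
# Minimal HAProxy sketch: round-robin HTTP traffic across two servers
frontend http_in
    bind *:80
    default_backend web_servers

backend web_servers
    balance roundrobin
    server web1 10.0.0.11:8080 check   # "check" enables health checks
    server web2 10.0.0.12:8080 check
```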

Hardware Load Balancers

A hardware load balancer device (HLD) is a physical device used to distribute web traffic across a cluster of network servers. HLDs present a virtual server address to the outside world; when a client application attempts to connect, the HLD forwards the connection to the most appropriate real server, performing bi-directional network address translation (NAT). HLDs, also known as Layer 4-7 routers, are typically able to load balance not only HTTP and HTTPS traffic but also TCP and UDP traffic. For example, TCP traffic to a database cluster can be spread across all servers by an HLD.

The load balancer controls exactly which server receives which connection and employs “health monitors” of increasing complexity to ensure that each application server (a real, physical server) is responding as needed; if not, it automatically stops sending traffic to that server until the server produces the desired response, indicating that it is functioning properly.

HLDs, while remarkably flexible in the type and scale of load balancing they perform, are expensive to acquire and configure. For this reason, most online service providers use HLDs at the first entry point of user requests into their infrastructure and then use internal software load balancers to route traffic behind their infrastructure wall.

For example, SharePoint Online (back in 2012) had one F5 Big-IP hardware load balancer in front of its web servers and used software load balancing in its application servers for load balancing across Active Directory instances and databases.

What are the benefits of using load balancing?

Using a load balancer as the gateway to your internal cluster of web servers has the following benefits:

- Facilitate zero-downtime rolling updates to web servers: This is done by taking a web server (due for maintenance) out of the load balancer pool, waiting for all active connections to drain (i.e., for requests in progress to complete), and then safely shutting down the server. This way, no in-flight client requests are dropped, and you can perform patching/maintenance on the web servers without affecting your high-availability SLA.

- Facilitate immediate increases in capacity: Adding more web servers to DNS for load balancing purposes takes time to propagate; DNS is essentially an eventually consistent system. With load balancers (hardware or software), however, as soon as you add a new server, it can start servicing client requests immediately. Thus, you can increase your capacity at the flick of a switch (well, almost 🙂).

- Enhance fault tolerance: Load balancers enable a faulty web server instance to be taken out of rotation immediately by removing it from the load balancer pool. This is much better than removing the server from DNS, which takes time; during that window, DNS will still send traffic to the faulty web server, failing the client requests.

- Reduce load on web servers through SSL termination: SSL offloading (a.k.a. SSL termination) is a load balancer feature that allows you to handle all SSL encryption/decryption work on the load balancer and use unencrypted connections internally between the load balancer and the web servers. This removes a significant load from the web servers, which no longer have to absorb the overhead of traffic encryption/decryption. It’s also possible to provide SSL acceleration using specialized hardware installed on the load balancer. Please check out https://kemptechnologies.com/solutions/ssl-acceleration-solutions/

- Facilitate just-in-time load balancing: If your web servers are hosted in the cloud via AWS or Azure, you can add new workloads (web servers and front ends) depending on the load your system is experiencing. If you use the Elastic Load Balancer (ELB) in AWS or the cloud load balancer in Azure, the scaling can happen automatically and just in time to accommodate your increasing/decreasing traffic. This automatic load balancing has three benefits: no downtime and low latency for your customers, no IT maintenance for the load balancer since it’s hosted in the AWS or Azure cloud, and cost savings because the system scales down automatically when traffic reduces.
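The drain-then-remove step of a rolling update can be sketched in a few lines of Python. This is a simplified illustration; `pool` and `active_connections` are hypothetical stand-ins for the balancer’s server pool and its per-server connection counters:

```python
import time

def drain_and_remove(pool, server, active_connections, timeout=30.0):
    """Rolling-update sketch: stop routing new traffic to `server`,
    wait for its in-flight requests to drain, then report whether it
    is safe to shut the server down for patching."""
    pool.remove(server)                      # no new requests routed here
    deadline = time.monotonic() + timeout
    while active_connections[server] > 0:    # wait for in-flight requests
        if time.monotonic() > deadline:
            break                            # give up after the drain timeout
        time.sleep(0.1)
    return active_connections[server] == 0   # True => safe to shut down
```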

What are some of the Load Balancing Algorithms?

Whether you’re using a software or hardware load balancer, it needs to decide which backend server to forward the request to. Different systems might require different ways of selecting servers from the load balancer — hence the need for different load balancing algorithms. Some of the common load balancing algorithms are given below:

Round Robin: Requests are distributed across the group of servers sequentially.

Weighted Round Robin: Same as round robin, but some servers get a bigger share of the overall workload based on some criteria (e.g., server capacity).

Least Connections: A new request is sent to the server with the fewest current connections to clients. The relative computing capacity of each server is factored into determining which one has the fewest connections. If a web node fails and is taken out of service, the distribution changes.

Fastest Response: The load balancer regularly pings the servers and maintains a map of the servers with the lowest response times. Traffic is routed to the servers with the lowest response times.

IP Hash: The IP address of the client is hashed to determine which server receives the request, so a given client IP address consistently maps to the same web server.

URL Hash: This is like source IP hash, except hashing is done on the URL of the request. This is useful when load balancing in front of proxy caches, as requests for a given object will always go to just one backend cache. It avoids cache duplication (having the same object stored in several or all caches) and increases the effective capacity of the backend caches.

Consistent Hashing: Look at the Consistent Hashing post for a detailed explanation. Also, here’s a research paper from Google explaining how Vimeo solved their load balancing problem using a variant of this technique.
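To make the first few algorithms concrete, here is a small Python sketch of round robin, weighted round robin, least connections, and IP hash selection. The server names and weights are hypothetical:

```python
import hashlib
from itertools import cycle

servers = ["web1", "web2", "web3"]  # hypothetical backend names

# Round robin: cycle through the servers sequentially.
rr = cycle(servers)

# Weighted round robin: repeat each server according to its weight,
# so heavier servers get a bigger share of the workload.
weights = {"web1": 3, "web2": 1, "web3": 1}
wrr = cycle([s for s in servers for _ in range(weights[s])])

def least_connections(active):
    # Pick the server with the fewest active connections.
    return min(active, key=active.get)

def ip_hash(client_ip):
    # Hash the client IP so the same client always lands on the
    # same server (while the server list is stable).
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

Note that `ip_hash` as written reshuffles most clients whenever the server list changes; consistent hashing (below) exists precisely to avoid that.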

How to use load balancing during system design interviews?

In summary, you’ll almost ALWAYS be asked some sort of scalability question in system design interviews for which you’ll need to use a load balancer. The key things to remember from this article are:

Load balancing enables elastic scalability and redundancy (you can have many copies of the same data). Elastic scalability improves performance and throughput; redundancy improves availability and also helps with backup/restore of the service in case a few servers fail.

Load balancers can be placed at any software layer – refer to the section above for details.