Some time ago, in a technical interview, I was given a network load-balancing problem. I managed to solve it and decided to write it up on Medium as my first post.

Before explaining, I should note that there are several ways to handle this problem, and choosing the right one depends on various factors.

Problem

Write an application that listens on a specific IP address/port and captures the source and destination IP addresses of every TCP packet sent to it. This information (src/dst IPs), plus the timestamp of the capture event, should be stored in one or more Redis databases. The application should keep this information ONLY for a specific amount of time (e.g. the last 5 minutes). There may be more than one Redis database; in that case, the storing process should be balanced among the available databases. For example, suppose you have two Redis databases A and B with balance parameters of 30 and 70 respectively; in that case, 30 percent of requests must be stored in A and the other 70 percent in B. Note: it should be possible to change the balance parameters of the Redis instances in real time.

Solution

In general, what we’re facing here is collecting information about each packet at the transport layer of the TCP/IP model and setting an expiration on each key-value pair in Redis. The interesting and important part, though, is balancing across the Redis databases.
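To make the expiration part concrete, here is a minimal sketch in Python of how one captured packet could be stored with a time-to-live. It is only an illustration under a few assumptions: `store_packet`, `RETENTION_SECONDS`, and the key layout are names I made up, and `client` is assumed to be a redis-py style client exposing `setex(name, time, value)` (for example a `redis.Redis` instance pointed at the load balancer).

```python
import json
import time

RETENTION_SECONDS = 5 * 60  # "last 5 minutes" from the problem statement


def store_packet(client, src_ip, dst_ip, ttl=RETENTION_SECONDS, now=None):
    """Store one packet's src/dst IPs and capture time under an expiring key.

    `client` is assumed to provide redis-py's `setex(name, time, value)`.
    """
    captured_at = time.time() if now is None else now
    # Hypothetical key layout: one key per capture event. Two packets with the
    # same timestamp and addresses would share a key and overwrite each other.
    key = f"packet:{captured_at:.6f}:{src_ip}:{dst_ip}"
    value = json.dumps({"src": src_ip, "dst": dst_ip, "ts": captured_at})
    # SETEX writes the value and its expiry together, so Redis drops the
    # record on its own once `ttl` seconds have passed.
    client.setex(key, ttl, value)
    return key
```

Letting Redis expire the keys means the application needs no cleanup job of its own; it only has to pick the TTL at write time.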

Why HAProxy?

Nowadays, load balancing is one of the hot topics in the software engineering and DevOps communities. HAProxy, which stands for High Availability Proxy, is a popular open-source TCP/HTTP load balancer and proxy that runs on Linux, Solaris, and FreeBSD. Its most common use is to improve the performance and reliability of a server environment by distributing the workload across multiple servers (e.g. web, application, database). It is used in many high-profile environments, including GitHub, Imgur, Instagram, and Twitter.

Among HAProxy’s configuration options there is a handy one that sets a weight for each server, which lets us split traffic proportionally among the Redis instances. A server’s weight can be anything from 0 through 256 (the default is 1, and a weight of 0 takes the server out of load balancing), but you don't have to use 256 as a basis for the calculation.

The share of traffic each server receives is the ratio of that server’s declared weight to the sum of all declared weights, so with two servers we can just use the values 70 and 30 and the distribution will be what we’d expect:

Server1: 70 ÷ (30 + 70) = 0.7

Server2: 30 ÷ (30 + 70) = 0.3

The server that “weighs more” receives proportionally more requests. We can also use 3 and 7, or 33 and 77, or any other combination within the allowed range that keeps the same ratio. Keeping the weights so that they add up to 100 is the most human-friendly option.
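The arithmetic above is easy to check with a few lines of Python. This is just a sketch of the ratio calculation, not anything HAProxy itself runs; `traffic_shares` is a hypothetical helper name, and the server names A and B follow the problem statement.

```python
from fractions import Fraction


def traffic_shares(weights):
    """Map each server name to weight / sum(all weights) as an exact fraction."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one server needs a positive weight")
    return {name: Fraction(weight, total) for name, weight in weights.items()}


# Three weight pairs that all describe the same 30/70 split.
for pair in ({"A": 30, "B": 70}, {"A": 3, "B": 7}, {"A": 33, "B": 77}):
    shares = traffic_shares(pair)
    print({name: float(share) for name, share in shares.items()})
```

All three pairs reduce to the same shares, 3/10 for A and 7/10 for B, which is why only the ratio between the weights matters and not their absolute values.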

In the following example, I define two Redis servers with weights of 77 and 33, respectively. Another important point is that the balance algorithm is set to round robin, which uses each server from the list in turn, in proportion to its weight.