To introduce this article, I want to refer to the famous scale cube model from the book The Art of Scalability: Scalable Web Architecture, Processes, and Organizations for the Modern Enterprise by Martin L. Abbott and Michael T. Fisher.

Scale cube model

In this three-dimensional model of scalability, the origin (0,0,0) represents the least scalable system. It assumes that the system is a monolith deployed on a single server instance. As shown, a system can be scaled by putting the right amount of effort in three dimensions.

x-axis: Cloning

y-axis: Decomposing by service/functionality

z-axis: Splitting by data partition

The most intuitive evolution of a monolithic, unscaled application is moving right along the x-axis, which is straightforward, most of the time inexpensive (in terms of development cost), and highly effective. The principle behind this technique is trivial: clone the same application n times and let each instance handle 1/nth of the workload.

Scaling along the y-axis means decomposing the application based on its functionalities, services, or use cases. In most cases, it means breaking a monolithic application down into smaller services.

In z-axis scaling, the application is split in such a way that each instance is responsible for only a portion of the whole data. This technique is mainly used in databases and is also known as horizontal partitioning or sharding.
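As a toy illustration (not from the original article), this is the idea behind routing each piece of data to one of n partitions; the shard names and the hash function are made up for the example:

```js
// A toy illustration of z-axis scaling: each shard owns 1/n of the data.
// The shard names below are made up for the example.
const SHARDS = ['db-shard-0', 'db-shard-1', 'db-shard-2'];

function shardFor(key) {
  // Hash the key and map it onto one of the n partitions
  let hash = 0;
  for (const char of String(key)) {
    hash = (hash * 31 + char.charCodeAt(0)) >>> 0;
  }
  return SHARDS[hash % SHARDS.length];
}

console.log(shardFor('user-42')); // always routes 'user-42' to the same shard
```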

Given its complexity, scaling an application along the z-axis should only be considered once the other two dimensions have been fully exploited and the application has reached a size that really justifies investing in this complex scaling type.

Y-axis scaling is a topic with a huge amount of information of its own. The protagonist here will be the first dimension: the x-axis.

If you have worked on any cloud application, it is probably already using some scaling mechanism to clone instances, through an auto-scaling policy, for example.

This is fine and works effectively. But it has a cost. Any cloud provider will charge you for things like:

Using a load balancer

Adding more machines

Scaling machines vertically

The last one is less common, but it may happen.

While all of that is fine and necessary, what can we do from the code side to reduce costs and add flexibility? Creating a cluster. 🚀

To do this, we need to use the cluster module from Node. The cluster module simplifies the forking of new instances of the same application and automatically distributes incoming connections across them, as shown in the following figure:

Cluster module

The master process receives signals, emits logs, and is responsible for spawning a number of processes (workers), each representing an instance of the application we want to scale. All of them use a common store. Each incoming connection is then distributed across the cloned workers, spreading the load among them.

Scaling an application also brings other advantages, in particular the ability to maintain a certain level of service even in the presence of malfunctions or crashes. This property is also known as resiliency, and it contributes to the availability of a system, which is key to maintaining an SLA.

There are two common scenarios where our SLA can be compromised:

An error happens

The code needs to be updated

In both cases the app needs to be restarted, so there is a small window where our application is unavailable. 💥 With the cluster module, this is a pretty easy task; the pattern consists of restarting the workers one at a time. This way, the remaining workers can continue to operate and keep the application's services available. Let’s implement this! 💻

The app.js file is pretty straightforward, as the following image shows:

app.js

It responds to any request by sending back a message containing its PID; this will be useful to identify which instance of the application is handling the request. Also, to simulate some actual CPU work, we perform an empty loop 10 million times; without this, the server load would be almost nothing considering the small scale of the tests we are going to run for this example.
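Since the original shows app.js as an image, here is a minimal sketch matching that description; the message text is an assumption, and port 8080 matches the Siege command used later:

```js
// app.js — a minimal sketch of the server described above
const http = require('http');
const { pid } = process;

const server = http.createServer((req, res) => {
  // Simulate some actual CPU work with an empty loop
  for (let i = 0; i < 1e7; i++) {} // 10 million iterations
  res.end(`Hello from ${pid}\n`);
});

server.listen(8080, () => console.log(`Started at ${pid}`));
```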

Let’s now try to scale our application using the cluster module. I will divide the code into small parts to ease the explanation.

Clustered app part 1

First, after requiring the modules, we create an if statement whose body runs only in the master process. Inside it, we obtain the number of CPUs, and then we create one worker per CPU using the fork function. Under the hood, cluster.fork() uses child_process.fork() to achieve that splitting.

We also create a listener for the exit event. This part is important because when an error happens, the instance dies, but we spawn a replacement using fork again.
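The original shows this part as an image; a sketch of what it describes might look like this (the log message is made up):

```js
// clusteredApp.js (part 1) — master logic: fork one worker per CPU
const cluster = require('cluster'); // cluster.isPrimary on newer Node versions
const os = require('os');

if (cluster.isMaster) {
  const cpus = os.cpus().length;
  for (let i = 0; i < cpus; i++) {
    cluster.fork(); // uses child_process.fork() under the hood
  }

  // If a worker crashes, replace it with a fresh one
  cluster.on('exit', (worker, code) => {
    if (code !== 0 && !worker.exitedAfterDisconnect) {
      console.log(`Worker ${worker.process.pid} crashed. Starting a new worker...`);
      cluster.fork();
    }
  });
}
```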

Clustered app part 2

We also need to manage the case when the process is signaled to restart. We are using SIGUSR2, a UNIX signal reserved for user-defined purposes (sorry, Windows users).

Once the application receives that signal, it recursively restarts each worker. You may notice that at no time will more than one worker be restarting in parallel, so we are ensuring availability. The restartWorker function disconnects the worker and creates a listener for the exit event. This time the handler is going to be a bit different: exitedAfterDisconnect is true if the worker exited due to .kill() or .disconnect().

Once the old worker exits, we create a new worker and attach a listener for its listening event, which lets us know when we can restart the next worker.
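Again as a sketch (the original is an image), the restart logic just described might look like this; it lives inside the cluster.isMaster branch, and the log messages are made up:

```js
// clusteredApp.js (part 2) — zero-downtime restart, one worker at a time
process.on('SIGUSR2', () => {
  const workers = Object.values(cluster.workers);

  const restartWorker = (workerIndex) => {
    const worker = workers[workerIndex];
    if (!worker) return; // all workers have been restarted

    worker.on('exit', () => {
      // React only to workers we stopped on purpose, not to crashes
      if (!worker.exitedAfterDisconnect) return;
      console.log(`Worker ${worker.process.pid} stopped`);
      const newWorker = cluster.fork();
      // Move on only once the replacement is accepting connections
      newWorker.on('listening', () => restartWorker(workerIndex + 1));
    });

    console.log(`Stopping worker ${worker.process.pid}...`);
    worker.disconnect();
  };

  restartWorker(0);
});
```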

Clustered app part 3

Last but not least, do you remember the if statement at the beginning of the file? The processes that are NOT the master will go into the else branch and directly execute our app.
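Putting it together, the overall shape of the file would be (still a sketch under the assumptions above):

```js
// clusteredApp.js (part 3) — workers fall into the else branch
const cluster = require('cluster');

if (cluster.isMaster) {
  // parts 1 and 2 shown above
} else {
  require('./app'); // each worker runs a full instance of the server
}
```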

Now that we already have the implementation, let’s execute it.

First, we should run the application with the command node clusteredApp

We need to find the master process PID with ps af

The master process should be the parent of a set of node processes. Once we have the PID, we can send the signal to it: kill -SIGUSR2 <PID>
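Putting the three steps together, the terminal session looks roughly like this (the <PID> placeholder stands for the master PID you found with ps):

```sh
# In one terminal: start the clustered application
node clusteredApp

# In a second terminal: find the master PID and send the signal
ps af                 # the master is the parent of the worker node processes
kill -SIGUSR2 <PID>   # triggers the one-at-a-time worker restart
```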

Now the output of the clusteredApp application should display something like this:

Clustered app output

We can try to use Siege to verify that we don’t have any considerable impact on the availability of our application during the restart of the workers:

siege -c200 -t10S http://localhost:8080

The preceding command will load the server with 200 concurrent connections for 10 seconds. As a reference, the result for a system with 4 processors is in the order of 90 transactions per second, with an average CPU utilization of only 20%.

The test takes around 10 seconds; while it runs, try sending the signal to the master process as shown above to stress the system as much as possible.

Siege test

In the left console, we see the output of app.js handling requests. The right terminal shows the final output of the Siege test (10 process kills) with (hey! 😎) 100% availability.

The code used in this article is available in a GitHub repository.

PM2 is a small utility, built on top of cluster, which offers load balancing, process monitoring, zero-downtime restarts, and other goodies. This article is for learning purposes, but PM2 should help if you want to achieve something like this in production.

I hope you enjoyed this article distilling some of Node’s magic 🍄