Build a custom AWS CloudWatch metric to monitor your Express HTTP connections Maxime Hilaire Mar 19 · 6 min read


Monitoring is key for troubleshooting. Here is how I built a custom AWS CloudWatch metric to monitor and tune HTTP connections in Node.js and Express.

Everything started with a 502 “Bad Gateway”

I was initially troubleshooting an ECS (EC2 launch type) Node.js task that crashed periodically. The stack is a classic monolithic MERN (MongoDB, Express, React, Node).

The current setup allotted 256 MB of RAM on a t2.small instance for the Node app (which to me should be enough for a basic web app without much processing).

Usual troubleshooting practices led me to look at the ALB metrics. And indeed, according to CloudWatch, I was receiving spikes of up to 900 requests/min over a 15-minute span. The source of these calls was a weekly bulk sync process via REST API, which I didn’t expect to dump so many requests at once…

Looking at the ECS logs, I quickly found that Docker killed the container because it exceeded 85% of its hard memory limit.

A spike in HTTP requests impacts the memory Express needs; how could I have tuned and scaled for that?

This is true for any system: more requests means more resources. But sometimes we want to cap the request flow to keep resource usage and cost under control.

As an experienced J2EE developer, I know which metrics to look at in Tomcat, but what about Node.js/Express?

What are the maximum HTTP connection limits and pools? When should I scale out? Should I scale vertically?

The ALB gives connection metrics for the whole cluster and per target group. But what about per container?

How HTTP connections tuning works in Express

I did some googling; little of it relates to Express HTTP tuning. It’s as if most developers go all-in with the Node/Express HTTP server without having to care about default connection limits… Coming from the Java and Tomcat world, that seemed too nice to be true. I needed to understand the truth.

And indeed, surprisingly, some articles mention that Node.js is elastic: by default there is no max limit, and in case of an HTTP request spike it will just open as many connections as it can.

According to these references:

https://www.quora.com/How-many-connections-can-a-node-js-server-handle

http://blog.caustik.com/2012/08/19/node-js-w1m-concurrent-connections/

https://blog.fullstacktraining.com/concurrent-http-connections-in-node-js/

By default, the two factors that cap requests on a Node app are:

The maximum number of open file descriptors

The memory available on your container

If you reach the open file descriptor limit, Express will simply reject new connections, so make sure to increase the maximum number of open file descriptors on your system. The default is usually 1024. Check the value with ulimit -a

This limit was already increased in my case. If not, you can increase it with:

ulimit -n 99999
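Note that soft limits can only be raised up to the hard limit, and in a container the effective limit comes from the Docker daemon or the `--ulimit nofile=...` option of `docker run`. A quick way to see both values:

```shell
# Soft limits can only be raised up to the hard limit; check both:
ulimit -Sn   # soft limit (the one that applies to your process)
ulimit -Hn   # hard limit
```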

But if instead you reach the container’s memory limit, Docker will kill your app. If your app is not in a cluster and doesn’t auto-scale or auto-restart, you may want to set a lower connection limit like this:

server.maxConnections = 20;

Express will reject new incoming connections and you will get this error from curl:

Recv failure: Connection reset by peer

This is a quick workaround that still doesn’t support our use case: during a spike, the app will reject connections and requests will be lost. With a connection pool, as in Tomcat, we could delay the requests, but unfortunately I didn’t find one for Express. At least the app won’t crash.

Lowering the HTTP request rate from the caller could have been an option, but I don’t own that third-party service. In the end, I chose to change the architecture so the requests are sent from my end.

On a production system, it’s important to know the limit of your app to adjust system requirements accordingly before go-live and allow autoscaling based on some key metrics you have defined.

To define that key metric, you need to learn how to monitor your app

As we said earlier, by default the app will use as much memory as is available and open one file descriptor per HTTP socket. Therefore, the usual graphs to monitor per container are memory, total open file descriptors, CPU (if your CPU is 100% busy all the time, processing requests will take longer and more connections will hang), and finally HTTP connections.

I assume you already know how to display the first 3 metrics on your favorite monitoring system. But what about HTTP connections?

Tomcat provides built-in JMX monitoring that conveniently allows you to track connections, requests, threads, and pools. But nothing similar exists in the Node.js/Express HTTP server.

A quick option would be to rely on your reverse proxy metrics (Nginx, ALB, or Apache).

But if you have more than one node behind the reverse proxy, it will give you the total requests for your whole cluster. What if you want to monitor each node in your cluster?

Build your custom Cloudwatch metrics

Here comes the purpose of my article: create your own custom CloudWatch HTTP metrics.

Get the connections total count from Express

The only way to get the number of open HTTP connections at a given time with Express is to create a server object from the built-in Node http module:

const http = require('http');

const server = http.createServer(app);

Then call this method, passing a callback:

server.getConnections((error, count) => {
  console.log(count);
});

And make sure to update your code to use server.listen(... instead of your express app.listen(...

Note: This function only gives a connection count at a given time, not an average nor a total count from startup.

Now, to be able to monitor this metric, you need to push the value every X minutes/seconds to a monitoring system.

Pushing metrics to Cloudwatch

As the stack is hosted in AWS, I’ll be showing how to push this value into CloudWatch.

1. Install the node-schedule package to define a new cron job and aws-sdk to push metrics to CloudWatch:

// Push count every second
const schedule = require('node-schedule');
const AWS = require('aws-sdk');

const cloudwatch = new AWS.CloudWatch();

schedule.scheduleJob('* * * * * *', () => {
  return server.getConnections((error, count) => {
    if (error) {
      console.error('Error while trying to get server connections', error);
      return;
    }

    console.log(`Current opened connections count: ${count}`);

    const params = {
      MetricData: [
        {
          MetricName: 'HTTPConnections',
          Dimensions: [
            {
              Name: 'PerNodeId',
              Value: '<yourNodeAppID>'
              // Set here any dynamic and unique ID that can
              // easily identify your running node app,
              // like its container ID
            },
          ],
          Unit: 'Count',
          Value: count
        },
      ],
      Namespace: 'App/NodeJS'
    };

    // Make sure to set the IAM policy to allow pushing metrics
    cloudwatch.putMetricData(params, (err) => {
      if (err) {
        console.error('Error while trying to push http connections metrics', err);
      }
    });
  });
});

2. Add an IAM policy granting your app permission to push metrics (in my case on the ECS task role):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "true"
        }
      },
      "Action": [
        "cloudwatch:PutMetricData"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
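If you prefer the command line, the policy above can be attached as an inline policy with the AWS CLI (the role name, policy name, and file path here are placeholders, not from the article):

```shell
# Attach the policy JSON above as an inline policy on the ECS task role.
# "ecsTaskRole" and "push-cloudwatch-metrics" are placeholder names.
aws iam put-role-policy \
  --role-name ecsTaskRole \
  --policy-name push-cloudwatch-metrics \
  --policy-document file://policy.json
```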

That’s it!

3. Start the app with node index.js and keep it running.

Refresh your app page in your web browser or run a curl in a loop to generate traffic.

for ((i=1;i<=100;i++)); do curl -v "localhost:3000"; done

You should see this in the log:

Server listening on port 3000!

Current opened connections count: 0

Current opened connections count: 1

Note: If you have very few requests, all closed within a few ms of the beginning of the second, it’s likely that the count stays at 0, as the cron reads the value at the beginning of each second.

Browse to your CloudWatch “Metric” tab.

In “All metrics”, browse to “App/NodeJS > HTTPConnections > PerNodeId” and refresh your CloudWatch metrics a couple of times; you should see the data points appear on the graph.

If you have enabled the keep-alive header, it’s likely that your browser/reverse proxy will reuse connections for performance, so the total HTTP requests sent can differ from the number of open connections in Express.

Note: By default, custom CloudWatch metrics are stored at standard (one-minute) resolution, and the console graphs them with a five-minute period by default, even if you push every second. Shrink the graph period, and use high-resolution metrics if you need per-second data points.
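CloudWatch supports a StorageResolution field on each MetricData entry for high-resolution (per-second) metrics. A minimal sketch of the params object with it set (the count value here is a stand-in for the one returned by server.getConnections()):

```javascript
const count = 3; // stand-in for the value from server.getConnections()

const params = {
  MetricData: [
    {
      MetricName: 'HTTPConnections',
      StorageResolution: 1, // 1 = high resolution (per-second); 60 = standard
      Unit: 'Count',
      Value: count,
    },
  ],
  Namespace: 'App/NodeJS',
};

console.log(params.MetricData[0].StorageResolution); // 1
```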

From there you can trigger CloudWatch alarms and even events (auto-scaling, etc.) with finer-grained tuning than using the memory metric.

You will find a working sample project for this article in this GitHub repo.

For the next step, I recommend you start stressing your app to visualize the data in this graph. My favorite tool for that: Gatling!

Happy monitoring!