We recently experienced two major memory issues that caused our Node.js server to crash at critical moments. We quickly learned how to buy time so that our end users wouldn't notice, and how to efficiently get to the bottom of memory issues. Memory issues can be daunting when you first encounter them; they were for our team. The goal of this article is to make them no more of a problem than your average bug.

Types of memory problems

memory leak

In computer science, a memory leak is a type of resource leak that occurs when a computer program incorrectly manages memory allocations in such a way that memory which is no longer needed is not released.

In lower-level languages like C, a memory leak is often caused by allocating memory (buffer = malloc(num_items * sizeof(double));) and then failing to free it (free(buffer);) once it's no longer needed.

In garbage-collected languages like JavaScript, a memory leak occurs when objects that are no longer needed are still reachable from the executing program, i.e. from a root object. The JavaScript engine never collects an object that is still reachable, so it is never released from the heap. If the heap grows too large, the process runs out of memory.
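To make the reachability idea concrete, here is a minimal sketch (the names `requestLog` and `handleRequest` are hypothetical, not from our codebase). A module-level array lives as long as the module does, so everything pushed into it stays reachable from a root and can never be collected, even though the program never reads it again.

```javascript
// A module-level array: it is reachable for the lifetime of the process,
// so anything it references can never be garbage collected.
const requestLog = [];

// Hypothetical request handler that records every payload "for debugging"
// and never removes anything, so the heap grows with every call.
function handleRequest(payload) {
  requestLog.push({ receivedAt: Date.now(), payload });
  return payload.length;
}

// Simulate steady traffic: each call retains a ~1 KB string that stays reachable.
for (let i = 0; i < 1000; i += 1) {
  handleRequest('x'.repeat(1024));
}

console.log(`entries still reachable: ${requestLog.length}`);
```

The fix is either to stop retaining the data or to bound the structure, for example by capping its length or deleting entries once they have been processed.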

memory bloat

Memory bloat is when a program uses much more memory than it needs to complete its job. Perhaps you're keeping large objects from being garbage collected by holding on to them much longer than needed. Or perhaps you're keeping a larger-than-necessary object in memory that your program doesn't need at all (which turned out to cause one of the two major issues we uncovered).
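A hypothetical sketch of the difference: `loadReportBloated` hands back a large array of rows even though the caller only needs two numbers, while `loadReportLean` returns a small summary so the rows become unreachable as soon as the function returns. The function names and data shape are illustrative assumptions.

```javascript
// Hypothetical: builds a big array of records, as if loaded from a database.
function loadRows(count) {
  const rows = [];
  for (let i = 0; i < count; i += 1) {
    rows.push({ id: i, amount: i % 100, notes: 'n'.repeat(200) });
  }
  return rows;
}

// Bloated: the caller keeps every row (and its notes) alive just to read a total.
function loadReportBloated(count) {
  const rows = loadRows(count);
  return { rows, total: rows.reduce((sum, r) => sum + r.amount, 0) };
}

// Leaner: keep only what the caller needs; `rows` is garbage after return.
function loadReportLean(count) {
  const rows = loadRows(count);
  const total = rows.reduce((sum, r) => sum + r.amount, 0);
  return { count: rows.length, total };
}

const report = loadReportLean(10000);
console.log(report);
```

Note that the bloated version is not a leak: once the caller drops its result, everything is collectible. It simply holds far more memory than the job requires for as long as that result lives.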

Detecting memory problems

Our memory issues came with obvious warning signs — mainly, this grim message from our production logs:

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

Signs of a memory leak include performance that degrades as time passes. If your server repeats the exact same process over and over, is fast at first, and gradually gets slower before crashing, it's likely a memory leak.

Signs of a memory bloat include consistently slow performance. However, unlike a leak, a memory bloat on its own will not get worse over time.
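One cheap way to tell the two apart before reaching for heap snapshots is to sample `process.memoryUsage().heapUsed` while replaying the same workload repeatedly. The `looksLikeLeak` heuristic below is our own assumption, not a standard tool: it compares the average of the first and last few samples, since a leak tends to keep climbing while bloat plateaus at a high but stable level. The thresholds are arbitrary and GC timing adds noise, so treat the result as a hint.

```javascript
// Collect heapUsed (in MB) after each repetition of the same workload.
function sampleHeap(workload, iterations) {
  const samples = [];
  for (let i = 0; i < iterations; i += 1) {
    workload();
    samples.push(process.memoryUsage().heapUsed / 1024 / 1024);
  }
  return samples;
}

// Heuristic (an assumption, not a standard): compare the mean of the first and
// last `window` samples. Steady growth suggests a leak; a flat line does not.
function looksLikeLeak(samples, window = 3, growthRatio = 1.2) {
  if (samples.length < window * 2) {
    throw new Error('need at least 2 * window samples');
  }
  const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const start = mean(samples.slice(0, window));
  const end = mean(samples.slice(-window));
  return end > start * growthRatio;
}

// Example: a workload that retains a sizeable array on every call should read as growing.
const retained = [];
const leakySamples = sampleHeap(() => retained.push(new Array(131072).fill(0)), 20);
console.log(looksLikeLeak(leakySamples) ? 'heap keeps growing' : 'heap looks stable');
```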

Short term solution

You often don't have time to understand the problem before you have to act; we definitely didn't. Luckily, there are ways to increase the memory allocated to the Node process. At the time of writing, the V8 engine had a default heap limit of roughly 1.5GB on 64-bit machines. Even if you're running your Node process on a machine with much more RAM, it won't matter unless you raise this limit. You can pass a runtime flag to your Node process to do so:

node --max_old_space_size=$SIZE server.js

$SIZE is in megabytes and can, in theory, be any number your machine can handle. We increased ours to 8000, which, based on rigorous load testing of the server, bought us enough time. We also upgraded our Heroku dyno's RAM, since that was easy enough to do.
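To confirm the flag actually took effect (for example after editing a start script or Procfile), Node's built-in `v8` module reports the heap ceiling V8 is using through `v8.getHeapStatistics()`. A minimal check you could log once at startup; the `heapLimitMb` helper name is our own:

```javascript
const v8 = require('v8');

// heap_size_limit is reported in bytes and reflects --max_old_space_size when set.
function heapLimitMb() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / 1024 / 1024);
}

console.log(`V8 heap limit: ${heapLimitMb()} MB`);
```

Started with node --max_old_space_size=8000, this should print a figure at or slightly above 8000, since the reported limit also covers V8's young generation.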

Side note: Another action we took was to quickly set up a Twilio service to alert us by text whenever a memory intensive request came to our server. This allowed us to shepherd the request through and restart the server to free up memory afterwards. This is obviously not ideal — but it was necessary to ensure our users never experienced a critical failure, even if it meant being on call 24/7 until the problem was solved.

Debugging the problem

Now that you've bought yourself some time, you can take a deep breath and actually investigate the problem. This is the daunting part, but it really shouldn't be. There are plenty of tools and resources out there to help you get to the bottom of the problem. If you've got the time, I encourage you to read these articles from Chrome DevTools.

take some heap snapshots

The problem lies in the heap: it's too big. You can take snapshots of the heap over time and dig into them with Chrome DevTools to see why it's so big. It's important that your snapshots are spread out over time, so you can dig into the objects that survive from one snapshot to the next; these objects are likely the culprit of a memory leak. There are many ways to record heap snapshots.

Using heapdump to take a heap snapshot

We used heapdump for our first go at it, and it proved very useful. Make sure to import it and call it at a point in your code that will produce consecutive snapshots over time, and name the snapshots appropriately. For example, we took a snapshot every time our server received a request that was about to invoke a memory-intensive process. That way, we could take another snapshot just by sending another request.

Using heapdump to take heap snapshots.
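We used the heapdump package for this. On Node 11.13 or later, the built-in `v8.writeHeapSnapshot()` does the same job without a native dependency, so the sketch below uses that instead. The `takeSnapshot` and `runExpensiveJob` helpers and the naming scheme (a zero-padded counter so files sort in the order they were taken) are illustrative assumptions; call the helper at the top of whatever handler kicks off the memory-intensive work.

```javascript
const os = require('os');
const path = require('path');
const v8 = require('v8');

let snapshotCount = 0;

// Hypothetical helper: writes a .heapsnapshot named with a zero-padded counter
// and a label, so the files sort in the order they were taken.
function takeSnapshot(label, dir = os.tmpdir()) {
  snapshotCount += 1;
  const name = `${String(snapshotCount).padStart(3, '0')}-${label}.heapsnapshot`;
  // Blocks the event loop while writing and returns the path of the file.
  return v8.writeHeapSnapshot(path.join(dir, name));
}

// Example: snapshot right before each memory-intensive job (`job` is hypothetical).
function runExpensiveJob(job) {
  const file = takeSnapshot('before-job');
  console.log(`heap snapshot written to ${file}`);
  return job();
}
```

Sending the same request several times then yields 001-before-job.heapsnapshot, 002-before-job.heapsnapshot, and so on. Be aware that writing a snapshot pauses the process and, per the Node docs, needs memory roughly twice the size of the heap, which is risky on a server already near its limit.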

Using chrome’s remote debugger to take a heap snapshot

If you're using Node v6.3 or later, you've got an even sweeter option. Run node --inspect server.js, then navigate to chrome://inspect to remotely debug your Node process. For an even more Tesla experience (sorry, Cadillac), install this Chrome plugin, which automatically opens a debugger tab when you run node with the --inspect flag. Now just take snapshots whenever you think is best.

Using Chrome DevTools remote debugging to take heap snapshot.

Load the snapshots and determine the type of memory problem

The next step is to load the heap snapshots into Chrome DevTools. If you used the second option, they're already loaded. If you used heapdump, you'll need to load them yourself, one after another in the order they were taken.

The key thing you’re looking for now is to determine if you have a memory bloat or a memory leak. If you have a memory leak, you’ve likely captured enough data to start digging into the heap to find the source of the problem. However, if you have a memory bloat, you will need to try some other memory analysis methods to get informative data.

Once loaded, the snapshots for our first memory issue looked like the profile below. As you can see, the heap grows steadily over time. This is a glaringly obvious memory leak, so if we had any doubts as to whether or not we had a leak, they went away at this point.

Issue 1: obvious memory leak.

Our second memory issue, which occurred a couple months after fixing our memory leak, ended up looking like the profile below for the same test.

Issue 2: no obvious memory leak.

This method doesn't work well for a memory bloat because a snapshot captures the heap at a single point in time. If that point isn't during the expensive function's execution, the heap won't contain any useful information about the memory that function uses. Here are two ways around this, both of which helped us find the culprit function and variable. If you're using Node v6.3 or later, you can record an allocation profile via Chrome's remote debugger and Node's --inspect flag, as described above; this gives you a profile of the memory allocated by individual functions over time.
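If attaching Chrome isn't convenient, the same sampling allocation profiler that DevTools drives can be reached in-process (Node 8 or later) through the built-in `inspector` module and the DevTools protocol commands `HeapProfiler.startSampling` and `HeapProfiler.stopSampling`. The sketch below is our own helper, not something from our codebase: it profiles one call of a function and totals the sampled bytes per function name, roughly what the DevTools allocation profile shows. `buildBigTarget` is a hypothetical stand-in for the real culprit, and the numbers are statistical samples of live allocations, not exact byte counts.

```javascript
const inspector = require('inspector');

// Promise wrapper around the callback-style Session#post.
function post(session, method, params = {}) {
  return new Promise((resolve, reject) => {
    session.post(method, params, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

// Walk the sampling profile tree and sum selfSize (bytes) per function name.
function totalsByFunction(node, totals = new Map()) {
  const name = node.callFrame.functionName || '(anonymous)';
  totals.set(name, (totals.get(name) || 0) + node.selfSize);
  for (const child of node.children) totalsByFunction(child, totals);
  return totals;
}

// Record a sampling allocation profile around one call of `fn`.
async function profileAllocations(fn, samplingInterval = 4096) {
  const session = new inspector.Session();
  session.connect();
  try {
    await post(session, 'HeapProfiler.enable');
    await post(session, 'HeapProfiler.startSampling', { samplingInterval });
    // Keep fn's result reachable until sampling stops: by default the sampler
    // drops samples for objects that have already been garbage collected.
    const result = fn();
    const { profile } = await post(session, 'HeapProfiler.stopSampling');
    const top = [...totalsByFunction(profile.head)].sort((a, b) => b[1] - a[1]);
    return { result, top };
  } finally {
    session.disconnect();
  }
}

// Hypothetical memory-hungry function standing in for the real culprit.
function buildBigTarget() {
  const target = [];
  for (let i = 0; i < 50000; i += 1) target.push({ i, label: `row-${i}` });
  return target;
}

const profiling = profileAllocations(buildBigTarget);
profiling.then(({ top }) => {
  for (const [name, bytes] of top.slice(0, 5)) console.log(`${name}: ${bytes} bytes`);
});
```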

Another option is to send many simultaneous requests to your server (assuming your processes are asynchronous enough that a heap snapshot will catch some of them mid-execution). We bombarded our server with simultaneous requests, which gave us some very large heap snapshots to dive into.
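A minimal way to fire such a burst from Node itself, without extra tooling, is sketched below. `fireConcurrent` is a hypothetical helper that uses only the built-in `http` module and resolves with each response's status code; the URL and request count in the usage comment are placeholders. Take your heap snapshot while the requests are in flight.

```javascript
const http = require('http');

// Send `count` GET requests at once and resolve with their status codes.
function fireConcurrent(url, count) {
  const one = () =>
    new Promise((resolve, reject) => {
      http
        .get(url, (res) => {
          res.resume(); // drain the body so the socket is released
          res.on('end', () => resolve(res.statusCode));
        })
        .on('error', reject);
    });
  return Promise.all(Array.from({ length: count }, one));
}

// Usage (placeholder URL):
// fireConcurrent('http://localhost:3000/expensive', 50).then((codes) => console.log(codes));
```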

Analyze the snapshots to find the source of the problem

You've now got the data that will potentially point you to the culprit of your memory problems. This part can be a bit intimidating, but as I've pointed out a few times already, this page is your friend.

The retained size is the size of memory that is freed once the object itself is deleted along with its dependent objects that were made unreachable from GC roots.

A good place to start is to sort by retained size in descending order and dive into the largest objects. For us, the function names pointed us directly to the part of our code that was the culprit.

Issue 1: Memory leak — diving in.

Because we knew it was a memory leak, we knew that looking for inappropriately scoped variables was a good place to start. We opened the index.js of our email service, and a module-level variable at the top of the file immediately stood out:

const timers = {};

We followed it through, made the appropriate changes, tested a few more times, and at last our memory leak was solved.
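The shape of this kind of leak is easy to sketch, though the code below is a hypothetical reconstruction rather than our service's actual code. Each scheduled email stores its timer handle in the module-level `timers` object; if entries are only ever added, every handle (and the closure holding the email's data) stays reachable for the life of the process. The fix is to delete the entry whenever a timer fires or is cancelled. `scheduleEmail`, `cancelEmail`, and `send` are illustrative names.

```javascript
// Module-level map of pending timers, keyed by email id. Anything stored here
// stays reachable (and uncollectable) until it is explicitly deleted.
const timers = {};

function scheduleEmail(id, delayMs, send) {
  cancelEmail(id); // avoid orphaning an existing timer for the same id
  timers[id] = setTimeout(() => {
    delete timers[id]; // leak fix: drop the reference once the timer fires
    send(id);
  }, delayMs);
}

function cancelEmail(id) {
  if (timers[id] !== undefined) {
    clearTimeout(timers[id]);
    delete timers[id]; // leak fix: drop the reference once it is cancelled
  }
}
```

Without the two delete lines, Object.keys(timers).length would grow with every email ever scheduled under a new id.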

The second issue was slightly more complicated to debug, but the same general strategy worked. Below is the recorded allocation profile, which we got using Chrome DevTools and node --inspect.

Just like in the heap snapshot details, many function and object names are unrecognizable and sit at a lower level than the code you wrote. So when you do see a name you recognize, take note. The allocation profile led us to one of our functions, recordFromSnapshot, which was a good starting point. Our heap snapshot investigation, which was not really different from the one for the memory leak above, brought us to a very large object named target. target was a variable declared within the recordFromSnapshot function; it was left over from our early codebase and wasn't needed anymore. Getting rid of it fixed our memory bloat and sped up a process that once took 40 seconds to around 10 seconds, with no need to raise our Node process's memory limit.

Wrapping it up

Our two major memory issues forced our fast-paced team to slow down and understand the performance of our server. We now understand it on a much more granular level: we know how long specific functions take and how much memory they use. We have a much better understanding of the resources we will need as we continue to scale. Most importantly, we no longer fear memory issues, and we don't expect to encounter them again.

I hope you enjoyed this post. We’d love to hear your thoughts!

Thanks to Christopher Dzoba for reading drafts of this post. Also — thanks to my teammates for helping debug and triage these issues along the way, especially Christopher Dzoba on the memory leak and Stephen Saunders for being a partner in crime on solving the memory bloat issue.

Here is an article I wrote on a general debugging strategy called Hypothesis Driven Debugging.

P.S. We’re hiring!

Resources:

The go-to memory problem solving guide for web development via Chrome DevTools

Node wiki on max_old_space_size

Node v7.x’s V8 module API