We’ve just released version 4.0.56 of the Phusion Passenger application server for Ruby, Python and Node.js, which fixes a number of interesting and important bugs. They’re the kind of typical bugs that make me go “what the **** was I thinking?!” after I’ve analyzed them, because the fixes are very simple.

Leaking file descriptors

The first bug is a file descriptor leak. A file descriptor is number which represents a kernel resource, such as an open file or a socket. Every time you call File.open or TCPSocket.new in Ruby, you get an IO object that’s internally backed by a file descriptor. In Node.js you even often work with file descriptors directly: most fs functions return a file descriptor. Since Phusion Passenger is written in C++, we also work with file descriptors directly.

Schematically, it looks like this:

You’re probably familiar with memory leaks. A file descriptor leak is very similar. If you ever lose track of a file descriptor number, you’ve leaked it. In Ruby this is not possible because all IO objects own their file descriptor, and IO objects are garbage collected (thus closing the corresponding file descriptor). However in Node.js and in C++ this can easily happen if you’re not careful. When leaked, the kernel resource stays allocated until your process exits.

What went wrong

In Passenger, we leaked a file descriptor when creating an error report file. This file is created if your app can’t spawn for some reason (e.g. it throws an exception during startup). The code that was responsible for rendering the file looked like this, in semi C++ pseudocode:

void processAndLogNewSpawnException(SpawnException &e) { int fd = -1; FdGuard guard(fd); fd = createNewReportFile(); if (fd != -1) { renderReportFileContents(fd, e); } }

Notice the guard variable. In C++, it is a so-called RAII object: “Resource Acquisition Is Initialization”. It is a common coding pattern in C++ to ensure that things are cleaned up when exceptions are thrown, kind of like the C++ equivalent of the ensure keyword in Ruby or the finally keyword in Javascript. When this function exits for any reason, be it a normal return or an exception, the guard destructor is called, which is supposed to close the file descriptor.

The facepalm moment was when Paul “popox” B reported that the guard was on the wrong line. The guard was created before the file descriptor was assigned to fd , so the guard did nothing all this time. Every time a report file was created, a file descriptor was leaked.

The one-line fix

The solution was to move the guard object a little bit:

void processAndLogNewSpawnException(SpawnException &e) { int fd = -1; // Guard object was here fd = createNewReportFile(); // It is now here FdGuard guard(fd); if (fd != -1) { renderReportFileContents(fd, e); } }

Thank you Paul B!

Node.js load balancing

The other issue fixed in 4.0.56 is a Node.js load balancing issue. In Passenger we load balance requests between application processes as much as possible. Traditionally, the reason for load balancing has been to minimize latency. This utilizes the concept of “application concurrency”: the maximum number of concurrent requests a single app process can handle. For Ruby apps, the concurrency is 1 (unless you configured multithreading, in which case the concurrency is equal to the number of threads). Since Ruby apps have finite I/O concurrency, Passenger load balances a request to a different process only if one process has run out of concurrency.

Node.js is different in that it’s fully asynchronous. It can effectively have an unlimited amount of concurrency.

Passenger orders processes in a priority queue by “busyness”. Load balancing is achieved by routing a new request to the process with the least busyness.

What went wrong

What went wrong with the Node.js case is the fact that we had special rules for application processes with unlimited concurrency. The busyness for such processes is calculated as follows:

if (sessions == 0) { return 0; } else { return 1; }

sessions indicates the number of requests that a process is currently handling. This piece of code effectively sorted Node.js processes in two categories only: idle processes and non-idle processes.

From a concurrency point of view, there is nothing wrong with this. Node.js apps have unlimited concurrency after all. However this resulted in lots of requests “sticking” to a few processes, as Charles Vallières reported:

* PID: 24526 Sessions: 84 Processed: 1 Uptime: 9s * PID: 24545 Sessions: 1 Processed: 0 Uptime: 9s * PID: 24571 Sessions: 83 Processed: 0 Uptime: 8s * PID: 24596 Sessions: 1 Processed: 0 Uptime: 8s

Then it dawned to me that I forgot something. An even distribution of requests is desirable here, because now the reason for load balancing becomes different. It’s to maximize CPU core usage, because single Node.js process can only use 1 CPU core.

The fix

The fix was incredibly easy:

return sessions;

Yes, facepalm time.

Thank you Charles Vallières!

Installing or upgrading to 4.0.56

Final

Phusion Passenger’s core is open source. Please Phusion Passenger’s core is open source. Please fork or watch us on Github.

If you would like to stay up to date with Phusion news, please fill in your name and email address below and sign up for our newsletter. We won’t spam you, we promise.