In this benchmark test, we compare three web application servers—Go, Node, and Elixir (Cowboy)—by subjecting each to a synthetic workload, first with 10k, and later with 100k connections.

To simulate generic web application client and server behavior, we devised the following synthetic workload. The client device opens a connection and sends 100 requests, pausing 900±5% milliseconds between them. The server handles each request by sleeping for 100±5% milliseconds, to simulate a backend database request, and then returns 1 kB of payload. Without additional delays, this results in an average connection lifetime of 100 seconds and an average load of 1 request per second per device.
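The averages quoted above can be checked with a short sketch. The function and parameter names are illustrative and not taken from the benchmark's source. The sketch assumes the client's pause starts once the previous response has arrived, and it ignores the ±5% jitter, which averages out.

```python
def workload_averages(requests_per_conn=100, pause_ms=900, service_ms=100):
    """Return (connection lifetime in seconds, requests per second per device)."""
    # One request cycle: the client pause plus the server's simulated backend delay.
    cycle_ms = pause_ms + service_ms
    lifetime_s = requests_per_conn * cycle_ms / 1000
    rate_per_device = requests_per_conn / lifetime_s
    return lifetime_s, rate_per_device


lifetime_s, rate_per_device = workload_averages()
print(lifetime_s, rate_per_device)
```

With one request per second per device, the device count and the target request rate are the same number, which is why 10k devices correspond to 10k requests per second.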

In the first test, we set the target to simulate 10k devices on a c5.2xlarge AWS instance that has 8 vCPUs and 16 GiB of RAM.

In the second test, we set the target to simulate 100k devices on a c5.9xlarge AWS instance that has 36 vCPUs and 72 GiB of RAM.

Correspondingly, our target load was 10k and 100k requests per second. The test run consisted of a 20-minute rampup from 0 to the target device count and load, then 20 minutes of sustained target device count and load, and, finally, a 20-minute rampdown back to 0. Each test takes 1 hour to complete.
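The shape of a test run can be sketched as a function of elapsed time. This is only a model for reading the charts: it assumes the rampup and rampdown are linear, which the text does not state, and the function name is hypothetical.

```python
def target_devices(minute, peak, ramp_min=20, sustain_min=20):
    """Target device count at a given minute of the run (linear ramps assumed)."""
    total_min = 2 * ramp_min + sustain_min
    if minute <= 0 or minute >= total_min:
        return 0
    if minute < ramp_min:
        # Rampup: grow linearly from 0 to the peak.
        return round(peak * minute / ramp_min)
    if minute <= ramp_min + sustain_min:
        # Sustained phase: hold the peak.
        return peak
    # Rampdown: shrink linearly back to 0.
    return round(peak * (total_min - minute) / ramp_min)


for minute in (10, 30, 50):
    print(minute, target_devices(minute, 100_000))
```

Because each device generates about one request per second, the same function approximates the target request rate at any point in the run.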

The metrics were collected from two sources. First, we collected the following metrics from each client device: the time to open a connection, the time from sending a request to receiving HTTP headers of the corresponding response, and the time from receiving HTTP headers to receiving the entire HTTP body.

These metrics were aggregated from all client devices into histograms to retain distribution characteristics. In addition, summary metrics such as the total number of open connections and average requests per second were collected.
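To illustrate why histograms are worth the trouble, here is a minimal sketch of merging per-device latency histograms. It is not Stressgrid's implementation: the fixed 10 ms bucket width, the helper names, and the sample values are all assumptions made for the example.

```python
from collections import Counter

BUCKET_MS = 10  # hypothetical fixed bucket width


def to_histogram(samples_ms):
    """Count samples per bucket, keyed by the bucket's lower edge in ms."""
    return Counter(int(s // BUCKET_MS) * BUCKET_MS for s in samples_ms)


def merge(histograms):
    """Sum bucket counts across devices; no raw samples are needed."""
    total = Counter()
    for hist in histograms:
        total.update(hist)
    return total


def percentile(hist, p):
    """Lower edge of the bucket that contains the p-th percentile."""
    threshold = sum(hist.values()) * p / 100
    seen = 0
    for bucket in sorted(hist):
        seen += hist[bucket]
        if seen >= threshold:
            return bucket
    return None


# Made-up samples from two devices, one of which saw a slow response.
device_a = to_histogram([101, 98, 104])
device_b = to_histogram([97, 250, 103])
merged = merge([device_a, device_b])
print(percentile(merged, 50), percentile(merged, 100))
```

The merged histogram still shows both the typical latency and the slow outlier, whereas a single average over the same samples would blur the two together.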

The second source of metrics was the target server itself, where we collected network and CPU utilization.

We used Ubuntu 18.04 with the 4.15.0-1031-aws kernel, with the sysctl overrides seen in our /etc/sysctl.d/10-dummy.conf.

Our test webservers were Go version 1.10.4, Node version 10.14.2, and Erlang 21.2.4-1 with Cowboy 2.6 and Ranch 1.7. The source code for each test application is available at https://gitlab.com/stressgrid/dummies.

The 10k Connections Test

In the 10k test, all test webservers achieved their target connection count.

Go and Elixir both mostly kept up with the target load of 10k requests per second. Node, due to latencies we will see later, maxed out at slightly below 10k requests per second.

The time to open a connection averaged out in single-digit milliseconds. It is mostly the same across the three subjects, with the exception of Go having a few hiccups during the rampup phase.

The time to receive response headers includes our artificial delay of 100 milliseconds and its randomization of ±5%. Both Go and Elixir are very similar in reflecting this delay, in addition to constant network latency. The story with Node is very different: we observe it adding latency immediately, followed by a big jump in both latency and latency deviation about 12 minutes into the run, when the rampup hits 5750 requests per second.

The time to receive the response body averaged out in single-digit microseconds. In the sustained phase, Go, Node, and Elixir are very similar.

Inbound network utilization is very similar for all three test applications. In the sustained phase, it is 1.75 MB per second, which works out to around 175 bytes per request. Outbound network utilization is closely correlated to requests per second, with Node somewhat lagging behind. In the sustained phase, it is around 12.5 MB per second, which means that for every 1 kB of payload there is about 250 bytes of protocol overhead.
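The per-request figures follow directly from the sustained throughput and request rate. A small sketch of that arithmetic, assuming decimal units (1 MB = 1,000,000 bytes, 1 kB = 1,000 bytes) and using an illustrative helper name:

```python
def bytes_per_request(throughput_mb_per_s, requests_per_s):
    """Average bytes on the wire per request, using decimal megabytes."""
    return throughput_mb_per_s * 1_000_000 / requests_per_s


PAYLOAD_BYTES = 1000  # the 1 kB response payload, decimal units assumed

inbound = bytes_per_request(1.75, 10_000)
outbound = bytes_per_request(12.5, 10_000)
overhead = outbound - PAYLOAD_BYTES
print(inbound, outbound, overhead)
```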

CPU utilization is where things get interesting. The Node application utilized the least amount of CPU time and achieved its maximum CPU utilization while still in the rampup phase. We’ve seen that Go and Elixir demonstrated very similar performance characteristics from a client’s perspective, yet Elixir achieved this result with significantly higher CPU utilization.

The 100k Connections Test

In the 100k test, only Go and Elixir reached the target connection count. Node peaked at about 60k connections.

Both Go and Elixir were also able to keep the sustained target load of 100k requests per second. Node maxed out at about 25k requests per second.

12 minutes into the rampup phase, at about 60k connections, Node became severely overloaded, with connections taking single-digit seconds to open.

Go and Elixir continued to open connections in low single-digit milliseconds, with a few hiccups.

As Node became overloaded, the average time to receive response headers grew to 1.5 seconds.

For both Go and Elixir, the time to receive response headers remained at around 100 milliseconds. Notably, Elixir maintained nearly constant performance throughout the entire test. Go’s average slowed by a few milliseconds after reaching about 70k connections.

The time to receive the response body remained in single-digit microseconds, with a distribution very similar to the 10k test. This likely means we are testing the client side :-)

Network utilization grew proportionally to the 10k test. Go and Elixir both peaked at 125 MB/s on the outbound network, which is about 1 gigabit per second. This means that if we wanted to achieve 10x the workload, we would come close to saturating a 10 Gb/s link. Node, which peaked below the target workload, showed network utilization corresponding to the load it handled.
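The link-saturation estimate is a unit conversion, sketched below. The helper names are illustrative, decimal units are assumed (1 MB = 8,000,000 bits), and the figure covers only the measured outbound traffic at 10x scale.

```python
def mb_per_s_to_gbit_per_s(mb_per_s):
    """Convert decimal megabytes per second to gigabits per second."""
    return mb_per_s * 8 / 1000


def link_utilization(mb_per_s, link_gbit_per_s):
    """Fraction of a link's nominal capacity used by the given throughput."""
    return mb_per_s_to_gbit_per_s(mb_per_s) / link_gbit_per_s


peak_gbit = mb_per_s_to_gbit_per_s(125)
tenfold_share = link_utilization(10 * 125, 10)
print(peak_gbit, tenfold_share)
```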

CPU utilization confirms the findings from the 10k test. Go was very efficient in using CPU and would have headroom, had the application been more compute-intensive. Elixir nearly saturated all 36 cores while delivering a surprisingly consistent performance. Node exhausted its scalability limit while utilizing only single-digit percent of available CPU resources.

Methodology

To run this test, we used the Stressgrid framework with 2 c5.2xlarge generators for 10k devices and 20 c5.2xlarge generators for 100k devices. Stressgrid monitors CPU utilization of the generators, so we can avoid skewed results due to generator oversaturation. In this test, all generators stayed below 80% CPU utilization.

Both target and generator instances were placed into a single VPC with two availability zones, using the internal IP network for communication. With this approach, we tried to simulate the typical behavior of a load balancer. With on-demand instances, the 10k test cost was around $10 and the 100k test cost was around $40, including data transfer.

Conclusions

The goal of this benchmark was not to analyze why Go, Node, and Elixir exhibit the observed behavior. Instead, we wanted to quantify their behavior so that readers familiar with the internal workings of each system can reflect on the strong and the weak sides. For those readers already using one of the systems, or those planning to migrate from one to another, we hope to provide some back-of-the-napkin guidelines to help with capacity planning. We also welcome suggestions on how to improve our benchmarking approach.
