The majority of benchmarks posted on the web are derived from testing simple “hello world” apps. Although certainly better than nothing, these tests tell us little about real-world performance. Ideally, one would compare multiple implementations of a non-trivial application, but this takes a lot of time that is often hard to justify in the face of dogged competition and looming deadlines. Occasionally, however, the stars do align, and one has the chance to conduct such an experiment.

Lately, I’ve been researching the various merits of Python vs. JavaScript (à la Node.js) for developing web-scale cloud services. In the course of my work, I ported an internal HTTP-based event queuing service from Python (currently running in production) to JavaScript. In a previous post, I shared the results from some informal performance testing of these implementations.

In this post, I’d like to share my results from a second round of more rigorous performance testing, during which the test environment and variables were tightly controlled[1]. I switched from ApacheBench to Autobench/Httperf to generate a more consistent, realistic load. I also monkey-patched PyMongo this time around (via gevent.monkey.patch_all()), so that both the Python and Node implementations would use non-blocking I/O all the way through.

Testing Environment

Servers

Rackspace Cloud Virtual Machines: 4GB RAM, 2 vCPUs, Next Generation Platform, Chicago Region (ORD)

Arch Linux (2012.08) x86_64: full system upgrade (pacman -Syu), linux-3.6.5-1

Tuning: set nofile to 10000 in /etc/security/limits.conf; updated /etc/sysctl.conf to handle a large number of TCP requests and socket churn
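The limits.conf setting is typically applied when a session starts, so it can be worth confirming the descriptor limit from inside the server process itself before a run. Below is a minimal sketch using Python's standard resource module (Unix only). The ensure_nofile name is hypothetical, the 10000 target simply mirrors the setting above, and the sketch does not read or modify limits.conf or any sysctl keys.

```python
import resource

def ensure_nofile(target: int = 10000) -> int:
    """Raise this process's soft RLIMIT_NOFILE toward target (never above the
    hard limit, never lowering an already higher limit) and return the result."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough; leave it alone
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # unprivileged processes cannot exceed the hard limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]

print("soft nofile limit:", ensure_nofile())
```

If the returned value is below 10000, the hard limit (set by limits.conf or the service manager) is the constraint, and only a privileged change will raise it.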



Network

200 Mbps per server

Internal network interface (10.x.x.x)
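Because each server had a 200 Mbps link, a back-of-envelope calculation gives an upper bound on how many responses per second the API server could push for a given body size, such as the ~1K and ~64K result sets described under Benchmarks. This sketch assumes 1 Mbps = 10^6 bits/s and ignores HTTP headers, TCP/IP framing, and retransmits, so real ceilings are lower; it is arithmetic on the stated numbers, not a measured result.

```python
def max_responses_per_sec(link_mbps: float, body_bytes: int) -> float:
    """Upper bound on responses/sec when only the response body is counted."""
    bits_per_response = body_bytes * 8
    return link_mbps * 1_000_000 / bits_per_response

for label, size in [("1 KiB", 1024), ("64 KiB", 64 * 1024)]:
    print(f"{label}: at most ~{max_responses_per_sec(200, size):,.0f} responses/sec")
```

By this estimate, ~64 KiB bodies would fill a 200 Mbps link at roughly 381 responses/sec, well under the top of the 20 to 2000 req/sec range, while ~1 KiB bodies stay far from that limit.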

Python

CPython 2.7.3

PyPy 1.9.0

Gevent 1.0rc1

PyMongo 2.3

WebOb 1.2.3

Node.js

Node.js 0.8.14

Connect 2.5.0

node-mongodb-native 1.1.11

Other

MongoDB 2.2

Autobench 2.1.2

Httperf 0.9.0 (recompiled)

Actors

Cloud Servers

3 Autobench hosts

1 API server

1 DB server

Implementations

Node.js: V8 + Node.js + Connect

Gevent: CPython + gevent.wsgi + gevent.monkey.patch_all()

WsgiRef: CPython + WSGI Reference Implementation

WsgiRef-PyPy: PyPy + WSGI Reference Implementation
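For readers unfamiliar with the WsgiRef setup, here is a minimal sketch of a WSGI callable served by Python's wsgiref reference server. It is not the author's queuing service: the make_app and serve names and the in-memory event list (standing in for the MongoDB query) are hypothetical. It mirrors the two response shapes used in the benchmarks: a JSON array of events, or 204 No Content when the result set is empty. In the Gevent variant, the same kind of callable would instead be served by Gevent's WSGI server after calling gevent.monkey.patch_all().

```python
import json
from wsgiref.simple_server import make_server

def make_app(events):
    """Build a WSGI app that answers every request with the given event list."""
    def app(environ, start_response):
        if not events:
            # Empty result set: status only, no body
            start_response("204 No Content", [])
            return []
        body = json.dumps(events).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app

def serve(events, host="0.0.0.0", port=8000):
    """Serve with wsgiref's single-threaded server (blocks forever)."""
    with make_server(host, port, make_app(events)) as httpd:
        httpd.serve_forever()
```

For example, serve([{"id": 1, "kind": "demo"}]) would answer every request on port 8000 with that array. Because wsgiref's server handles one request at a time, any slow I/O inside the app stalls every other client, which is the blocking behavior contrasted in the Sync vs. Async results.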

Benchmarks

As in my previous experiment, I benchmarked retrieving a fixed set of events from an event queuing service backed by MongoDB, with alternative service implementations in Python and JavaScript. Unfortunately, I was not able to directly compare PyPy to both CPython and Node.js, since Gevent is currently incompatible with PyPy, and I did not have the luxury of reimplementing the queuing service a third time (using a non-blocking framework that works with PyPy, such as Tornado).

Update: See my follow-up post on Tornado, Gevent, PyPy and Cython, in which I share my results from running my app with Tornado on PyPy.

For each test, I ran Autobench directly against a single message bus implementation. I set min_rate and max_rate to 20 and 2000, respectively, in order to test a wide range of requests per second[2]. The x axes on the graphs below represent the range of req/sec attempted.
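Autobench works by running httperf repeatedly while stepping the load from a low rate to a high rate. The sketch below builds one httperf command line per rate to illustrate that sweep. The function name, the step of 20, the connection count, the timeout, the /events URI, and the 10.0.0.5 address are hypothetical values, not the settings used in these tests; the flags themselves (--hog, --server, --port, --uri, --rate, --num-conns, --num-calls, --timeout) are standard httperf options.

```python
def httperf_sweep(server, port, uri, min_rate=20, max_rate=2000, step=20,
                  num_conns=1000, calls_per_conn=1, timeout=5):
    """Return one httperf argv list per target rate, lowest rate first."""
    commands = []
    for rate in range(min_rate, max_rate + 1, step):
        commands.append([
            "httperf", "--hog",
            f"--server={server}", f"--port={port}", f"--uri={uri}",
            f"--rate={rate}",                 # new connections opened per second
            f"--num-conns={num_conns}",       # total connections at this rate
            f"--num-calls={calls_per_conn}",  # requests sent on each connection
            f"--timeout={timeout}",           # seconds to wait before counting an error
        ])
    return commands

cmds = httperf_sweep("10.0.0.5", 8000, "/events", calls_per_conn=10)
print(len(cmds), " ".join(cmds[0]))
```

Each list could be handed to subprocess.run on a load-generating host. Note that httperf's --rate counts new connections per second, so with several calls per connection the demanded request rate is correspondingly higher.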

I carried out all benchmarks against a single instance of each implementation; no clustering or load balancing solutions were employed (e.g., HAProxy, Gunicorn, Node’s Cluster module). Although this setup does not model production deployments, it removes variability in the results, making them easier to verify and interpret.

For those implementations that supported HTTP/1.1 Keep-Alive[3], I ran each test twice: once with 1 GET[4] per connection, and once with 10 GETs per connection. I denoted this in the results by appending the number of requests per connection to each implementation name, as in Gevent (1) and Gevent (10). The results of the latter test may be especially instructive for web apps, since browsers typically perform several requests per connection.
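To illustrate what Keep-Alive changes, the following stdlib-only sketch starts a throwaway HTTP/1.1 server and sends GETs over persistent connections with http.client, counting connections and requests on the server side. The handler, the /events path, and the counts are illustrative assumptions; this demonstrates connection reuse locally and does not reproduce the httperf runs. With 2 connections of 10 GETs each, it should report 2 connections and 20 requests.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep connections open
    connections = 0  # one handler instance is created per TCP connection
    requests = 0
    lock = threading.Lock()

    def setup(self):
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_GET(self):
        with CountingHandler.lock:
            CountingHandler.requests += 1
        body = b'[{"id": 1, "kind": "demo"}]'  # stand-in for the event list
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))  # needed for reuse
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging

def run_gets(port, gets_per_conn, conns):
    """Open `conns` connections and send `gets_per_conn` GETs on each."""
    for _ in range(conns):
        conn = http.client.HTTPConnection("127.0.0.1", port)
        try:
            for _ in range(gets_per_conn):
                conn.request("GET", "/events")
                conn.getresponse().read()  # drain body before the next request
        finally:
            conn.close()

server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
run_gets(server.server_address[1], gets_per_conn=10, conns=2)
server.shutdown()
server.server_close()
print(f"connections={CountingHandler.connections} requests={CountingHandler.requests}")
```

With gets_per_conn=1 the same script would open one connection per request, the pattern measured by Gevent (1) and Node.js (1); each extra connection adds a TCP handshake and server-side accept work.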

Each request to the message bus returned an identical set of JSON-encoded events (shallow objects, ~1K of text). I also tested Gevent (10) and Node.js (10) against a larger result set containing ~64K of events, and against an empty result set (where the server responded to every request with 204 No Content).
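To generate result sets of roughly these sizes, one could append shallow JSON objects until the encoded array reaches a byte budget. The sketch below assumes a made-up event shape (id, kind, and body fields), since the real event schema isn't given here; an empty list corresponds to the 204 No Content case.

```python
import json

def build_events(target_bytes: int) -> list:
    """Return shallow (hypothetical) events whose json.dumps length is as large
    as possible without exceeding target_bytes."""
    events = []
    size = len("[]")  # brackets of the enclosing JSON array
    if target_bytes < size:
        return events
    i = 0
    while True:
        event = {"id": i, "kind": "demo", "body": "x" * 40}
        added = len(json.dumps(event))
        if events:
            added += len(", ")  # json.dumps' default list separator
        if size + added > target_bytes:
            return events
        events.append(event)
        size += added
        i += 1

for budget in (1024, 64 * 1024):
    print(budget, len(json.dumps(build_events(budget))))
```

Tracking the size incrementally avoids re-encoding the whole array on every step; the running total matches len(json.dumps(events)) because every field is ASCII and the default separators are used.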

Results

Except where noted, only the results from testing the 1K data set appear in the graphs below. I used Flot to visualize the raw data (see also the JavaScript file accompanying this post).

Now, I’ll step aside for a moment and let the data speak for itself…

Gevent vs. Node.js

Graphs: Throughput (req/sec), Response Time (ms), Errors, Standard Deviation (req/sec)

Sync vs. Async

Graphs: Throughput (req/sec), Response Time (ms), Errors

PyPy vs. CPython

Graphs: Throughput (req/sec), Response Time (ms), Errors

0 KiB vs. 1 KiB vs. 64 KiB

Graphs: Throughput (req/sec), Response Time (ms), Errors, Standard Deviation (req/sec)

Q.E.D.

Node.js outperforms Gevent significantly in terms of latency and error rates for small data transfers (~1K). However, in the case of larger response bodies (~64K), the difference between the two platforms is more subtle. Overall, the best-case scenario for Node.js appears to be serving large numbers of concurrent requests for small chunks of data, over a persistent connection.

PyPy performs only slightly better than CPython when using Python’s WSGI reference implementation. More work is needed to determine whether PyPy would perform similarly to Node.js given a compatible, non-blocking Python web framework and a non-blocking MongoDB driver.

Finally, regarding blocking vs. non-blocking I/O frameworks, Gevent certainly outperforms WsgiRef in terms of throughput and response time, although not by as much as one might expect.

The plot thickens…

Thanks to June Rich for reading drafts of this.