Caching is one of the first things to reach for when you need to start thinking about scaling. Among efforts such as query minimization, denormalization, code optimization, compression, database tuning, indexing, and load balancing, caching remains one of the lowest-hanging fruits for lightening your server load and handling huge amounts of traffic. There are many options, and I chose to evaluate a few of the most interesting setups.

This is not intended to be a rigorously scientific test, but more of a first impression of the different caching systems. For all the tests I'm describing, I'm using a single VPS on Rackspace Cloud with 320MB of RAM, a quad-core AMD Opteron 2350HE, and a bleeding edge server stack using Ubuntu Server 9.10, NGINX with UWSGI, Python 2.6, Django 1.1, and PostgreSQL 8.4. I'm serving the home page view of Django-Mingus, which provides a realistic amount of complexity to the Python side of things and gives us a 9387 byte response. I'm using 4 UWSGI processes and a single NGINX worker. All my tests are using ApacheBench, which I'm running on the same machine. Note that for all my cache tests I'm prepopulating the cache before running the benchmark. Here are the different setups I'm going to evaluate:

No caching whatsoever.
Django's template caching templatetag.
Django's two-part caching middleware.
NGINX Memcached module.
On-disk caching with django-staticgenerator.
Varnish as a front-end load-balancing cache.

No Caching

For any content-driven website, this is probably the worst idea of them all, and as you'll find out, it is trivial to implement most of the above caching strategies. Clearly, my single server arrangement is not going to be representative of your large app server cluster, so I urge you to evaluate all the options if you are anticipating scaling. Finding the right recipe for your server setup is going to be the fun part.

For the purpose of establishing a baseline, I ran ApacheBench on my setup with no caching turned on. I'm running 10 concurrent requests for a 1000 requests using the following ApacheBench command:

ab -n 1000 -c 10 <server-name>

Here's a snipped version of the results:

Concurrency Level:      10
Time taken for tests:   68.619 seconds
Sent requests:          1000
Completed requests:     1000
Failed requests:        0
Total transferred:      9660000 bytes
HTML transferred:       9387000 bytes
Requests per second:    14.57
Transfer rate:          141.61 kb/s received

Connection Times (ms)
              min      avg      max
Connect:        0     0.12       10
Response:     309   681.65     1330

It's probably possible to tune this for shorter latency, but we got the main number we were looking for; we can push 14.57 requests/second without a cache. Not bad, until you get Slashdotted!

Django's template caching templatetag

Django provides an easy way to cache parts of your template using the "cache" template tag. Here is an example of usage:

{% load cache %}
{% cache 500 sidebar %}
    This goes into cache.
{% endcache %}

Django-Mingus makes good use of the cache template tag in the default templates. In this test, I enabled Memcache in Django and removed view caching so I could get an idea how segment caching affects performance. This page benefits from 10 template cache hits and 4 other Memcache hits used in some of Mingus's apps.
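For the template tag (and the other Memcache hits) to do anything, Django needs a cache backend pointed at memcached. On the Django 1.1 stack used here, that's a single settings line; the address below assumes a memcached daemon on the default local port:

```python
# settings.py (excerpt) -- point Django's cache framework at a local
# memcached instance so {% cache %} fragments have somewhere to live.
# Django 1.1 used the single CACHE_BACKEND string (newer versions use
# the CACHES dict instead).
CACHE_BACKEND = 'memcached://127.0.0.1:11211/'
```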

Concurrency Level:      10
Time taken for tests:   26.19 seconds
Sent requests:          1000
Completed requests:     1000
Failed requests:        0
Total transferred:      9479000 bytes
HTML transferred:       9387000 bytes
Requests per second:    38.18
Transfer rate:          353.45 kb/s received

Connection Times (ms)
              min      avg      max
Connect:        0     0.29       10
Response:      90   260.61      490

Enabling templatetag caching has given a significant speed boost to 38.18 requests/second, roughly 2.6 times the throughput of the uncached setup. Response time is also improved, down from 682ms to an acceptable 261ms. Good, but there's still a lot of room for improvement.

The gain is modest compared to what full-page caching will deliver, but that shouldn't deter you from implementing the tag: template caching has the benefit that a segment can be cached once and reused across multiple pages (for example, a sidebar that is the same on different parts of the site).

Django's two-part caching middleware

Django comes equipped with middleware that provides frontend proxy-style full page caching with almost no configuration. Full page caching is clearly where you're going to find the greatest benefits. Something like Squid, Varnish, or NGINX is better suited for this job, but the ease of setup makes this middleware useful for environments where a minimal amount of complexity is desired. Because of the greater performance, I'm running 10,000 requests instead of 1,000 to get a better sample.
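Enabling the two-part middleware is a settings change. The ordering is the part people trip over: UpdateCacheMiddleware goes first so it runs last during the response phase, and FetchFromCacheMiddleware goes last so it runs first on incoming requests. A minimal sketch (the middleware list in your project will include more entries):

```python
# settings.py (excerpt) -- Django's "two-part" full-page cache middleware.
# Update must be first and Fetch must be last; middleware runs top-down
# for requests and bottom-up for responses.
MIDDLEWARE_CLASSES = (
    'django.middleware.cache.UpdateCacheMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.cache.FetchFromCacheMiddleware',
)
CACHE_MIDDLEWARE_SECONDS = 60 * 5  # cache full pages for five minutes
```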

Concurrency Level:      10
Time taken for tests:   9.07 seconds
Sent requests:          10000
Completed requests:     10000
Failed requests:        0
Total transferred:      130040000 bytes
HTML transferred:       127560000 bytes
Requests per second:    1102.54
Transfer rate:          14001.34 kb/s received

Connection Times (ms)
              min      avg      max
Connect:        0     0.15       10
Response:       0     9.02      470

This is about as fast as Django's going to run on this hardware without a more sophisticated caching proxy. We've revved Django's internal caching to give us 1103 requests/second, over 75 times as many as we had with no caching. However, we're still passing every request into Python, which gives us limits we cannot avoid without moving the caching layer into the frontend server. For this we'll need to explore NGINX or Varnish.

NGINX's Memcached module

NGINX has a very nice caching feature that most servers lack: it can serve an HTML document directly from Memcached without ever touching your Python code. Since we are already using NGINX, enabling the Memcached HTTP caching module was a trivial task. For this test, I disabled Django's caching middleware and added a custom cache update middleware that sets a cache key that NGINX can be configured to read. I used a modified version of the middleware from Oliver Weichold's blog post on using Django with NGINX+Memcached. Enabling the module in the NGINX config was just a matter of adding a new location directive for Memcached and assigning the web app as a 404 handler for that location:

Before:

location / {
    uwsgi_pass unix:///tmp/mingus.sock;
    include uwsgi_params;
}

After:

location / {
    default_type text/html;
    set $memcached_key nginx.$request_uri;
    memcached_pass 127.0.0.1:11211;
    error_page 404 = @cache_miss;
}

location @cache_miss {
    uwsgi_pass unix:///tmp/mingus.sock;
    include uwsgi_params;
}
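On the Django side, the cache update middleware can be sketched roughly as follows. This is an illustration in the spirit of Weichold's post, not his exact code; in particular, the constructor taking a cache client is for testability, where real Django middleware would just use django.core.cache.cache. The key it writes must match the "nginx.$request_uri" pattern in the config above:

```python
# Hedged sketch: after Django renders a page, store the raw HTML in
# memcached under the same key NGINX's memcached_pass location reads,
# so the next request for that URL never reaches Python.
class NginxMemcachedMiddleware(object):
    def __init__(self, cache_client, timeout=300):
        self.cache = cache_client   # e.g. a python-memcached Client
        self.timeout = timeout

    def process_response(self, request, response):
        # Only cache successful GETs; POSTs and errors pass through.
        if request.method == 'GET' and response.status_code == 200:
            key = 'nginx.%s' % request.get_full_path()
            self.cache.set(key, response.content, self.timeout)
        return response
```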

Running the same benchmark as above, here are my results:

Concurrency Level:      10
Time taken for tests:   3.699 seconds
Sent requests:          10000
Completed requests:     10000
Failed requests:        0
Total transferred:      130640000 bytes
HTML transferred:       129190000 bytes
Requests per second:    2703.43
Transfer rate:          34489.90 kb/s received

Connection Times (ms)
              min      avg      max
Connect:        0     0.36       30
Response:       0     3.66      109

Now we're getting serious! I was serving 2703 requests/second through memcache on my VPS. Now we're in Slashdotting territory. This is over 185 times as fast as vanilla Django. The important thing to note here is that we're accomplishing the same thing as Django's built-in two-part caching middleware, but now we are doing it 2.5 times faster.

On-disk caching with django-staticgenerator

Another approach is to use on-disk caching techniques to serve static files. This is made possible with django-staticgenerator, which has middleware that generates flat files that NGINX can serve directly. It was simple to set up, and here are my results:
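The setup is again mostly a settings change. The sketch below shows the general shape; the exact setting names may differ between django-staticgenerator versions, so treat the identifiers here as illustrative and check the project's README:

```python
# settings.py (excerpt) -- hedged sketch of a django-staticgenerator
# setup. The middleware writes any response whose URL matches one of
# the regexes below into WEB_ROOT as a flat file; NGINX is then pointed
# at that directory and serves the file without touching Python.
MIDDLEWARE_CLASSES = (
    'staticgenerator.middleware.StaticGeneratorMiddleware',
    # ... the rest of your middleware stack ...
)
STATIC_GENERATOR_URLS = (
    r'^/$',             # the home page benchmarked here
    r'^/(blog|about)',  # illustrative extra sections
)
WEB_ROOT = '/var/www/mingus/cache/'  # the directory NGINX serves from
```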

Concurrency Level:      10
Time taken for tests:   2.78 seconds
Sent requests:          10000
Completed requests:     10000
Failed requests:        0
Total transferred:      131320000 bytes
HTML transferred:       129190000 bytes
Requests per second:    3597.12
Transfer rate:          46130.28 kb/s received

Connection Times (ms)
              min      avg      max
Connect:        0     0.67       90
Response:       0     2.66      190

Now we're rocking 3597 requests/second. NGINX can serve static files like nobody's business.

Varnish

Varnish is a very powerful load balancing caching proxy that is made for heavy traffic. I'm configuring it as an HTTP proxy to my NGINX server to see how it stacks up.

Concurrency Level:      10
Time taken for tests:   2.76 seconds
Sent requests:          10000
Completed requests:     10000
Failed requests:        0
Total transferred:      131230000 bytes
HTML transferred:       129190000 bytes
Requests per second:    3623.19
Transfer rate:          46432.72 kb/s received

Connection Times (ms)
              min      avg      max
Connect:        0     0.60       20
Response:       0     2.74       90

Varnish is very competitive in raw speed, serving 3623 requests/second, an impressive number, nearly 250 times higher than if there was no cache. Varnish is also very configurable and built for extremely high traffic.

Conclusion

Every scaling problem has its own variables that can greatly affect the decisions you need to make to implement a good caching strategy. For example, a multi-server setup is likely to behave very differently under the same benchmarks. There are also more complicated factors, such as how to treat logged-in users and cookies. There are workarounds for cookie hashing problems (such as removing the "Vary: Cookie" response header) that can add complexity in certain environments, so there is more to consider than raw performance.
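The Vary: Cookie workaround mentioned above can be illustrated with a small middleware sketch. This is a hypothetical example, not code from any of the projects tested: it strips the Vary header on anonymous responses so a frontend cache like Varnish or NGINX keeps one copy per URL instead of one per cookie. The user_is_authenticated attribute is a stand-in for the real Django check (request.user.is_authenticated), and this is only safe for pages that truly render the same for everyone:

```python
# Hedged illustration of the "Vary: Cookie" workaround: for anonymous
# visitors, drop the Vary header so frontend caches don't fragment the
# cache by cookie value.
class StripVaryForAnonymousMiddleware(object):
    def process_response(self, request, response):
        # 'user_is_authenticated' is illustrative; real Django code
        # would check request.user.is_authenticated instead.
        if not getattr(request, 'user_is_authenticated', False):
            if 'Vary' in response:
                del response['Vary']
        return response
```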

Also note that not all of these options are mutually exclusive. A good combination might be internal template caching plus either Varnish or NGINX acting as a frontend cache. My best suggestion is to experiment and see what works best for your environment, and I hope this post has been helpful in summarizing your options.