A better Caching system for Django

Django has pretty good support for caching, which is one of the easiest ways of speeding up a web application. The default is to cache for ten minutes, which means that if you get multiple requests for a page within a ten minute window then Django can serve up a stored copy of the page without hitting the database or rendering the HTML. The caching period can be set per-page and fragments of pages can be cached rather than the whole, but the system rests on the fact that it doesn't matter if the content doesn't change for a period of time.

Trouble is, not many web applications work that way. Consider a humble blog post with a piece of content and a list of comments. Granted, the post content isn't going to change very often and will probably never change again once the author has corrected all the typos, but a list of comments may change at any point through the life-time of the blog. If the page were cached, comments would not be visible on the page for a short period of time, which doesn't give that instant gratification that web users expect these days.

It would be nice if a web app could benefit from caching and still serve a fresh page when content changes. Alas, Django's default time-based caching is never going to achieve this. What is needed is a way of invalidating an item in the cache when there are changes that alter the content; using the blog example, this would be when the post content changes or a new comment is submitted. Django does give you the tools to do this -- using the low level cache API, you can construct a cache key (a simple string) that changes according to the page content, then store and retrieve your content manually. For example you might create a cache key with the modification date of the post, and the ID of the most recent comment. Then the page need only be rendered when the cache key isn't found in the cache. The downside of this approach is that it can take a little work to calculate the key (you may still have to hit the database to generate the cache key). This is not quite as nice as time-based caching, which will do negligible work for cached pages.

A better approach, which has the benefit of always serving a fresh page, and still requires negligible work for cached pages is to construct the cache key based on the request, then invalidate (i.e. delete) the cache key when the content changes. Again, this can be done with Django, although it has no explicit support for it. Such a system requires more work because the web app must track any event that could potentially require a fresh page to be generated, but has the major advantage that up-to-date pages are virtually always going to be available in the cache.

I've written ad-hoc page-event cache systems a couple of times and they are a major headache to maintain. Even on simple pages there can be multiple events that may invalidate a cached page. What would be nice, is some formal way of associating a model instance with a the URL(s) that depend on it, so that when the instance is saved, the associated pages will be invalidated and re-generated at the next request.

I've not yet settled on the best way of implementing this, but it will probably be a decorator that builds up a mapping of object type plus primary key to its dependant URLs, and a mixin for models that hooks in to the save method.

I'm open to suggestions regarding the most elegant implementation, and any other potential solutions!