May 11, 2017

One of the great things about Python is how easy it is to hit the ground running. The standard library is vast, and for every common problem people have, someone has written and published a library that you can download and install with pip. Often there is one right way to do things. Want to make HTTP requests? pip install requests. Want a database adapter, ORM, and migration system (but not using Django or some other integrated framework)? Use SQLAlchemy and alembic. And so on…

But when it comes to even the simplest kind of caching, I see code like this everywhere:

    def get_all_posts(person):
        cache_key = 'get_all_posts:{}'.format(person.id)
        posts = cache.get(cache_key, default=None)
        if posts is None:
            # do the expensive operation
            posts = Post.objects.filter(person=person).all()
            cache.set(cache_key, posts)
        return posts

What’s going on here? We’ve got five lines devoted to the grunt work of caching and one line devoted to the actual thing we want to do. How could we improve this?

If you really inspect those lines, there are three things you need to know:

1. The function I want to cache is:

    def get_all_posts(person):
        return Post.objects.filter(person=person).all()

2. The cache backend I want to use to do it is: cache

3. The value of the function only changes when person.id changes

Enter quickcache

Two years ago, I wrote quickcache so that I could cache functions the way I wanted to. With quickcache, the above looks like this:

    from quickcache import get_quickcache

    quickcache = get_quickcache(cache=cache)

    @quickcache(['person.id'])
    def get_all_posts(person):
        return Post.objects.filter(person=person).all()

The first parameter to quickcache is a list of names of arguments to vary on, and you can use intuitive dot notation to access those arguments’ properties as well.
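To make the idea of "varying on" a dotted name concrete, here is a minimal sketch of how a string like 'person.id' could be turned into a cache key. It is not quickcache's actual implementation: the helper names resolve_vary_on and build_cache_key, the Person stand-in class, and the key format are all assumptions made for illustration.

```python
import inspect
from functools import reduce


def resolve_vary_on(fn, vary_on, args, kwargs):
    """Return the values named by vary_on for one call of fn (hypothetical helper)."""
    # Bind positional and keyword arguments to parameter names,
    # filling in defaults for anything the caller omitted.
    bound = inspect.signature(fn).bind(*args, **kwargs)
    bound.apply_defaults()
    values = []
    for path in vary_on:
        name, *attrs = path.split('.')
        # Start from the named argument, then follow each attribute in turn.
        values.append(reduce(getattr, attrs, bound.arguments[name]))
    return values


def build_cache_key(fn, vary_on, args, kwargs):
    """Assumed key format: function name followed by the resolved values."""
    values = resolve_vary_on(fn, vary_on, args, kwargs)
    return '{}:{}'.format(fn.__name__, ':'.join(repr(v) for v in values))


class Person:
    """Stand-in for a model object with an id attribute."""
    def __init__(self, id):
        self.id = id


def get_all_posts(person):
    return []


key = build_cache_key(get_all_posts, ['person.id'], (Person(7),), {})
print(key)  # get_all_posts:7
```

Two calls with different Person objects that share the same id produce the same key, which is exactly the third point above: the value only changes when person.id changes.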

Now, I should point out, I wrote this with Django’s cache library in mind, but you can use any backend as long as it is wrapped to present a very simple interface:

    # get the value for key from the cache, return default if it's missing
    cache.get(key, default=None)
    # set the value for key to value
    cache.set(key, value)
    # remove key from the cache
    cache.delete(key)
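As an illustration of how small that interface is, here is a sketch of a dict-backed backend that satisfies it. DictCache is a hypothetical name, not something quickcache ships, and the class has no expiry or size limit, so treat it as a toy for tests or scripts rather than a production cache.

```python
class DictCache:
    """In-memory backend exposing only get, set, and delete (illustrative)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        # Return the stored value, or default if the key is missing.
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        # Deleting a missing key is treated as a no-op.
        self._data.pop(key, None)


cache = DictCache()
cache.set('greeting', 'hello')
print(cache.get('greeting'))         # hello
cache.delete('greeting')
print(cache.get('greeting', 'n/a'))  # n/a
```

An object like this should be usable as the cache argument to get_quickcache, since it presents the three methods listed above.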

Tiered Caching

There’s often a tradeoff between caching in process memory and caching in a shared cache like memcached or redis. If you store it in process memory, then it is blazing fast to retrieve again in the same process, but on a multi-worker web service, for example, other forked processes would not get the same benefit. If you store it in a shared cache, then other processes can benefit from it, but if you access it multiple times in short succession, for example within the same web request, then each time requires a round trip to memcached/redis, which adds up in a way that caching in local memory doesn’t.

Why not use both?

At Dimagi, that is what we do, and quickcache supports a special configuration that makes this easy with Django out of the box, and some simple tools you can use to replicate it with your own cache backend as well.

quickcache comes with the concept of a TieredCache: simply a cache that combines two or more caches and outsources the gets, sets, and deletes to them. On a get it’ll try the first, then the second. On a set it’ll set in both. And on a delete it’ll delete from both. If you’re not using Django, you can use this helper to configure it using your own cache backends.
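The behavior described above can be sketched in a few lines. This is an illustration of the idea, not the library's TieredCache class: the names SketchTieredCache and DictCache are hypothetical, and the real helper may differ in its constructor and in details such as whether a hit in a later tier is copied back into an earlier one (this sketch does not do that).

```python
class DictCache:
    """Tiny in-memory backend with get, set, and delete (illustrative)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


# Unique sentinel so that a cached None is not mistaken for a miss.
_MISSING = object()


class SketchTieredCache:
    """Combine several caches: read in order, write and delete everywhere."""

    def __init__(self, caches):
        self.caches = list(caches)

    def get(self, key, default=None):
        # Try each tier in order and return the first hit.
        for cache in self.caches:
            value = cache.get(key, _MISSING)
            if value is not _MISSING:
                return value
        return default

    def set(self, key, value):
        for cache in self.caches:
            cache.set(key, value)

    def delete(self, key):
        for cache in self.caches:
            cache.delete(key)


local, shared = DictCache(), DictCache()
tiered = SketchTieredCache([local, shared])

shared.set('k', 'from shared')
print(tiered.get('k'))  # from shared
```

Putting the fast local cache first means repeated reads within one process never reach the shared tier, while a value written by another process is still found on the second try.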

If you are using Django, it’s even easier. For a cache that stores in the shared cache for five minutes and in local memory for 10 seconds, you could use the following:

    from quickcache.django_quickcache import get_django_quickcache

    quickcache = get_django_quickcache(memoize_timeout=10, timeout=5 * 60)

Sometimes you want to skip the cache

At Dimagi, we also find ourselves writing code like this:

    def get_all_posts(person, force=False):
        cache_key = 'get_all_posts:{}'.format(person.id)
        if force:
            posts = None
        else:
            posts = cache.get(cache_key, default=None)
        if posts is None:
            # do the expensive operation
            posts = Post.objects.filter(person=person).all()
            cache.set(cache_key, posts)
        return posts

That way, get_all_posts is cached by default, but you can also force it to skip the cache, in which case it recomputes the value and updates the cache with it.

This also comes out of the box with quickcache, whether you use the generic or Django variants. To get exactly the same behavior, you can use:

    @quickcache(['person.id'], skip_arg='force')
    def get_all_posts(person, force=False):
        return Post.objects.filter(person=person).all()

Easily configurable at every level

Of course, each time you use quickcache, you may want to use it with different defaults.

Each argument to get_quickcache or get_django_quickcache can be passed in at any of three stages, with values passed later overriding those set earlier. Usually, you’ll define a singleton quickcache that you use everywhere. For example,

    # singletons/quickcache.py
    from quickcache.django_quickcache import get_django_quickcache

    quickcache = get_django_quickcache(memoize_timeout=10, timeout=5 * 60)

Then you don’t have to define quickcache every time. Just import it from this file. When you do use it, you can override the defaults. For example, here I override timeout and set skip_arg for the first time:

    @quickcache([], timeout=60 * 60, skip_arg='force')
    def ...

You can also bake in extra args at any time between when you call get{_django}_quickcache and when you use it. For example, if you’re going to be using skip_arg='force' repeatedly in a file, you can inherit the defaults from your main quickcache object, and change just skip_arg:

    from singletons.quickcache import quickcache

    skippable_quickcache = quickcache.but_with(skip_arg='force')

Then the previous example could just be

    @skippable_quickcache([], timeout=60 * 60)
    def ...

For either flavor, the arguments you can set are vary_on, skip_arg, and two advanced args: helper_class (if you’re really dying to override some quickcache internals) and assert_function (if you want fine-grained control over the way certain warnings are logged).

For get_quickcache only, you can also set the cache argument. For get_django_quickcache only, you can set the memoize_timeout and timeout arguments.

Custom vary_on and skip_arg

Using a string value for skip_arg and a list of strings for vary_on works nine times out of ten, but when you really want control, you can pass a function to either. The function should have the same arguments as the function you’re decorating. Your vary_on function should return the value to vary on; your skip_arg function should return True when you want to skip the cache.
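As a sketch of what such functions might look like, the two below share the signature of the decorated get_all_posts(person, force=False). Several things here are assumptions for illustration: the Person stand-in class and its is_staff attribute are hypothetical, and returning the vary-on values as a list is a guess at the expected shape, so check the library's documentation before relying on it.

```python
class Person:
    """Stand-in object; is_staff is a hypothetical attribute."""
    def __init__(self, id, is_staff=False):
        self.id = id
        self.is_staff = is_staff


def vary_on_person(person, force=False):
    # Same arguments as the decorated function.
    # Assumption: the values to vary on are returned as a list.
    return [person.id]


def skip_for_staff_or_force(person, force=False):
    # Return True to bypass the cache for this call.
    return force or person.is_staff


# Intended use, following the decorator form shown earlier (not run here):
#
# @quickcache(vary_on_person, skip_arg=skip_for_staff_or_force)
# def get_all_posts(person, force=False):
#     return Post.objects.filter(person=person).all()

print(vary_on_person(Person(3)))                          # [3]
print(skip_for_staff_or_force(Person(3, is_staff=True)))  # True
```

A function gives you room for logic that a plain argument name cannot express, such as skipping the cache for a whole class of callers rather than only when a flag is passed.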

Clearing the cache

A caching utility becomes nearly useless if it doesn’t let you easily clear a particular cache value. quickcache has a nice interface for doing this as well.

    # person writes a new post
    Post(person=person).save()
    # now the cached value for get_all_posts(person) is out of date
    # so clear it:
    get_all_posts.clear(person)
    # now it'll return a fresh value next time you call it
    all_posts = get_all_posts(person)
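For readers curious how a decorated function can grow a clear method, here is a toy decorator that attaches one. It is a sketch of the general pattern, not quickcache's code: toy_cached is a hypothetical name, it keys on positional arguments only, and it stores values in a plain dict.

```python
import functools

# Unique sentinel so that a cached None is not mistaken for a miss.
_MISSING = object()


def toy_cached(store):
    """Toy decorator: caches on positional args and adds a .clear() method."""
    def decorator(fn):
        def make_key(args):
            return (fn.__name__,) + args

        @functools.wraps(fn)
        def wrapper(*args):
            key = make_key(args)
            value = store.get(key, _MISSING)
            if value is _MISSING:
                value = fn(*args)
                store[key] = value
            return value

        def clear(*args):
            # Drop only the entry for this exact set of arguments.
            store.pop(make_key(args), None)

        wrapper.clear = clear
        return wrapper
    return decorator


posts_by_person = {1: ['first post']}
store = {}


@toy_cached(store)
def get_all_posts(person_id):
    return list(posts_by_person[person_id])


get_all_posts(1)                          # computes and caches
posts_by_person[1].append('second post')  # underlying data changes
stale = get_all_posts(1)                  # still the cached list
get_all_posts.clear(1)                    # invalidate just this entry
fresh = get_all_posts(1)                  # recomputed

print(stale)  # ['first post']
print(fresh)  # ['first post', 'second post']
```

The key design point is that clear takes the same arguments as the function itself, so the caller never has to know how the cache key is built.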

Conclusion

For us at Dimagi, quickcache has been a dream come true. We use it all the time. It is hard to imagine what building web applications in Python would be like without it; I’ve largely blocked those memories out. It’s so nice to use, simple, and powerful that it feels wrong to keep it all to ourselves.

That’s why we’ve decided to spend a little extra time to isolate the Django dependencies, eliminate any dependency on the CommCare HQ codebase, and publish it on PyPI.

Sometimes there’s a right way to do it in Python. Want to cache values at the function level? pip install quickcache.

Check out the code at https://github.com/dimagi/quickcache.