A cache is not really much more than temporary storage for the outcome of computationally heavy tasks. You calculate something and store it. When the same request comes in a second time, you just reply with the answer you still have in cache. This saves you a lot of time recalculating things that lead to the same outcome.

If it’s that easy, why talk about it? Well, because even the simplest things in life take effort to master. Even though the concept is simple, there are some things to consider when you implement it.

What

There are many things you can cache. When we talk about cache, we tend to think of HTML pages generated by a web application or assets on the page. Often though, it’s more interesting to cache pieces of the calculation. This is known as function caching, or memoization. Caches can exist on multiple layers: from complete results in browsers all the way down to CPU, GPU or disk caches.

Let’s say your menu involves a recursive lookup in your database. You can cache the result so you won’t have to consult your database the next time around. Because this menu is rendered on nearly every page, almost every request benefits from the cache, and the reduction in computation time is significant. A cache that is consulted frequently and usually has the answer ready is said to have a high hit rate.
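The menu example can be sketched as a memoized function. This is a minimal illustration using Python's standard functools.lru_cache; build_menu, its return value and the stats counter are hypothetical stand-ins for a real database lookup.

```python
from functools import lru_cache

stats = {"db_lookups": 0}  # counts how often the "database" is really consulted

@lru_cache(maxsize=32)
def build_menu(site_id: int) -> tuple:
    """Hypothetical stand-in for a recursive menu lookup in a database."""
    stats["db_lookups"] += 1
    return ("Home", "Products", "Contact")

# Render ten pages that all need the same menu.
for _ in range(10):
    menu = build_menu(1)

# The hit rate is the share of lookups that were answered from the cache.
info = build_menu.cache_info()
hit_rate = info.hits / (info.hits + info.misses)
```

Only the first call does the actual work; the other nine are answered from the cache, which is what a high hit rate looks like in practice.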

Let’s say that the page gets cached on an edge gateway (a pipe through which all the traffic flows from your internal network to the outside world) which honours the cache headers the application generates. You are now faced with one of the complexities of caching: with every added layer, active cache invalidation becomes increasingly hard. Imagine aggregates being derived from caches. There’s no way to actually verify the result unless you do a completely uncached run. Your application will therefore behave more unpredictably, and ‘cache’ will easily become the scapegoat. I’ll talk some more about that and about cache headers in the ‘When’ section.

It is crucial to prevent errors from being cached. When a backend process fails, trigger a circuit breaker instead, effectively backing off and giving your backend some space to breathe. Caching errors usually leads to empty strings being cached. These empty strings often lead to a terrible customer experience, because at that point the application detaches the state of failure from the message. In other words, no error is thrown when the cached value is served, and there’s no way to distinguish the error from the process legitimately producing an empty string.
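A minimal sketch of that rule: store a value only after the computation has succeeded, and let failures propagate to the caller. The names cached_call, BackendError, broken_backend and healthy_backend are hypothetical, and the circuit breaker itself is left out to keep the example short.

```python
result_cache = {}

class BackendError(Exception):
    """Raised by the hypothetical backend when it fails."""

def cached_call(key, compute):
    """Return the cached value for key, computing and storing it on a miss.

    Failures propagate to the caller and are never written to the cache.
    """
    if key in result_cache:
        return result_cache[key]
    value = compute()          # may raise; nothing is stored in that case
    result_cache[key] = value  # only successful results are stored
    return value

def broken_backend():
    raise BackendError("backend unavailable")

def healthy_backend():
    return ""  # a legitimately empty result

try:
    cached_call("menu", broken_backend)
except BackendError:
    failure_seen = True  # the caller still sees the failure
```

Because the failure never reaches the cache, an empty string found in the cache later on can only be a legitimate result.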

Where

Now that you know what you want to cache, you’ll have to start figuring out where you’ll do that.

Caches are usually stored using a key-value storage mechanism. This mechanism should first and foremost be super-blazingly-fast. A cache should be quick and have minimal dependencies, otherwise the solution defeats the purpose.

Any solution that does synchronous disk IO is therefore already excluded. Ideally, you want something that serves from memory. Depending on how much cache you want or need to build, you would like to have the ability to spread data over a cluster. Redundancy isn’t an issue since caches are volatile by nature. They are only there to speed up things, but shouldn’t be critical to the mission and can always be regenerated.

This brings us to the topic of consistent hashing. What this basically means is that a given key will always bring you to the same server in the cluster, and that adding or removing a server only moves a small share of the keys. Let’s say you have three servers with memory. Each holds approximately a third of your cache. When you consult your cache, you always want to be routed to the machine that is responsible for that key, whether or not it currently holds the entry, and not leave it up to coincidence.
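To make the routing concrete, here is a small sketch of a hash ring, one common way to implement consistent hashing. The class HashRing, the server names and the number of virtual points per server are assumptions for illustration, not the scheme of any particular product.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, servers, replicas=100):
        # Each server gets several points on the ring to spread keys evenly.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key: str) -> str:
        """Return the server owning the first ring point at or after the key."""
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
owner = ring.server_for("menu:42")  # the same key always yields the same server
```

When a fourth server joins such a ring, the only keys that change owner are the ones the new server takes over; everything else stays where it was.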

Some of your algorithms might generate many, many, many possible cache keys. It’s not always useful to store them all. Because of this, it’s advisable to limit the amount of space you grant this algorithm’s cache. When you limit the space (per cache item as well as per cache subject), your storage engine will have to evict entries. Evicting (freeing up memory so new things can be stored) comes with a risk. For example, you usually don’t want to evict the most consulted entry. You will have to choose the eviction strategy that works best to manage your space, such as evicting the Least Recently Used entry (LRU) or the Least Frequently Used one (LFU).
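As an illustration of the LRU strategy, here is a minimal in-process sketch built on Python's collections.OrderedDict. The class name LRUCache and the tiny capacity are assumptions chosen to keep the example readable.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # ordered from least to most recently used

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

    def __contains__(self, key):
        return key in self._items

lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")     # "a" is now the most recently used entry
lru.put("c", 3)  # the cache is full, so "b" is evicted
```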

Often Memcached (or a derivative like Couchbase) is used to do exactly all of the above. Some prefer Redis with eviction configurations. I prefer bare Memcached since I like its atomic nature. It’s designed to work as an LRU caching cluster, nothing else.

Not all caches can be (nor should be) controlled within your own environment though. At some point you’ll hand over the thing you’ve produced to a consuming system, e.g. the browser of a visitor. These systems maintain caches as well. As a matter of fact, you might want to add a geographically convenient cache to reduce roundtrip time, e.g. by using a (multi-)CDN (Content Delivery Network). So it is crucial that you instruct proxies, CDNs and other systems like the browser on how they should treat your content. This is usually done by attaching cache headers to the response. Think of:

whether the response is publicly or privately cacheable

how long it can be cached

which request headers determine the cache key remotely (Vary)
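These three instructions map onto the standard HTTP headers Cache-Control and Vary. The helper below is a sketch; the function name cache_headers and the chosen lifetimes are assumptions for illustration.

```python
def cache_headers(public: bool, max_age_seconds: int, vary=()):
    """Build HTTP response headers that tell downstream caches how to treat a response."""
    scope = "public" if public else "private"
    headers = {"Cache-Control": f"{scope}, max-age={max_age_seconds}"}
    if vary:
        # Caches keep a separate copy per distinct value of these request headers.
        headers["Vary"] = ", ".join(vary)
    return headers

# A page that shared caches (proxies, CDNs) may keep for five minutes.
page_headers = cache_headers(public=True, max_age_seconds=300, vary=["Accept-Encoding"])

# A personal page that only the visitor's own browser may keep.
account_headers = cache_headers(public=False, max_age_seconds=60)
```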

When

One of the most important properties of a cache is the TTL (Time To Live). If you set it too long, you’ll have to sit the time out before the entry expires and your changes take effect (in case you cannot actively invalidate it, e.g. in remote caches). If you set it too short, you’ll risk doing lots of calculations and slowing things down unnecessarily.
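A TTL can be sketched as an expiry timestamp stored next to each value. The class TTLCache and the injectable clock are assumptions for illustration; the fake clock only exists to make the example deterministic.

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def put(self, key, value):
        self._items[key] = (self._clock() + self.ttl, value)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return default
        return value

# A fake clock stands in for real time passing.
now = [0.0]
ttl_cache = TTLCache(ttl_seconds=300, clock=lambda: now[0])
ttl_cache.put("report", "cached result")
fresh = ttl_cache.get("report")    # still within the TTL
now[0] = 301.0                     # a little over five minutes later
expired = ttl_cache.get("report")  # the entry has expired, so this is a miss
```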

When it comes to files, it’s a good idea to apply cache busting by simply renaming the file whenever you regenerate the asset. This allows the client to cache the file indefinitely, since a change always results in a new file name. This works particularly well when you have a CDN in place.
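One common way to do the renaming, assumed here for illustration, is to put a short hash of the file contents into the file name, so the name changes exactly when the contents change. The function busted_name and the sample paths are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath

def busted_name(filename: str, content: bytes) -> str:
    """Return the filename with a short content hash inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))

old_name = busted_name("static/app.css", b"body { color: black; }")
new_name = busted_name("static/app.css", b"body { color: navy; }")
# The two names differ, so clients holding the old file will fetch the new one.
```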

When it comes to micro caches, it works particularly well to go with short-lived caches. Think of maximum cache times of 5 minutes. When you exceed these 5 minutes, you risk becoming too reliant on your cache being there, and you’ll find yourself not fixing the actual issues that create slowness. Again, cache is an optimisation and should not be mission critical.

There are many ways you can repopulate your cache when it falls out of grace because its TTL has expired. This is important, especially when you have computationally intensive operations combined with a high load.

You can apply stale cache renewal. This basically means that you keep serving the old artifacts, but fork off an asynchronous call to your backend. Once it comes back you replace the cache with the new result. This ensures that your visitor always hits cache and (almost) never the original algorithm.
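Stale cache renewal can be sketched with a background thread. This is a single-value illustration under stated assumptions: the class StaleWhileRefreshing, its wait helper and the fake clock are hypothetical, and a production version would also need error handling for a failing refresh.

```python
import threading
import time

class StaleWhileRefreshing:
    """Serve the cached value immediately; refresh it in the background once stale."""

    def __init__(self, compute, ttl_seconds, clock=time.monotonic):
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._value = compute()  # the initial fill is synchronous
        self._expires_at = clock() + ttl_seconds
        self._refresher = None   # background thread, if one is running

    def _refresh(self):
        value = self._compute()  # slow work happens outside the lock
        with self._lock:
            self._value = value
            self._expires_at = self._clock() + self._ttl
            self._refresher = None

    def get(self):
        with self._lock:
            stale = self._clock() >= self._expires_at
            if stale and self._refresher is None:
                # Fork off a single asynchronous renewal; do not wait for it.
                self._refresher = threading.Thread(target=self._refresh)
                self._refresher.start()
            return self._value   # possibly stale, but always instant

    def wait(self):
        """Block until a running refresh (if any) has finished; handy in tests."""
        with self._lock:
            refresher = self._refresher
        if refresher is not None:
            refresher.join()

now = [0.0]
versions = iter(["v1", "v2"])
page = StaleWhileRefreshing(lambda: next(versions), ttl_seconds=60, clock=lambda: now[0])

before = page.get()  # fresh value
now[0] = 61.0        # the TTL has expired
during = page.get()  # the old value is served while a refresh starts
page.wait()
after = page.get()   # the refreshed value
```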

You could also apply cache warming. By warming the caches you ensure that there’s some cache ready before your visitors arrive. You can now control when and how you want to strain the system to produce new content. There are evidently many ways to warm your caches. A popular one is to replay the access log and ‘crawl’ through your own frontend. Replaying access logs gives you the highest chance of warming the pages that are visited most. You can also do active cache renewal by warming during operations. You can do this by appending a query variable that tells your application not to read from the cache but to consult the original algorithms and rewrite the cache entries, even if their TTLs haven’t expired yet.
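Warming from an access log can be sketched as two steps: find the most requested paths, then render each of them with the real algorithm and overwrite the cache entry. The log lines below are made-up samples in the common access-log layout, and most_visited_paths, warm and the render callback are hypothetical names.

```python
from collections import Counter

def most_visited_paths(log_lines, limit=10):
    """Return the most requested GET paths from access-log lines, busiest first."""
    counts = Counter()
    for line in log_lines:
        try:
            request = line.split('"')[1]  # e.g. 'GET /products HTTP/1.1'
            method, path, _protocol = request.split()
        except (IndexError, ValueError):
            continue                      # skip lines we cannot parse
        if method == "GET":
            counts[path] += 1
    return [path for path, _ in counts.most_common(limit)]

def warm(cache, paths, render):
    """Render each path with the real algorithm and overwrite its cache entry."""
    for path in paths:
        cache[path] = render(path)

sample_log = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /products HTTP/1.1" 200 2326',
    '203.0.113.8 - - [10/Oct/2023:13:55:40 +0000] "GET / HTTP/1.1" 200 1043',
    '203.0.113.7 - - [10/Oct/2023:13:56:01 +0000] "GET /products HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2023:13:56:12 +0000] "POST /cart HTTP/1.1" 302 0',
]
page_cache = {}
top_paths = most_visited_paths(sample_log, limit=2)
warm(page_cache, top_paths, render=lambda path: f"<html>{path}</html>")
```

In a real setup the render callback would be a request to your own frontend, for example with the cache-bypassing query variable described above.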

However, when caches have to be repopulated and your backend is doing computationally intensive work, you risk a cache stampede, also called dogpiling, which can cause many strange outages. One moment the system is doing fine, the next moment all the alarm bells are going off.

What then happens is that too many requests try to consult the same cache, and they all draw the conclusion they should consult the original algorithm. The problem is that they do this in parallel. Because of this, the requests have the potential to slow each other down and consume unnecessary and unreasonably many resources. One way to solve this is by putting a semaphore on the cache. You lock the cache, creating a queue for the cache result, rather than a queue for the backend systems. When the first request answers the question and stores it in the cache, the other requests can simply utilise the result.
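The locking idea can be sketched with one lock per cache key (a lock is simply a binary semaphore). The first caller computes the value while the others queue for the cache result and then reuse it. The class StampedeGuard, the simulated backend and the thread count are assumptions for illustration.

```python
import threading
import time

class StampedeGuard:
    """Let only one caller per key run the expensive computation."""

    def __init__(self):
        self._values = {}
        self._key_locks = {}
        self._registry_lock = threading.Lock()

    def _lock_for(self, key):
        with self._registry_lock:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key, compute):
        if key in self._values:          # fast path: cache hit
            return self._values[key]
        with self._lock_for(key):        # queue up behind the first caller
            if key not in self._values:  # re-check: it may have been filled meanwhile
                self._values[key] = compute()
            return self._values[key]

stats = {"backend_calls": 0}

def expensive_report():
    stats["backend_calls"] += 1  # only ever runs while the key lock is held
    time.sleep(0.05)             # simulate slow backend work
    return "report"

guard = StampedeGuard()
results = []
threads = [
    threading.Thread(target=lambda: results.append(guard.get("report", expensive_report)))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All eight requests get the result, but the backend is consulted only once.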

Conclusion

Even though the concept of a cache seems trivial, implementing it properly can be challenging. I hope this article gave you some insight into what you have to look out for and how you can improve your existing setup when needed.