Caching strategies, cache invalidation, and everything else caching-related are hard to get right, especially as systems cache a wider variety of data with more varied retention rules.

This isn’t a guide to caching and caching patterns; instead, it sheds light on some of the excellent options available in the Perl ecosystem.

Problem

I’m going to use a totally useless web service as an example: MD5-as-a-Service. All it does is take a word, calculate its MD5 sum, and return the checksum in a JSON response.

[edit] Note: This is not a realistic web service; it’s just an example, purely for the purpose of the blog post. Calculating MD5 checksums seemed more fun than sleeping for 5 seconds.

#!/usr/bin/env perl
use strict;
use warnings;

use Digest::MD5 qw(md5_hex);
use Mojolicious::Lite;

# POST /<word> returns the word and its MD5 checksum as JSON.
post '/:word' => sub {
    my ($c) = @_;
    my $word = $c->param('word');
    my $digest = md5_hex($word);
    $c->render(json => {
        word   => $word,
        digest => $digest,
    });
};

app->start;

The service can be served by Starman, a pre-forking web server that defaults to 5 worker processes.

$ starman --listen :5000 -a app.psgi
2018/11/13-20:32:40 Starman::Server (type Net::Server::PreFork) starting! pid(92029)
Resolved [*]:5000 to [0.0.0.0]:5000, IPv4
Binding to TCP port 5000 on host 0.0.0.0 with IPv4
Setting gid to "20 20 20 504 401 12 61 79 80 81 98 33 100 204 395 398 399"

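And requests can be made with a simple curl command.

$ curl -X POST http://localhost:5000/foo
{"digest":"acbd18db4cc2f85cedef654fccc4a4d8","word":"foo"}

Simple.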

This works for a while. But eventually MD5-as-a-Service gets popular, and too many precious CPU cycles are being wasted calculating the same checksums over and over again.

Local Caching

The service only lives on one server at the moment, so some sort of local cache sounds like a good idea. The first tool to grab from the CPAN toolbox is Cache::FastMmap.

It’s fairly simple to add.

#!/usr/bin/env perl
use strict;
use warnings;

use Cache::FastMmap;
use Digest::MD5 qw(md5_hex);
use Mojolicious::Lite;

# Backed by a shared mmap'ed file, so Starman's worker processes can share it.
my $CACHE = Cache::FastMmap->new(
    share_file => '/tmp/md5-perl-caching',
    cache_size => '10m',
);

post '/:word' => sub {
    my ($c) = @_;
    my $word = $c->param('word');

    if (my $digest = $CACHE->get($word)) {
        # Cache hit: serve the stored checksum.
        $c->render(json => {
            from_cache => 1,
            word       => $word,
            digest     => $digest,
        });
    }
    else {
        # Cache miss: calculate, store for next time, then serve.
        my $digest = md5_hex($word);
        $CACHE->set($word, $digest);
        $c->render(json => {
            from_cache => 0,
            word       => $word,
            digest     => $digest,
        });
    }
};

app->start;

First, it checks if the requested value is in the cache. If it is, it serves the value out of the cache back to the client. Otherwise, it calculates the checksum requested, stores it in the cache, and then serves the value back to the client.

This is referred to as the Cache-Aside pattern.
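Stripped of the web framework, the pattern is just "look it up; on a miss, compute it and store it". Here is a minimal sketch using a plain Perl hash as a stand-in for Cache::FastMmap; the cache_aside helper is a hypothetical name used only for illustration. A plain hash lives inside a single process, so unlike Cache::FastMmap it would not be shared between Starman workers.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: check the cache first; on a miss, compute the
# value, store it, and return it. Returns (value, from_cache).
sub cache_aside {
    my ($cache, $key, $compute) = @_;
    if (exists $cache->{$key}) {
        return ($cache->{$key}, 1);    # cache hit
    }
    my $value = $compute->($key);      # cache miss: do the expensive work
    $cache->{$key} = $value;           # populate the cache for next time
    return ($value, 0);
}

my %cache;                             # single-process stand-in for Cache::FastMmap
my $calls = 0;                         # counts how often the real work runs
my $md5 = sub { $calls++; return md5_hex($_[0]) };

my ($first,  $first_hit)  = cache_aside(\%cache, 'foo', $md5);
my ($second, $second_hit) = cache_aside(\%cache, 'foo', $md5);
print "$first from_cache=$first_hit\n";
print "$second from_cache=$second_hit\n";
```

The expensive function runs only once; the second lookup is served from the hash.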

I’ve added an extra key to the JSON response, purely to show whether or not the value came from the cache.



$ curl -X POST http://localhost:5000/foo
{"digest":"acbd18db4cc2f85cedef654fccc4a4d8","from_cache":0,"word":"foo"}
$ curl -X POST http://localhost:5000/foo
{"digest":"acbd18db4cc2f85cedef654fccc4a4d8","from_cache":1,"word":"foo"}
$ curl -X POST http://localhost:5000/foo
{"digest":"acbd18db4cc2f85cedef654fccc4a4d8","from_cache":1,"word":"foo"}
$ curl -X POST http://localhost:5000/foo
{"digest":"acbd18db4cc2f85cedef654fccc4a4d8","from_cache":1,"word":"foo"}

Excellent!

The best part is that even though Starman runs several pre-forked worker processes, they all use the same cache, because Cache::FastMmap was designed to share a cache between many processes.

In the module’s own words, it is "a shared memory cache through an mmap’ed file." Its core is written in C for performance, it uses fcntl locking to ensure multiple processes can safely access the cache at the same time, and it uses a basic LRU algorithm to keep the most used entries in the cache.
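To make the "basic LRU algorithm" concrete, here is a toy LRU cache in pure Perl that holds a fixed number of entries and, when full, evicts the one used least recently. It only illustrates the eviction policy: the Toy::LRU package is hypothetical, it limits the number of entries rather than bytes, and it makes no claim about how Cache::FastMmap’s C core implements LRU internally.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

package Toy::LRU;

# Hypothetical toy LRU cache: holds at most `max` entries and, when full,
# evicts the entry that was least recently read or written.
sub new {
    my ($class, %args) = @_;
    return bless { max => $args{max}, tick => 0, data => {} }, $class;
}

sub get {
    my ($self, $key) = @_;
    my $entry = $self->{data}{$key} or return undef;
    $entry->{used} = ++$self->{tick};    # a read counts as a use
    return $entry->{value};
}

sub set {
    my ($self, $key, $value) = @_;
    my $data = $self->{data};
    if (!exists $data->{$key} && scalar(keys %$data) >= $self->{max}) {
        # Full: drop the least recently used key (O(n), fine for a toy).
        my ($oldest) = sort { $data->{$a}{used} <=> $data->{$b}{used} } keys %$data;
        delete $data->{$oldest};
    }
    $data->{$key} = { value => $value, used => ++$self->{tick} };
    return;
}

package main;

my $lru = Toy::LRU->new(max => 2);
$lru->set(a => 1);
$lru->set(b => 2);
$lru->get('a');       # 'a' is now more recently used than 'b'
$lru->set(c => 3);    # the cache is full, so 'b' is evicted
print defined($lru->get('b')) ? "b kept\n" : "b evicted\n";
```

Reading 'a' just before inserting 'c' is what saves it; without that read, 'a' would have been the oldest entry and the one evicted.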

When it comes time to tune the cache for more performance, the documentation explains every knob available, for all the other caching nerds out there.

Expiration

The code above initialised the cache with a size of 10MB. When the cache exceeds 10MB, it evicts entries using an LRU (least recently used) algorithm, as mentioned above in the docs.

That might make sense for the kind of data this service caches, because an MD5 checksum doesn’t change no matter how much time passes. But when a system caches values that can change, e.g. values from a database that represent some organic, changing value, it makes sense to expire cached items after a unit of time.

The simplest way to do this is to set one expiry time for every item in the cache at initialisation.

my $CACHE = Cache::FastMmap->new(
    expire_time => '3s',
);

$CACHE->set(foo => 'bar');

This sets the expiry time to 3 seconds. Likewise, 10m means 10 minutes, 1h means 1 hour, and so on.
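These shorthand values boil down to a unit multiplier: 10m is 10 × 60 = 600 seconds, and 1h is 60 × 60 = 3600 seconds. A hypothetical to_seconds helper doing that conversion might look like this; it only illustrates the arithmetic, is not Cache::FastMmap’s own parser, and handles just the three units mentioned here.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical lookup table: seconds per unit for the shorthands above.
my %SECONDS_PER = (s => 1, m => 60, h => 60 * 60);

# Hypothetical helper: convert '3s', '10m', '1h' etc. into seconds.
sub to_seconds {
    my ($spec) = @_;
    my ($n, $unit) = $spec =~ /\A(\d+)([smh])\z/
        or die "unrecognised duration: $spec\n";
    return $n * $SECONDS_PER{$unit};
}

printf "%-3s => %d seconds\n", $_, to_seconds($_) for qw(3s 10m 1h);
```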

If the cache stores many different things with different expiration requirements, the expiry can instead be passed with each call to set.

$CACHE->set(foo => 'bar', '3s');

Alternatively, items can be removed explicitly.

$CACHE->remove('foo');
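The expiry options above (a default expiry time, a per-item override on set, and explicit removal) can be sketched in pure Perl to show what a cache has to track. This is a minimal sketch, not Cache::FastMmap’s implementation: the Toy::TTLCache package is hypothetical, its expiry times are plain seconds rather than strings like '3s', entries stored without any expiry never expire, and the now option is an injectable clock so the example can fake the passage of time.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

package Toy::TTLCache;

# Hypothetical TTL cache: each entry stores its value and an absolute
# expiry timestamp (undef = never); expired entries count as misses.
sub new {
    my ($class, %args) = @_;
    return bless {
        expire_time => $args{expire_time},          # default TTL in seconds, optional
        now         => $args{now} || sub { time },  # clock, injectable for tests
        data        => {},
    }, $class;
}

sub set {
    my ($self, $key, $value, $ttl) = @_;
    $ttl = $self->{expire_time} unless defined $ttl;   # per-item override wins
    my $expires = defined $ttl ? $self->{now}->() + $ttl : undef;
    $self->{data}{$key} = [ $value, $expires ];
    return;
}

sub get {
    my ($self, $key) = @_;
    my $entry = $self->{data}{$key} or return undef;
    if (defined $entry->[1] && $self->{now}->() >= $entry->[1]) {
        delete $self->{data}{$key};                    # stale: drop it lazily
        return undef;
    }
    return $entry->[0];
}

sub remove {
    my ($self, $key) = @_;
    delete $self->{data}{$key};
    return;
}

package main;

my $clock = 0;                                         # fake time, in seconds
my $cache = Toy::TTLCache->new(expire_time => 3, now => sub { $clock });

$cache->set(foo => 'bar');          # default 3 second expiry
$cache->set(baz => 'qux', 10);      # overridden: 10 seconds
$cache->set(gone => 'soon');
$cache->remove('gone');             # explicit removal

$clock = 5;                         # jump 5 seconds ahead
for my $key (qw(foo baz gone)) {
    my $value = $cache->get($key);
    printf "%s => %s\n", $key, defined $value ? $value : '(miss)';
}
```

Checking expiry lazily on get, rather than running a background sweeper, keeps the sketch simple; the trade-off is that expired entries occupy memory until something asks for them.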

A Side Note

If data in the cache goes stale, it’s important to expire the cached data. Other than expiring cached data based on a unit of time, there are a couple of other simple strategies for expiring stale data: