MySQL Conf - Memcached Internals

Brian Aker and Alan Kasindorf gave a great talk on Memcached - every developer’s favorite caching layer - at the MySQL User Conference last week. The slides are available online, and they’re well worth a bookmark. Brian and Alan promised to continue updating the deck as they hone their presentation at OSCON and other events. In fact, their next iteration will include the Nginx integration trick, and a new and improved section on consistent hashing. In other words, check back frequently.

Memcached Best Practices

At its core, Memcached is a high-performance, distributed caching system. It is application neutral, and is currently used on many large-scale web sites such as Facebook (2TB of cache, circa Q1 2008), LiveJournal, Mixi, Hi5, etc. However, it is also an extremely simple piece of software: all of the logic is client-side, and there is no security model, failover, backup mechanism, or persistence (although the last one is on the roadmap). But that hasn’t stopped the developers from deploying it in all kinds of environments, and here are a few best practices suggested by Brian:

Don’t think row-level (database) caching, think complex objects

Don’t run memcached on your database server, give your database all the memory it can get

Don’t obsess about TCP latency - localhost TCP/IP is optimized down to an in-memory copy

Think multi-get - run things in parallel whenever you can

Not all memcached client libraries are created equal, so do some research on yours - hint: use Brian's.
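Since all of the distribution logic lives in the client, the client library decides which server holds each key. A naive modulo hash remaps almost every key when a server is added or removed; the consistent hashing the speakers plan to cover avoids this. Below is a minimal, illustrative sketch of a consistent-hash ring - not Brian's library or memcached's own code, just a stdlib-only demonstration of the idea (server addresses and replica count are made up):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map a string to a point on the ring (MD5 is a common, illustrative choice)
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Each server occupies many points on a hash ring; a key maps to the
    first server point clockwise from the key's own hash. Adding a server
    then remaps only a fraction of the keys, instead of nearly all of them."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self._points = []             # sorted hash points on the ring
        self._point_to_server = {}    # point -> server address
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._point_to_server[point] = server

    def server_for(self, key):
        # First ring point after the key's hash, wrapping around at the end
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._point_to_server[self._points[idx]]
```

With three servers and a fourth added, roughly a quarter of the keys move - the rest keep hitting their original (warm) cache node.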

Slab Allocator - Heart of Memcached

The heart of Memcached is its memory slab allocator. A little daunting at first sight, it is actually a very elegant solution once you understand the motivation behind it and the tradeoffs of its architecture:

The maximum amount of memory allocated to Memcached is limited only by the architecture (32/64-bit)

The key size is limited to 250 bytes; the data (value) size is limited to 1MB

On startup, Memcached grabs the memory and holds on to it - trading some wasted memory for performance

Allocated memory is split into variable-sized buckets by the slab allocator

The default slab allocator will create 32-39 buckets for objects up to 1MB in size

You can customize the page size at compile time, and slab sizes on startup

Each object gets stored in the closest-size bucket - yes, memory is wasted

Fragmentation can be a problem - either customize slab sizes, or evict/flush your cache every so often

If there is unused memory in a different slab class, that memory will be reallocated if required

To guarantee no paging, disable swap on your OS - in practice, just keep it small, so as to avoid disasters
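The bucket scheme above is easy to see in a few lines of code. This is a hedged sketch, not memcached's actual allocator: the base chunk size and growth factor here are illustrative stand-ins for the startup-tunable values mentioned above, but the mechanics - geometrically growing chunk sizes capped at 1MB, with each object placed in the smallest chunk that fits - match the description:

```python
import bisect

def slab_class_sizes(base=80, factor=1.25, max_size=1024 * 1024):
    """Enumerate slab-class chunk sizes: each class is a fixed factor
    larger than the last, up to the 1MB limit. base/factor are
    illustrative; the real values are configurable on startup."""
    sizes = []
    size = base
    while size < max_size:
        sizes.append(size)
        size = int(size * factor)
    sizes.append(max_size)
    return sizes

def pick_class(sizes, obj_size):
    """Smallest chunk that fits the object (obj_size must be <= 1MB).
    The gap between the object and its chunk is the wasted memory."""
    idx = bisect.bisect_left(sizes, obj_size)
    return sizes[idx]
```

For example, a 1000-byte object lands in a chunk somewhat larger than 1000 bytes; the difference is the per-object overhead the slab design accepts in exchange for allocation speed and predictability.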

Memory Management

The memory is managed via a Least Recently Used (LRU) algorithm:

Each slab class has its own LRU - the eviction target depends on the size of the object

Expiration timestamps are checked once every second - minimum lifespan is 1 second

Objects marked for deletion are handled asynchronously - checked and evicted once every 5 seconds

Inconsistency between the two timers listed above can result in sub-optimal eviction policy

LRU can be completely disabled - do it at your own risk
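The per-slab-class LRU behavior described above can be sketched with a small amount of Python - this is a toy model, not memcached's implementation, but it shows why evicting within one slab class only frees room for objects of that class's size:

```python
from collections import OrderedDict

class SlabLRU:
    """One LRU per slab class: a bounded map of chunks where reads
    refresh recency and writes evict the least recently used entry
    once the class's chunks are exhausted."""

    def __init__(self, chunk_count):
        self.capacity = chunk_count
        self.items = OrderedDict()   # key -> value, oldest entry first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        elif len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict least recently used
        self.items[key] = value
```

A full cache would hold one `SlabLRU` per slab class, which is why a store of a large object can trigger an eviction even while smaller classes still have free chunks.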

Best Practices: Invalidations and Expiry

Memcached does not provide any mechanism for deleting a set of associated keys (object, name, etc.). For better or worse, you could implement this functionality yourself with the help of the prepend and append commands; however, be careful with the 1MB limit! A much cleaner way to handle this situation is to forgo the invalidation process altogether:

Instead of invalidating your data, expire it whenever you can - memcached will do all the work

Generate smart keys - e.g. on update, increment a version number, which becomes part of the key

For bonus points, store the version number in memcached - call it generation

The latter will be added to Memcached soon - as soon as Brian gets around to it
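The "smart keys" pattern above is easy to sketch. In this illustrative model the store is a plain dict standing in for memcached (in practice you would use a real client and `incr`); the namespace name and `gen:` prefix are made up for the example:

```python
class VersionedCache:
    """Store a generation counter per namespace; real keys embed the
    current generation. Bumping the counter 'invalidates' every key in
    the namespace at once - old entries become unreachable and are
    simply left to expire or be evicted."""

    def __init__(self):
        self.store = {}  # stand-in for a memcached client

    def _generation(self, namespace):
        return self.store.setdefault(f"gen:{namespace}", 1)

    def _key(self, namespace, name):
        return f"{namespace}:{self._generation(namespace)}:{name}"

    def set(self, namespace, name, value):
        self.store[self._key(namespace, name)] = value

    def get(self, namespace, name):
        return self.store.get(self._key(namespace, name))

    def invalidate(self, namespace):
        # One write invalidates the whole namespace - no mass deletes
        self.store[f"gen:{namespace}"] = self._generation(namespace) + 1
```

No keys are ever deleted: after `invalidate`, lookups build keys with the new generation and miss, and the stale entries age out under the LRU.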

Roadmap and Future Ahead

Memcached is a force to be reckoned with already, and the roadmap is only going to solidify this position. Amongst the big objectives:

Binary protocol is in the works

Support for generations - as described in the section above

Multi-engine support: char based, durable (persistence!), queue

Facebook has overhauled the core to be highly-threaded, and is expected to contribute the changes

Kudos to Brian and Alan for a great presentation! Don’t forget to bookmark the slides and check back every so often, as they will be updated with time to describe best practices, common mistakes, and the roadmap ahead.