Memcached is generally treated as a black box. But what if you really need to know what’s in there? Not for runtime purposes, but for optimization and capacity planning?

demo

$ sudo peep --pretty 2479
time | exptime | nbytes | clsid | key                | exprd | flushd
8658 | 613458  | 272    | 5     | "c3RhdH:171:5"     | false | false
8658 | 0       | 6      | 1     | "current_c3RhdH:3" | false | false
8658 | 613458  | 281    | 5     | "c3RhdH:171:26"    | false | false
8678 | 95078   | 6      | 1     | "User:1:auth:m4Uq" | false | false
8658 | 0       | 8      | 2     | "user_dGltZWxp:4"  | false | false
8686 | 613486  | 1278   | 9     | "User:1:6"         | false | false
8658 | 613458  | 1286   | 9     | "User:1:4"         | false | false
...
8658 | 613458  | 283    | 5     | "c3RhdH:171:28"    | false | false
8658 | 613458  | 277    | 5     | "c3RhdH:171:30"    | false | false

Peep uses ptrace to freeze a running memcached server, dump the internal key metadata, and return the server to a running state. If you have a good host ejection mechanism in your client, such as in the Twitter libmemcached builds, you won’t even have to change the production server pool. The instance is not restarted, and no data is lost.

Installation instructions are here. Requires Linux.

a note about heap management

Memcached has two separate memory management strategies:

1. On read, if a key is past its expiry time, return NOT FOUND.

2. On write, choose an appropriate slab class for the value; if it's full, replace the oldest-used (read or written) key with the new one.

Note that the second strategy, LRU eviction, does not check the expiry time at all. This makes eviction an O(1) operation, because the tail of the LRU linked list is immediately available; checking that list for expired keys first would raise the cost to O(N). This optimization has subtle implications.
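The interaction between the two strategies is easier to see in a toy model. The following is an illustrative Ruby sketch (invented names, not memcached's actual C implementation) that uses a hash's insertion order as the LRU list:

```ruby
# Toy model of memcached's two memory management strategies.
class ToySlab
  Entry = Struct.new(:key, :value, :expires_at)

  def initialize(capacity)
    @capacity = capacity
    @entries = {} # hash insertion order doubles as the LRU list
  end

  # Strategy 1: lazy expiry, checked only on read.
  def get(key, now)
    entry = @entries[key]
    return nil if entry.nil?
    return nil if entry.expires_at && now >= entry.expires_at # NOT FOUND
    @entries.delete(key)  # move to the MRU position
    @entries[key] = entry
    entry.value
  end

  # Strategy 2: O(1) LRU eviction on write. Note it never consults
  # expires_at, so a fresh key can be evicted while expired keys remain.
  def set(key, value, expires_at)
    @entries.delete(key)
    if @entries.size >= @capacity
      oldest = @entries.keys.first # tail of the LRU
      @entries.delete(oldest)
    end
    @entries[key] = Entry.new(key, value, expires_at)
  end
end
```

Flooding a full slab class with short-lived keys evicts the least-recently-used key regardless of freshness, which is exactly the pathology the next section describes.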

the problem

In the bad old days, we added a page cache to Twitter via memcached. We used a generational key scheme: a piece of user metadata was incremented on every change, and used to construct dependent keys. This eliminated direct invalidation concerns.
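A hypothetical sketch of such a scheme (the key names and helpers here are invented for illustration, not Twitter's actual code): every write bumps the user's generation counter, so all dependent keys derived from it change, and the old ones simply go stale rather than being invalidated directly.

```ruby
# Key holding the per-user generation counter.
def generation_key(user_id)
  "user_generation:#{user_id}"
end

# A dependent ("generational") key: embeds the current generation.
def page_key(user_id, generation, page)
  "timeline:#{user_id}:#{generation}:page:#{page}"
end

# On change, incrementing the generation orphans every old dependent key:
#   cache.incr(generation_key(user_id))
# Readers fetch the generation first, then the derived key:
#   gen  = cache.get(generation_key(user_id))
#   html = cache.get(page_key(user_id, gen, 1))
```

The orphaned keys are never read again; they just sit in the cache until they expire or are evicted.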

However, because of the orthogonal expiry and LRU strategies, we suspected that the generational keys were driving evictions of direct keys. The continual massive onslaught of new generational keys would sweep out older direct keys, even when the generational keys were expired but the direct keys were fresh.

But how could we prove it was happening? Eventually we ran peep on a representative server. Aggregate results showed that 75% of our cache pool was spent on expired page caches, and our cache horizon was only 5 hours. Moving the generational keys to a separate pool raised our cache horizon to two days, a 10x improvement.

aggregating results

The best way to aggregate peep results is to use MySQL's excellent grouping abilities. Run peep with the --ugly (unformatted) switch, and pipe it to a file. Then you can load it into an appropriate MySQL schema:

CREATE TABLE entries (
  lru_time int(11) default NULL,
  expires_at_time int(11) default NULL,
  value_size int(11) default NULL,
  suffix_size int(11) default NULL,
  it_flag varchar(255) default NULL,
  class_id int(11) default NULL,
  key_size int(11) default NULL,
  key_name varchar(255) default NULL,
  is_expired varchar(255) default NULL,
  is_flushed varchar(255) default NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

LOAD DATA LOCAL INFILE '/tmp/peep.txt' INTO TABLE entries
  FIELDS TERMINATED BY ' | '
  LINES TERMINATED BY '\n';

This will load very fast, even for millions of keys on a laptop; LOAD DATA LOCAL INFILE is an optimized bulk-load path.

Here are some handy queries for revealing cache properties:

fresh vs. expired values, by count and size

SELECT is_expired, COUNT(*), SUM(value_size) FROM entries GROUP BY is_expired;

+------------+----------+-----------------+
| is_expired | COUNT(*) | SUM(value_size) |
+------------+----------+-----------------+
| false      |   688883 |       759401296 |
| true       |   472755 |      2430092660 |
+------------+----------+-----------------+

histogram of values by size

SELECT ROUND(ROUND(LOG10(value_size) * 5, 0) / 5, 1) AS log,
       ROUND(AVG(value_size), 0) AS size,
       COUNT(*),
       RPAD('', COUNT(*) / 6500, '*') AS bar
FROM entries GROUP BY log ORDER BY log DESC;

+------+--------+----------+----------------------------------------+
| log  | size   | COUNT(*) | bar                                    |
+------+--------+----------+----------------------------------------+
|  5.6 | 422238 |        3 |                                        |
...
|  1.6 |     39 |     3652 | *                                      |
|  1.4 |     23 |   103434 | ************************************** |
|  1.2 |     14 |     6369 | *                                      |
|  1.0 |     11 |    36588 | ******                                 |
|  0.8 |      6 |    55795 | *********                              |
|  0.6 |      4 |    90332 | **************                         |
|  0.4 |      2 |      239 |                                        |
+------+--------+----------+----------------------------------------+
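The log column in these histogram queries is just log10 of the value, rounded to the nearest 0.2. Here is the same arithmetic as a Ruby one-liner, for checking which bucket a given size lands in:

```ruby
# Ruby equivalent of the SQL expression
# ROUND(ROUND(LOG10(value_size) * 5, 0) / 5, 1):
# bucket log10(size) to the nearest 0.2.
def log_bucket(value_size)
  ((Math.log10(value_size) * 5).round / 5.0).round(1)
end
```

So a 23-byte value lands in bucket 1.4 and an 11-byte value in bucket 1.0, matching the rows above.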

histogram of values by LRU age

SELECT ROUND(ROUND(LOG10( (SELECT MAX(lru_time) FROM entries) - lru_time) * 5, 0) / 5, 1) AS log,
       ROUND(AVG( (SELECT MAX(lru_time) FROM entries) - lru_time), -2) AS lru_age,
       COUNT(*),
       RPAD('', COUNT(*) / 6500, '*') AS bar
FROM entries GROUP BY log ORDER BY log DESC;

+------+-----------+----------+-------------------------------------+
| log  | lru_age   | COUNT(*) | bar                                 |
+------+-----------+----------+-------------------------------------+
|  4.6 |     34800 |    18064 | ***                                 |
|  4.4 |     24200 |    96739 | ***************                     |
|  4.2 |     15700 |   212865 | *********************************   |
|  4.0 |     10200 |   224703 | *********************************** |
|  3.8 |      6500 |   158067 | ************************            |
|  3.6 |      4100 |   108034 | *****************                   |
...
|  0.4 |         0 |      672 |                                     |
|  0.0 |         0 |      319 |                                     |
| NULL |         0 |    13400 | **                                  |
+------+-----------+----------+-------------------------------------+

histogram of values by size for a namespace

SELECT key_name,
       ROUND(ROUND(LOG10(value_size) * 5, 0) / 5, 1) AS log,
       ROUND(AVG(value_size), 0) AS size,
       COUNT(*),
       RPAD('', COUNT(*) / 10000, '*') AS bar
FROM entries WHERE key_name LIKE '%chunk:%' GROUP BY log ORDER BY log DESC;

+----------------------+------+-------+----------+----------------------+
| key_name             | log  | size  | COUNT(*) | bar                  |
+----------------------+------+-------+----------+----------------------+
| reply_chunk:69913    |  3.4 |  2511 |     1032 | **                   |
| user_chunk:1868495   |  3.2 |  1530 |      972 | **                   |
...
| user_chunk:1405137   |  1.6 |    40 |     2822 | ******               |
| reply_chunk:2084579  |  1.4 |    24 |     4361 | *********            |
| user_chunk:2455162   |  1.2 |    14 |     6141 | ************         |
| user_chunk:5989656   |  1.0 |    12 |     2477 | *****                |
| user_chunk:2268781   |  0.8 |     6 |    16527 | ******************** |
+----------------------+------+-------+----------+----------------------+

slab allocation counts by class

SELECT class_id AS slab_class,
       MAX(value_size) AS slab_size,
       CEIL(COUNT(*) / ((1024 * 1024) / MAX(value_size))) AS allocated,
       ( (SELECT COUNT(*) FROM entries WHERE class_id = slab_class AND is_expired = "true") * 100
         / (SELECT COUNT(*) FROM entries WHERE class_id = slab_class) ) AS `%_expired`
FROM entries GROUP BY class_id ORDER BY class_id;

+------------+-----------+-----------+-----------+
| slab_class | slab_size | allocated | %_expired |
+------------+-----------+-----------+-----------+
|          1 |         8 |         1 |   99.8938 |
|          2 |        32 |         5 |    6.0320 |
|          3 |        71 |         9 |   36.4859 |
...
|         34 |    187783 |        28 |   61.2903 |
|         35 |    234205 |       130 |    1.8998 |
|         36 |    286110 |         7 |    0.0000 |
|         37 |    309832 |         2 |   75.0000 |
|         38 |    397500 |         1 |  100.0000 |
|         39 |    496014 |         1 |  100.0000 |
+------------+-----------+-----------+-----------+

This is comparable to memcached's stats slabs command, but lets you investigate properties the server doesn't report, such as the expired ratio per slab class.

Play around and see what else you can come up with.

conclusion

Generational keys are often presented as the easy way to cache, but consider their implications. How else are you abusing your cache? Some suggested things to check:

Keys without expiry times (which will bite you when you alter the server pool)

Excessive use of very large slabs

Uneven LRU age by slab (indicating poor slab allocation)

Unexpected namespaces (revealing invalidation bugs)

The only unfortunate thing about peep right now is that it blocks the memcached server. I could do an incremental version, but it would be quite slow. The current version is already CPU-intensive, taking 15 minutes to dump a 4GB memcached on modern hardware (turn off your client-facing processes if you run mixed front-ends!). A version in C would be much quicker, if someone is up for it, but for now the Ruby version is sufficient.

Happy peeping.