Recently we migrated our memcached cluster to a new, larger one. This needed to happen mostly for reliability and speed, but it’s also nice to have access to new stats like ‘reclaimed’ in 1.4.5.

We decided against migrating data from cluster to cluster because there were long-expire keys we no longer cared about. Instead, we just ran some scripts to fill the new cluster with fresh data. After the scripts finished and the new cluster had run in production for a couple of days, it contained only ~70% of the data the old cluster possessed. We were curious what that old data was, so we decided to dump it and analyze it.

Libmemcached has a couple of nice tools written by Brian Aker that facilitate the dumping (memdump and memcat). Of course, dumping millions of keys sequentially would take days, so I parallelized the job using GNU Parallel.

memdump --servers="server1,server2,server3" | parallel -j64 -N1000 \
  'memcat --servers="server1,server2,server3" --verbose {} >> ./logs/found 2>> ./logs/notfound'

memdump dumps all the keys from the listed servers; that key list is piped to parallel, which runs up to 64 jobs in parallel ( -j64 ) and passes up to 1000 arguments to each job ( -N1000 ). I chose 1000 arguments because on the box I was dumping to, the maximum line length is 261666 characters (found with parallel --max-line-length-allowed ), so I divided that by 250 (the maximum memcached key length) and rounded down to a nice even number. With --verbose, memcat also outputs the key, flags, and length along with the value to stdout; if a key is not found, it prints memcat: some:key:a3e352 not found to stderr.
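The -N sizing above can be sketched directly in the shell (261666 is the value reported on that particular box; 250 is memcached’s key-length limit):

```shell
# Reproduce the -N calculation: how many max-length keys fit on one command line.
max_line=261666   # reported by: parallel --max-line-length-allowed
key_len=250       # memcached's maximum key length
echo $(( max_line / key_len ))   # prints 1046; rounded down to an even 1000
```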

Average throughput was approximately 70Mb/s, with bursts over 100Mb/s, from 3 memcached instances; all 4 machines are connected to a gigabit switch.
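As one hypothetical analysis step (not part of the original run), the ‘not found’ lines can be tallied by key prefix; the printf below stands in for the real ./logs/notfound:

```shell
# Count 'not found' keys per prefix (the part before the first ':').
# The sample input mimics memcat's stderr format from the run above.
printf 'memcat: user:123 not found\nmemcat: user:456 not found\nmemcat: sess:abc not found\n' \
  | sed 's/^memcat: //; s/ not found$//' \
  | cut -d: -f1 | sort | uniq -c | sort -rn
```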

xargs would also work, but only if memcat were wrapped in a shell script that masks its exit status, since xargs stops processing jobs when the called utility exits with a non-zero status (which is what happens when memcat receives a ‘not found’ response from memcached).
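A minimal sketch of such a wrapper, assuming the same servers and log paths as above (memcat-ok.sh is a hypothetical name):

```shell
#!/bin/sh
# memcat-ok.sh (hypothetical): run memcat but always exit 0,
# so xargs does not stop when a key comes back 'not found'.
mkdir -p ./logs
memcat --servers="server1,server2,server3" --verbose "$@" \
  >> ./logs/found 2>> ./logs/notfound || true
```

It could then be driven with something like memdump --servers="server1,server2,server3" | xargs -n1000 -P64 ./memcat-ok.sh (using GNU xargs’s -P for parallelism).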