Memcached is often treated as a zero-configuration system. This isn’t quite true. Tuning its configuration to the hardware and to the client application’s behavior can significantly improve overall performance.

Memcached is pretty simple, and it exposes only a handful of statistics. You only need to know a little about its internals to make use of them. Luckily, they aren’t complicated at all.

As usual, there is no silver bullet. You’re free to experiment. I even encourage you to do this.

Stats

The starting point of performance tuning is the stats. The easiest way to see them is to connect to the Memcached server with telnet and send the stats command.

$ telnet localhost 11211
Connected to localhost.
Escape character is '^]'.
stats
STAT pid 12950
STAT uptime 55
STAT time 1429483226
STAT version 1.4.15
STAT libevent 2.0.21-stable
STAT pointer_size 64
STAT rusage_user 4.758163
STAT rusage_system 8.206732
STAT curr_connections 10
STAT total_connections 123
STAT connection_structures 42
STAT reserved_fds 20
...
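The stats reply is plain text ("STAT <name> <value>" lines terminated by END), so it is easy to collect from code instead of telnet. Below is a minimal Python sketch, assuming a server on localhost:11211. The function names parse_stats and fetch_stats are hypothetical helpers, not part of any Memcached client library.

```python
import socket


def parse_stats(raw: str) -> dict:
    """Parse the text reply of the `stats` command into a dict.

    Each line looks like "STAT <name> <value>" and the reply ends with "END".
    Values stay strings because some of them (e.g. version) are not numeric.
    """
    stats = {}
    for line in raw.splitlines():
        line = line.strip()
        if line == "END":
            break
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host: str = "localhost", port: int = 11211) -> dict:
    """Send `stats` over TCP and read until the terminating END line."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            buf += chunk
    return parse_stats(buf.decode())
```

Calling fetch_stats()["evictions"] against a running server would return the evictions counter as a string.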

Command Hits / Misses

There are hits/misses stats for several commands (get_hits/get_misses, delete_hits/delete_misses, incr_hits/incr_misses, and so on), and these can reveal problems in application behavior. For instance, an application may aggressively cache short-lived or rarely used objects, causing a high rate of misses and evictions. The rule of thumb is to keep the cache miss ratio close to zero.
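The miss ratio is simply misses divided by all lookups. This is a small Python sketch, assuming the stats are already in a dict mapping stat names to string values (for instance, parsed from the stats reply). The helper name miss_ratio is hypothetical.

```python
def miss_ratio(stats: dict, command: str = "get") -> float:
    """Return misses / (hits + misses) for a command such as get, delete or incr.

    Relies on memcached's <command>_hits / <command>_misses naming
    (e.g. get_hits, get_misses). Returns 0.0 when there has been no traffic.
    """
    hits = int(stats.get(f"{command}_hits", 0))
    misses = int(stats.get(f"{command}_misses", 0))
    total = hits + misses
    return misses / total if total else 0.0
```

For example, get_hits = 900 and get_misses = 100 give a miss ratio of 0.1, i.e. one lookup in ten goes to the backing store.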

Evictions

One of the most important stats is evictions. It is the number of non-expired Items that were removed from the cache to free up space for new Items.

A high number of evictions may indicate that the application overuses the cache or that the memory allocated for Item storage is insufficient. The maximum amount of memory used to store Items is set by the -m command-line argument, in megabytes (the default is 64 MB).
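Because evictions is a cumulative counter, its rate over time says more than its absolute value. A minimal Python sketch, assuming two stats snapshots from the same server (as dicts of string values) taken some seconds apart and no server restart in between. The helper name evictions_per_second is hypothetical.

```python
def evictions_per_second(before: dict, after: dict) -> float:
    """Eviction rate between two `stats` snapshots of the same server.

    Uses the cumulative `evictions` counter and the `uptime` stat (seconds).
    Both counters reset on restart, so snapshots must span a single run.
    """
    elapsed = int(after["uptime"]) - int(before["uptime"])
    if elapsed <= 0:
        raise ValueError("snapshots must be taken at different times")
    return (int(after["evictions"]) - int(before["evictions"])) / elapsed
```

A rate that stays above zero under normal load is the signal to raise -m or to cache less.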

Connection Yields

The conn_yields stat counts how many times a connection was yielded in the middle of executing a batch.

The slowest part of request execution is the network round trip, so the application should use batching to achieve maximum performance. Several write requests can be combined into a single network IO operation, and reads should use multi-key requests.
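With the text protocol, batching writes just means concatenating several commands into one buffer and sending it at once, while a multi-key read is a single get line listing many keys. A minimal Python sketch of building both requests (sending and parsing replies are left out). The function names are hypothetical.

```python
def build_set_batch(items: dict, exptime: int = 0, flags: int = 0) -> bytes:
    """Concatenate several `set` commands into one buffer for a single send.

    Text protocol: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    The server answers each command in order (STORED, NOT_STORED, ...).
    """
    parts = []
    for key, value in items.items():
        header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode()
        parts.append(header + value + b"\r\n")
    return b"".join(parts)


def build_multi_get(keys: list) -> bytes:
    """One `get` request for many keys; the server replies with a VALUE
    block for each hit, followed by a single END line."""
    return ("get " + " ".join(keys) + "\r\n").encode()
```

Writing build_set_batch({"a": b"1", "b": b"xy"}) to a socket with one sendall() call stores both Items in a single network write.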

The downside of batching is possible connection starvation: one connection is forced to wait until another finishes its batch. To prevent starvation, Memcached limits the number of requests processed per network IO event. Every time a connection tries to execute a batch bigger than this limit, Memcached moves the connection to the back of the processing queue and increments the conn_yields stat.

The -R command-line parameter sets the maximum number of requests per network IO event (the default is 20). The application should adapt its batch size to this parameter. Note that the limit counts requests, not keys: it does not restrict the number of keys in a single multi-key get request.
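Adapting the batch size to -R amounts to splitting a stream of requests into groups no larger than the server limit. A minimal Python sketch, assuming the client knows the server's -R value. MAX_REQS_PER_EVENT and batches are hypothetical names, with the constant set to the default of 20.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Assumed to mirror the server's -R setting (20 by default).
MAX_REQS_PER_EVENT = 20


def batches(requests: Iterable[T], size: int = MAX_REQS_PER_EVENT) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `size` requests, in order."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(requests)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

For example, 45 write requests become batches of 20, 20 and 5. A multi-key get counts as one request here, no matter how many keys it carries.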

Other Command-line Arguments

Number of threads

The setting with the biggest influence on overall performance is the number of threads, -t. The default value of 4 is a good choice for almost every setup. Avoid using more than eight threads: it leads to high lock contention inside Memcached and degrades performance.

A configuration with a single thread should never be used, even on a single-core machine.

Disable use of CAS

Memcached’s compare-and-swap (CAS) operation requires a separate Item field to store a unique CAS value, which takes an additional 8 bytes of memory per Item. If your application does not use CAS, the -C flag disables it and saves that memory.
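The saving scales linearly with the number of stored Items. A back-of-the-envelope Python sketch using the 8 bytes per Item stated above. The 10 million Item count is an arbitrary illustration, and real savings also depend on how Item sizes fall into slab classes.

```python
CAS_BYTES_PER_ITEM = 8  # per-Item CAS field size, as stated in the text


def cas_overhead_bytes(item_count: int) -> int:
    """Memory spent on CAS values, i.e. roughly what -C could save."""
    return item_count * CAS_BYTES_PER_ITEM


overhead = cas_overhead_bytes(10_000_000)
print(f"{overhead} bytes = {overhead / 2**20:.1f} MiB")  # 80000000 bytes = 76.3 MiB
```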

Memcached as In-memory Storage

The -M argument tells Memcached to reply with an error when it runs out of memory instead of evicting existing Items. This makes it possible to use Memcached as a consistent in-memory storage, where stored Items are not silently evicted.
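With -M, the client has to treat a failed write as a real error instead of assuming the value was stored. A minimal Python sketch of interpreting the reply line of a text-protocol storage command. A server started with -M reports out-of-memory with a SERVER_ERROR line; the sketch checks only that prefix, because the message text may vary and SERVER_ERROR has other causes too. StoreRejected and check_store_reply are hypothetical names.

```python
class StoreRejected(Exception):
    """The server refused a write with SERVER_ERROR (e.g. out of memory under -M)."""


def check_store_reply(line: bytes) -> bool:
    """Interpret the reply line of a storage command (set, add, replace, ...).

    Returns True for STORED, False for NOT_STORED / EXISTS / NOT_FOUND,
    and raises StoreRejected for any SERVER_ERROR reply.
    """
    reply = line.strip()
    if reply == b"STORED":
        return True
    if reply in (b"NOT_STORED", b"EXISTS", b"NOT_FOUND"):
        return False
    if reply.startswith(b"SERVER_ERROR"):
        raise StoreRejected(reply.decode(errors="replace"))
    raise ValueError(f"unexpected reply: {reply!r}")
```

The application can then decide what to do when storage is full, for example fail the request or delete Items it no longer needs.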

General Advice

Use a pool of long-lived connections to the server. Memcached is designed to serve many connections simultaneously, and with tens of connections you will achieve much better performance than with a single one.
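A connection pool opens its connections once and hands them out to callers, so no request pays for a new TCP handshake. A minimal thread-safe Python sketch built on queue.Queue, assuming the caller supplies a factory that opens one connection. ConnectionPool is a hypothetical class, not an API of any Memcached client, and it does not handle broken connections.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, Optional, TypeVar

C = TypeVar("C")


class ConnectionPool(Generic[C]):
    """Fixed-size pool of long-lived connections shared between threads."""

    def __init__(self, size: int, factory: Callable[[], C]) -> None:
        self._idle: "queue.Queue[C]" = queue.Queue(maxsize=size)
        for _ in range(size):  # open every connection up front
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout: Optional[float] = None) -> Iterator[C]:
        # Blocks while every connection is checked out; raises queue.Empty
        # if `timeout` seconds pass without one becoming free.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Hand the connection back for reuse. A production pool would
            # also detect and replace broken connections here.
            self._idle.put(conn)
```

For example, ConnectionPool(16, lambda: socket.create_connection(("localhost", 11211))) would keep 16 TCP connections open to a local server, and each worker thread would use one inside a with pool.connection() as sock: block.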

Store Items over TCP, retrieve them over UDP. As mentioned above, network transport is the main source of delays, and at high data volumes TCP’s per-packet overhead can significantly impact overall performance.
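Over UDP, every datagram starts with an 8-byte frame header: four big-endian 16-bit fields holding a request ID, a sequence number, the total number of datagrams in the message, and a reserved field that must be 0. The text command follows the header. A minimal Python sketch of building a single-datagram get request. It assumes the server is listening on UDP (the -U option; releases since 1.5.6 ship with UDP disabled by default). udp_get_datagram is a hypothetical helper.

```python
import struct


def udp_get_datagram(request_id: int, key: str) -> bytes:
    """Build a one-datagram `get` request for memcached's UDP interface.

    Header fields (network byte order): request ID, sequence number,
    total datagrams, reserved. A short request fits in one datagram,
    so sequence = 0 and total = 1.
    """
    header = struct.pack("!HHHH", request_id & 0xFFFF, 0, 1, 0)
    return header + f"get {key}\r\n".encode()
```

The datagram can be sent with sock.sendto(udp_get_datagram(1, "foo"), ("localhost", 11211)) on a socket.SOCK_DGRAM socket. The reply carries the same request ID in its own header, which lets the client match responses to requests. A large reply may span several datagrams that the client must reassemble by sequence number.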