Quoting the official documentation:

Since Redis 4.0 we started to make Redis more threaded. For now this is limited to deleting objects in the background, and to blocking commands implemented via Redis modules. For the next releases, the plan is to make Redis more and more threaded.

Redis runs multiple background threads to perform cleanup work, such as freeing dirty data and closing file descriptors. It is also no longer a single process whenever it forks a child process for a background save.

The simplest way to make Redis more multi-threaded is to fan out every write and read operation to N I/O threads. This adds little complexity, since the main thread is still responsible for the major tasks, yet captures most of the benefit: the majority of Redis's time is spent on I/O, because Redis workloads are usually either memory or network bound.
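The fan-out/join pattern described above can be illustrated with a minimal shell analogy (a sketch, not Redis code): a "main thread" hands work to N background workers, then waits for all of them before continuing.

```shell
# Sketch: the main thread fans pending client I/O out to N workers,
# then blocks on wait, mirroring how the threaded-io design hands
# socket reads/writes to N I/O threads and joins them.
for i in 1 2 3 4; do
  ( echo "io-thread $i: flushing client replies" ) &
done
wait  # resume only after every I/O thread has finished
echo "main thread: back to executing commands"
```

The key property is that command execution stays serialized on the main thread; only the I/O is parallelized.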

1 ) Tuning Redis VMs and finding out which Redis code-paths are hot (busy on-CPU)

To find out exactly how much time we're spending on I/O, we set up two n1-highcpu-96 instances on GCP: one on which we will run redis-server (from the threaded-io branch) and another on which we will run redis-benchmark.

To run redis-server in the most performant way, we tuned both n1-highcpu-96 instances, applying the tuned-adm throughput-performance profile along with manual settings, by:

disabling tuned and ktune power saving mechanisms.

enabling sysctl settings that improve the throughput performance of disk and network I/O, and switching to the deadline scheduler.

setting the CPU governor to performance.

manually disabling Transparent Huge Pages.

manually raising somaxconn to 65535.

manually setting vm.overcommit_memory from 0 (default) to 1 (never refuse any malloc).

The manual settings referred to above can be achieved by running the following commands. The sysctl -w calls take effect immediately; the final sysctl -p reloads /etc/sysctl.conf, applying any settings persisted there as well.

sudo -i

echo never > /sys/kernel/mm/transparent_hugepage/enabled

sysctl -w vm.overcommit_memory=1

sysctl -w net.core.somaxconn=65535

sysctl -p

1.1 ) Profiling single threaded redis-server

1.1.1 ) redis-server VM

Now that we've tuned our virtual machines for our server workloads, we can start a redis-server instance on one of them with the following configuration:

fcosta_oliveira@n1-highcpu-96-redis-server-1:~$ redis-server --protected-mode no --save "" --appendonly no --daemonize yes

1.1.2 ) redis-benchmark VM

To evaluate the performance of Redis and generate the multiple workloads we need for profiling, we will use redis-benchmark throughout this article. The official redis-benchmark program is a quick and useful way to get some figures.

We've forked the official tool and added tests for the new Streams data type in the following github repository. We're currently awaiting review of the PR for it to be included in Redis.

In this section we're only interested in measuring how much of Redis's time is spent on I/O, so that we can use Amdahl's law to predict the theoretical speedup of parallel workloads. For that, we will use the new threaded redis-benchmark to send 10M GET commands, issued by 150 clients, with a data size of 100 bytes.

fcosta_oliveira@n1-highcpu-96-redis-benchmark-1:~$ redis-benchmark -t get -c 150 -n 10000000 --threads 46 -h {ip of redis-server vm} -d 100
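As a quick back-of-the-envelope check, Amdahl's law states that if a fraction p of the work parallelizes across n threads, the speedup is 1 / ((1 - p) + p / n). A sketch with awk, using a hypothetical p of 0.8 (illustrative only, not a measured value):

```shell
# Amdahl's law: speedup = 1 / ((1 - p) + p / n)
# p = parallelizable (I/O) fraction, n = number of I/O threads.
# p=0.8 and n=4 are hypothetical figures for illustration.
awk 'BEGIN {
  p = 0.8; n = 4
  printf "theoretical speedup with %d threads: %.2fx\n", n, 1 / ((1 - p) + p / n)
}'
```

With those numbers the ceiling is 2.50x, which is why measuring the real I/O fraction via profiling matters before adding threads.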

1.1.3 ) Profiling stack traces while running benchmark tool

While the benchmark runs from the n1-highcpu-96-redis-benchmark-1 VM, we can profile the redis-server stack traces on the n1-highcpu-96-redis-server-1 VM using Linux perf_events (aka "perf"), sampling stacks at a fixed rate of 99 Hz, by running the following command:

fcosta_oliveira@n1-highcpu-96-redis-server-1:~$ sudo perf record -F 99 --pid `pgrep redis-server` -g -o 100_bytes_no_iothreads
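The recorded samples can then be rendered as a flame graph with the usual FlameGraph toolchain; a sketch, assuming Brendan Gregg's FlameGraph scripts are cloned into ./FlameGraph and the perf data file from above is present:

```shell
# Sketch: turn the recorded perf data into a flame graph SVG.
# Assumes the FlameGraph repo is checked out at ./FlameGraph.
sudo perf script -i 100_bytes_no_iothreads > out.stacks
./FlameGraph/stackcollapse-perf.pl out.stacks > out.folded
./FlameGraph/flamegraph.pl out.folded > redis_100_bytes_no_iothreads.svg
```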

Resulting in the following flame graph: