How to choose an in-memory NoSQL solution: Performance measuring

Purpose of this paper

The main purpose of this work is to show results of benchmarking some of the leading in-memory NoSQL databases with a tool named YCSB.

We selected three popular in-memory database management systems, Redis (both standalone and in the cloud, as Azure Redis Cache), Tarantool and CouchBase, plus one caching system, Memcached. Memcached is not a database management system and has no persistence, but we included it because it is also widely used as a fast storage system.

Our “firing field” was a group of four virtual machines in the Microsoft Azure cloud. The virtual machines are located in the same datacenter, which reduces the impact of network overhead on latency measurements. Images of these VMs can be downloaded via the links: one, two, three and four (login: nosql, password: qwerty). The pair of VMs named nosql-1 and nosql-2 is set up for benchmarking Tarantool and CouchBase, and the pair named nosql-3 and nosql-4 is set up for Redis, Azure Redis Cache and Memcached. The databases and tests are already installed and configured on these images.

Our virtual machines were Basic A3 instances with 4 cores, 7 GB of RAM and a 120 GB disk.

Databases and their configurations

Append-only files in Redis and write-ahead logs in Tarantool are the options that enable data persistence in these databases. This paper compares only similar configurations of different databases: for example, we do not compare Redis with append-only files enabled against Tarantool with write-ahead logs disabled.

Yahoo! Cloud Serving Benchmark

Yahoo! Cloud Serving Benchmark (YCSB) is a powerful utility for measuring the performance of a wide range of NoSQL databases, including in-memory and on-disk solutions. YCSB is an industry standard for measuring the performance of NoSQL solutions, which is why we use it. We rely on the Redis and Tarantool drivers that are included in YCSB, and on a Memcached driver that we wrote on top of the spymemcached library. The source of this YCSB branch can be seen here.

YCSB provides a few core workload types, shipped as configuration files in a dedicated directory. There are six major workload types, named by letters from A to F.

Workload A is an update-heavy workload with a 50/50 mix of reads and writes. Application example: a session store recording recent actions.

Workload B is primarily a read workload with a 95/5 read/write mix. Application example: photo tagging, where adding a tag is an update but most operations read tags.

Workload C is 100% read. Application example: a user profile cache, where profiles are constructed elsewhere (e.g., Hadoop).

In Workload D, new records are inserted and the most recently inserted records are the most popular. Application example: user status updates, where people want to read the latest.

In Workload E, short ranges of records are queried instead of individual records. Application example: threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id).

In Workload F, the client reads a record, modifies it, and writes back the changes. Application example: a user database, where user records are read and modified by the user or to record user activity.
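To make the operation mixes concrete, here is a minimal Python sketch that draws operations at random according to each workload's proportions. The A, B and C mixes are the ones stated above; the exact D, E and F splits are an assumption based on the stock YCSB workload files and should be checked against your own checkout. The names WORKLOAD_MIX and sample_operations are hypothetical and are not part of YCSB.

```python
import random

# Operation proportions per core workload. A, B and C follow the mixes
# stated in the text; the D, E and F splits are assumed from the stock
# YCSB workload files and should be verified against your checkout.
WORKLOAD_MIX = {
    "A": {"read": 0.50, "update": 0.50},
    "B": {"read": 0.95, "update": 0.05},
    "C": {"read": 1.00},
    "D": {"read": 0.95, "insert": 0.05},
    "E": {"scan": 0.95, "insert": 0.05},
    "F": {"read": 0.50, "readmodifywrite": 0.50},
}


def sample_operations(workload, count, seed=0):
    """Draw `count` operation names according to the workload's proportions."""
    mix = WORKLOAD_MIX[workload]
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return rng.choices(list(mix), weights=list(mix.values()), k=count)


if __name__ == "__main__":
    ops = sample_operations("B", 10_000)
    print("read share in workload B:", ops.count("read") / len(ops))
```

This only models which operation is issued. It does not model which keys are requested, which is what distinguishes Workload D (recently inserted records are the most popular) from a plain 95/5 mix.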

We changed two parameters in each of these configuration files: recordcount to 2000000 and operationcount to 5000000. YCSB is a multithreaded tester, and we ran it with 8, 16, 32, 64, 128 and 256 threads.
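The test matrix described here can be sketched as a small Python helper that builds one YCSB command line per workload and thread count. This is an illustration under assumptions, not the script we used: it passes recordcount and operationcount as -p overrides instead of editing the workload files, it assumes the stock bin/ycsb launcher and workloads/workloadX file layout, and the binding name passed in is only an example. Connection properties (host, port and so on) are omitted because they depend on the binding.

```python
RECORD_COUNT = 2_000_000
OPERATION_COUNT = 5_000_000
THREAD_COUNTS = [8, 16, 32, 64, 128, 256]
WORKLOADS = ["a", "b", "c", "d", "e", "f"]


def ycsb_commands(binding, phase="run"):
    """Build one argument list per (workload, thread count) pair.

    `binding` is the YCSB database binding name; treat the value you pass
    as an assumption to check against your YCSB branch. Nothing is executed
    here: the lists are only constructed.
    """
    commands = []
    for workload in WORKLOADS:
        for threads in THREAD_COUNTS:
            commands.append([
                "bin/ycsb", phase, binding,
                "-P", f"workloads/workload{workload}",
                "-p", f"recordcount={RECORD_COUNT}",
                "-p", f"operationcount={OPERATION_COUNT}",
                "-threads", str(threads),
            ])
    return commands


if __name__ == "__main__":
    for command in ycsb_commands("redis"):
        print(" ".join(command))
```

Six workloads times six thread counts gives 36 runs per database configuration. Each argument list can be handed to subprocess.run once the connection properties for the chosen binding have been added.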

Below we show and describe several sets of plots, which we drew in R. The sources of the plot scripts can be downloaded here.

Plots

[Plots: throughput and read, insert and read-modify-write latency for workloads A to F, each shown for two configurations, NoWAL and WAL. Series: Tarantool (HASH), Tarantool (TREE), Redis, Azure Redis Cache, Memcached, CouchBase.]

Tarantool, with both hash and tree indices, shows the highest throughput on all investigated workloads. It is built around a lock-free in-memory engine that contains no mutexes or other concurrency primitives and uses cooperative multitasking. From these graphs we can conclude that high throughput is one of the strengths of the Tarantool database.
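The idea behind cooperative multitasking without locks can be illustrated with a small Python analogy. This is not Tarantool's engine or API, only a sketch of the general technique using asyncio: tasks run on a single thread and switch only at explicit yield points, so a read-modify-write that contains no yield point cannot be interleaved with another task and needs no mutex. All names in the example are hypothetical.

```python
import asyncio


async def worker(store, key, increments):
    """Increment a shared counter without taking any lock."""
    for _ in range(increments):
        current = store.get(key, 0)  # read
        store[key] = current + 1     # modify and write; no task switch can occur in between
        await asyncio.sleep(0)       # explicit yield point: other tasks may run here


async def run_workers(n_workers, increments):
    store = {}
    await asyncio.gather(
        *(worker(store, "counter", increments) for _ in range(n_workers))
    )
    return store["counter"]


def run_demo(n_workers=8, increments=1000):
    """Run the workers on one event loop and return the final counter value."""
    return asyncio.run(run_workers(n_workers, increments))


if __name__ == "__main__":
    print(run_demo())
```

No update is lost even though the workers share the dictionary, because the scheduler never preempts a task between its read and its write. The cost of this model is that a task that does not yield for a long time delays every other task.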

Tarantool's design also yields the lowest average latency for read requests. As the plots show, this holds for every workload. Tarantool also has the lowest latency at the 95th percentile. (This measure is related to average latency, but the two are not the same.) At the 99th percentile, however, Tarantool does not have the lowest latency on every workload: by this measure it is very close to Redis in all cases and is beaten by it in some. This can be explained as follows: Tarantool executes part of the queries with a small latency and another part with a large latency, while Redis executes all requests with a moderate latency.
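The difference between average, 95th-percentile and 99th-percentile latency can be seen in a short Python sketch. The two latency samples below are synthetic numbers invented for illustration, not measurements from our tests: one is mostly fast with a few slow requests, the other is uniformly moderate. The percentile function uses the nearest-rank definition, which is one of several common conventions.

```python
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    rank = -(-p * len(ordered) // 100)  # ceil(p * n / 100) in integer arithmetic
    return ordered[max(rank, 1) - 1]


def summarize(samples):
    return {
        "avg": mean(samples),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


# Synthetic latencies in milliseconds, invented for illustration only.
bimodal = [1.0] * 980 + [20.0] * 20  # mostly fast, a few slow requests
steady = [3.0] * 1000                # every request takes the same moderate time

if __name__ == "__main__":
    print("bimodal:", summarize(bimodal))
    print("steady: ", summarize(steady))
```

With these numbers the bimodal sample wins on average latency and at the 95th percentile but loses at the 99th percentile, because the slow 2% of requests lies beyond the 95th percentile but not beyond the 99th.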

In the cases without write-ahead logs, Memcached and CouchBase exhibit better latency. In every case Tarantool beats Redis on average latency and at the 95th percentile, but not at the 99th percentile. The situation is similar to that of read latency and can be explained in the same way.

For insert requests in the cases without write-ahead logs, Memcached and CouchBase show better average and 95th-percentile latency than the others. At the 99th percentile the picture is completely the opposite: these two databases, together with Azure Redis Cache, show worse results than all the others. Here, too, we found that Tarantool leads Redis.

In this case, Tarantool again had the best result in average and 95th percentile and was very close to Redis in 99th percentile.

Conclusion

We have described YCSB and presented the results of comparing four popular databases, but the most significant idea in this paper is the method of choosing the right solution for a given workload. Using the plots in this article, you can find the most suitable solution for your workload type, number of database clients and expectations.

The links to our VM images, the YCSB branch with the Memcached module and the R scripts are published so that you can run your own tests and verify our results, or obtain results for instances with different configurations (both hardware and software).

Across all the tests we ran, Tarantool showed the best results in requests per second and, in many tests, in latency, on every type of workload examined. We therefore conclude that for most typical projects Tarantool is a better fit than popular solutions such as Redis, CouchBase or Memcached. This is the basis of our decision to use Tarantool for our projects here at my.com.
