Hey!

In this article, I’d like to talk about when I use an in-memory database and when I prefer a traditional DBMS and why.

When I need to decide which DBMS to use — in-memory (let’s call it IMDB) or traditional (I’ll be calling it RDBMS) — I usually make a choice based on the type of storage where my data is going to be kept. I personally divide all the numerous options out there into three groups: RAM, solid-state drive or flash memory (SSD) and hard disk drive (HDD). First, I pick a type (or types) of storage to use and then start thinking about what database (or databases) I want to have on top of that.

Each of the three data storage types I mentioned has its own distinct access time and speed. And they all fall into different price ranges. In the long run, both the IMDB and RDBMS can work with all three. An IMDB stores a copy of data in RAM and persists this data on an SSD or HDD. An RDBMS uses RAM for caches, write-back buffers and other optimizations, and also persists data on an SSD or HDD. Since both DBMSs can use all the three storage types, we need to first decide what data we’re going to keep in each storage and how we’re going to move it between different storages — and only then do we pick a DBMS. At least that’s what I do before I make up my mind about a particular DBMS.

Before we make plans about what data to store and where to store it, let’s recall the characteristics of each storage type.

RAM

It’s well-known RAM is lightning fast, both in case of sequential and random access (it was named random-access memory for a reason). What I mean by lightning fast is that random access time is mere tens of nanoseconds (1e-8 seconds), that is RAM performs random reads/writes at 100 MB per second. Sequential reads/writes are even faster — 1 GB per second and higher. You can find more information on how RAM works here.

SSD

An SSD is slower than RAM. Random access time is a fraction of a millisecond (1e-5 seconds): you can randomly read around 10,000 blocks per second. In the worst-case scenario, where you only need one byte from each block, the speed is 10 kB per second. It’s 1,000 times slower than RAM! However, sequential reads/writes are performed at some 200–300 MB per second, which is somewhat on a par with RAM. Details about an SSD are here and here.

HDD

The slowest of the three storage types in question. Random access time is 10 milliseconds (1e-3 seconds), which is 100 times slower than an SSD: you can randomly read/write 100 blocks per second, that is if you need to read random bytes, the speed is only 100 bytes per second in the worst case! That’s because the whole block needs to be read for the sake of one random byte. Randomly writing bytes is, on average, even slower: we first read a random block, wait for one full disk rotation and then write this block back with a changed byte. Details can be found here and here.

There are other storage types, such as flash memory, SAS, magnetic tape and many more, that I’ve never used for certain reasons. Say, SAS is slower than an SSD but costs (at least considering our purchasing volumes at Mail.Ru Group) almost the same. Flash memory, on the other hand, is more expensive than an SSD, but — in my use cases at Mail.Ru Group — the speed gain it provides is negligible (space considerations don’t allow me to discuss it in more detail in this article). As for magnetic tape, in terms of storage volume, a tape drive is more expensive (again, in my use cases) than a high-capacity HDD, but at the same time slower with respect to access time. Your mileage may vary, and HDDs, SSDs and RAM may not be your main storage types. In which case I highly recommend you stop reading this article since it’s going to be totally useless for you: I don’t really have any substantial hands-on experience with any other storage types, except for those three I listed above.

Let’s talk price now. To be honest, up to a certain point in my career, I used to believe I need to think not about the price, but about technologies. I would talk to the system administrators to find out what hardware is required for my programs, then tell my managers what hardware I need. They would (or wouldn’t) allocate the funds, I would place an order with the procurement department, receive the hardware and start using it (and if no funds were allocated, I had to make do with whatever was available at the time). What a life that was!

However, at some point, I found out, to my surprise, that it can be the other way around: here’s a fixed budget — make everything work (it doesn’t strike me as surprising anymore, but back in the day it was like a cold shower). More challenging tasks sound like “Given our current spending patterns, make everything work faster” or “Given less money, make it so that performance stays the same”, but the most hardcore one is “Cut costs and make everything work faster”. But it’s much more fun when you constantly come up with ideas on how to save more money or squeeze more performance out of the hardware and bring these ideas to life without involving senior management.

So, back to the price. At first glance, the situation here is pretty straightforward: on average, each subsequent storage type is ten times more expensive than the previous one. An HDD is the cheapest storage type, an SSD is ten times more expensive than an HDD, and RAM is ten times more expensive than an SSD. These are average figures and based entirely on my own experience (given our purchasing prices). Please don’t try to find fault with my calculations or compare them to some other figures known to you — there are lots of more pleasant ways to spend your free time =)

Another thing to keep in mind is that storages aren’t floating in the air — they need to be plugged into servers that don’t come free, either. Moreover, a server can hold a certain amount of data from each storage type. On top of that, a server takes up several units in a server rack, which also costs money (electricity bills, rental fees (even in your own data center, renting a server rack isn’t free and must be accounted for in your business model: there’s loss of expected profit since you’re not leasing it to others) and so on).

Now that we’ve discussed the speed and the price of the three storages, let’s turn to when and what to store in each of them.

If I need access time of 1 ms or less, my only choice is RAM. An SSD also matches this criterion, but only in terms of seek time. However, in my typical workflow, to process a single user request, not one, but several seek operations are often needed, and then, due to some additional requirements, there may be extra requests to disk. In anticipation of this scenario, I usually budget for RAM right from the start. If I need access time of up to 100 ms, I opt for an SSD. If access time of over 100 ms is fine, I could consider using an HDD. Recall that its seek time is 10 ms (as mentioned above), but in a real-life use case, random access leads to multiple seeks; moreover, disks are busy doing other work, which diminishes the maximum number of IOPS, seek time and seeks per second that can be performed. Let’s now dwell on sequential access a little. While, in terms of random access time, an HDD is 100 times slower than an SSD that is 1,000 times slower than RAM, the situation with sequential access time is hugely different: an HDD is 2–3 times slower than an SSD that is 3–5 times slower than RAM. And to refresh your memory, RAM is 10 times more expensive than an SSD that is 10 times more expensive than an HDD. What does it tell us? Random access to slow storages is more costly (the slower the storage, the higher the cost), and sequential access to slow storages is cheaper (the slower the storage, the lower the cost). Thus, slow storages were made for sequential access (otherwise, they would have faded into oblivion a long time ago). To reiterate, if we need to sequentially access massive datasets and are satisfied with the speed of 100 MB/s, the most cost-effective storage for us is an HDD. It’s 10 times slower than RAM with respect to sequential access time, but 100 times cheaper as well. It means we could buy 10 times more (in terms of bytes) free HDD space than RAM with the same total throughput. In most cases, I’m OK with 100 MB/s, because we also need to take into account the network interface or the Internet connection speed of an end user, so I store everything that needs to be accessed sequentially on HDDs. It’s important to note that I’m talking only about my situation here. If you operate on a smaller scale, the price may have a lower priority. To sum up, I group my data into the four categories (items 1–4 above) and keep each category in separate storage. Now, the question is what DBMS(s) to use on top of it all.

Let me first share my observations of some advantages and disadvantages of both the IMDB and RDBMS:

In terms of writes and reads from disk, an IMDB is faster than an RDBMS, even if a machine hosting an RDBMS has so much memory on board that all data fits in it.

An IMDB can efficiently persist data on an HDD. An RDBMS doesn’t usually work with an HDD as efficiently and requires an SSD (or other fast storages like SAS or flash memory).

An IMDB usually scales poorly to multiple CPU cores.

An IMDB won’t work unless all the data fits in RAM.

Now, we can move on to making a choice between an IMDB and an RDBMS.

As we’ve already discussed, whatever requires online access (less than 1 ms) must be stored in RAM.

I usually put this kind of data in an IMDB, because, from my experience, that’s the fastest and most hardware-efficient solution for RAM-resident data. An RDBMS with fully cached data is tens, if not hundreds, of times slower.

As an example, let’s take benchmark results of MariaDB, which is currently one of the most high-performance RDBMSs out there (it was designed by ex-creators of MySQL), at least on the open-source market. The benchmark was run by MariaDB developers themselves, so one would hope the RDBMS has been fine-tuned to demonstrate peak performance. Benchmark details can be found here. The results show a million requests per second on a very expensive machine called IBM Power8 with 20 cores and 160 hardware threads (which is equal to having 160 cores on board). Compare it to benchmark results of an IMDB called Tarantool that performs a million transactions on a virtual single-core AWS server (by the way, the code is open-source, so you can use it yourself to reproduce the results, for example, on a t2.micro AWS instance). Benchmarks of MariaDB and Tarantool are different, but the purpose is the same — to squeeze every last drop of performance out of a database. Were it possible to speed it up even more, it would’ve been done for sure.

The difference is roughly 160 times. One can go on and on about whether Power8 has really parallel cores (however, it does have at least 20 cores) or what hardware Amazon services run on, but, on the whole, the difference is pretty evident. It’s an honest enough benchmark in the sense that both vendors have polished their products to deliver top performance.

Let me reiterate one of the main disadvantages of an IMDB — usually, it scales poorly to multiple CPU cores. For example, Tarantool uses 3 cores at best. If you want to squeeze more out of it, you need to shard it the same way as if you were distributing its workload across multiple servers. On the other hand, if Tarantool performs better on a single core than an RDBMS does on the whole server (I’m not talking about Power8 here, but a regular Supermicro server with an 8- or 16-core processor), it’s possible to keep the other cores idle or use them to run separate Tarantool instances for other services.

I have to give an RDBMS its due. If online data needs to be accessed via SQL, I’d rather put it an RDBMS with the cache size set to maximum, because hardly any IMDB can boast a stable implementation of SQL nowadays (Tarantool is going to have it soon, though). Yes, it’s going to be more memory-consuming than a pure IMDB and require more hardware, but it would save the developers’ time, which may very well turn out to be much more profitable in the long run.

All the data that doesn’t need to be accessed online (that is with allowable access time of up to 100 ms) is stored either in an RDBMS or in an IMDB, or elsewhere (more on that later) — it depends on a number of factors.

Say, if I’ve already decided to store online data in an IMDB and keeping non-online data there as well doesn’t influence the system’s total cost of ownership (TCO) much (RAM clearly has a higher price than both the HDD and SSD — and that’s why storing non-online data in an IMDB is more expensive), I’m going to store this non-online data in my IMDB in order to keep the number of different DBMSs as low as possible. Otherwise, if the impact on the TCO is significant, I’ll opt for an RDBMS or something else (see item 4).

Sometimes, seemingly illogically, I put non-online data into an IMDB, even though there’s no online data in it. Why would I do that? It’s because an IMDB uses the processor more efficiently than an RDBMS does when accessing data (it performs less copy operations and has more advanced in-memory data structures since it’s optimized for working with memory).

From time to time, I have a choice to make — store a 1 TB database on 10 machines having an IMDB and a 200 GB HDD each, with the replication factor of 2, or use 8 machines having an RDBMS and an SSD and holding a 1 TB database each, with the replication factor of 8 (because the RDBMS consumes 4 times more CPU resources in this case). 10 machines with low-capacity HDDs may cost less than 8 machines with high-capacity SSDs (if you want to know why an IMDB can make do with HDDs, whereas an RDBMS usually requires SSDs, check my article here). Moreover, compared to an RDBMS, an IMDB may have cheaper and less powerful processors on board: if I have a lot of machines when using an IMDB, it’s not because a single processor can’t handle the workload — usually, we just don’t install more than 200–300 GB of RAM on a machine to keep its restart time manageable.

Previously, I mentioned “elsewhere”. So, what alternatives to an RDBMS and an IMDB exist? One option is a simple file system.

For example, in a service called Cloud@Mail.Ru, users’ files are kept in a regular file system (XFS), with copies stored in several data centers. Why is that? These files are quite large (4 MB, on average), so storing them in a DBMS creates extra workload on the processor and disks. Moreover, the files are unstructured and always read and written in one piece. Given this pattern, any additional workload on the DBMS is regarded as superfluous and even harmful. There are a number of alternative storage types. Say, the ideal storage for historical analytical data that is accessed sequentially a lot is Hadoop on top of HDDs. But I’m digressing, let’s not forget we’re talking IMDB vs RDBMS here.

If I don’t have online data (access time of about 100 ms is fine with me) and I know I’ll mainly be using SSDs, then an RDBMS is the best option in this case (except for what I said previously).

In practice, it’s usually 1 TB or more of data under moderate workload and without strict requirements for access time.

I’d like to conclude this article by touching upon what I try not to do. I try not to mix online and non-online data in one RDBMS.

Say, I won’t be storing users’ profiles and their authentication history in one and the same RDBMS with fine-tuned caches so that users’ profiles (which is frequently requested data) are cache-resident. I’d rather use two DBMSs — an IMDB for profiles and an RDBMS for historical data — which will solve the cold start issue, consume less memory resources and ensure faster access to users’ profiles.

Thanks, and stay tuned for more articles! =)