

RAM (random access memory) is a component of every computer system, from tiny embedded controllers to enterprise servers. In the form of SRAM (static RAM) or DRAM (dynamic RAM), it’s where data is held temporarily while some kind of processor operates on it. But as the price of RAM falls, the model of shuttling data to and from big persistent storage and RAM may no longer hold.

RAM prices fluctuate with the market, but the long-term trend is steadily downward. As recently as 2000, a gigabyte of memory cost over $1,000 (£800 in those days); today, it’s just under $5 (~£5). That opens up very different ways of thinking about system architecture.

Databases are traditionally held on disk, from where the required information is read into memory as needed, and then processed in some way. Memory size is usually assumed to differ from disk size by several orders of magnitude—say, gigabytes vs. terabytes. But as memory size increases, it becomes more efficient to load more data into memory, reducing the number of disk reads and writes. As RAM prices continued to fall, we began to see whole databases loaded from disk into memory, operations performed, and then written back to persistent storage. Now, however, we’re at the point where some databases are never written back to persistent storage, existing entirely in volatile RAM.

Memory access times are measured in nanoseconds (billionths of a second), while disk seek times are usually measured in milliseconds (thousandths of a second), making a memory access about a million times faster than a disk seek. RAM transfer speeds aren’t a million times faster, of course (gigabytes per second versus a few hundred megabytes per second for a quick hard drive), but RAM clearly has persistent storage beat by at least an order of magnitude.
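To put rough numbers on that comparison, here is a back-of-the-envelope sketch. The four figures are illustrative assumptions chosen to match the units mentioned above, not measurements of any particular hardware.

```python
# Illustrative assumptions only; real hardware varies widely.
RAM_ACCESS_NS = 10               # assumed: ~10 nanoseconds per memory access
DISK_SEEK_NS = 10_000_000        # assumed: ~10 milliseconds per hard-disk seek
RAM_TRANSFER_MB_S = 20_000       # assumed: ~20 gigabytes per second
DISK_TRANSFER_MB_S = 200         # assumed: ~200 megabytes per second

# How many memory accesses fit into the time of a single disk seek.
latency_ratio = DISK_SEEK_NS // RAM_ACCESS_NS

# How much faster RAM streams data once a transfer is under way.
bandwidth_ratio = RAM_TRANSFER_MB_S // DISK_TRANSFER_MB_S

print(f"latency:   RAM is {latency_ratio:,}x faster")
print(f"bandwidth: RAM is {bandwidth_ratio:,}x faster")
```

With these assumed figures the latency gap is six orders of magnitude while the bandwidth gap is two, which is why workloads dominated by small random reads tend to gain the most from living in memory.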

In real-world applications the differences aren’t quite that dramatic, but reading data from disk into RAM and writing it back is a huge bottleneck, and it introduces the potential for inconsistency. Removing this step could also permit a reduced instruction set, increasing simplicity and efficiency.

As RAM prices fall, it’s already commonplace in high-end enterprise and data centre contexts to see terabytes of memory available to a server. Besides size, however, an obvious objection to keeping a database in RAM is durability. RAM is volatile, losing its contents instantly when the power goes off or the system is compromised. That presents a challenge when meeting the standard “ACID” requirements (atomicity, consistency, isolation, durability) of a reliable database.

This can be mitigated by snapshots and logs. Just as a disk-based database may be backed up periodically, an in-memory database can be copied to storage. Creating a snapshot means competing with other processes to read data, so the frequency of checkpoints will be a trade-off between performance and resiliency. This in turn can be mitigated by transaction logging, also known as journalling, which records changes made to data so that a later state can be reconstructed from an earlier copy. Still, with the live database in volatile memory, a degree of redundancy is lost.
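As a minimal sketch of how snapshots and journalling fit together, consider a toy key-value store that lives in a Python dictionary. The class name `TinyStore`, its methods, and the file names are all hypothetical, invented for illustration; real in-memory databases are far more careful about concurrency, partial writes, and log formats.

```python
import json
import os


class TinyStore:
    """Hypothetical in-memory key-value store with a snapshot and a journal.

    Keys must be strings and values must be JSON-serialisable.
    """

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.journal_path = os.path.join(directory, "journal.log")
        self.data = {}
        self._recover()
        self._journal = open(self.journal_path, "a", encoding="utf-8")

    def _recover(self):
        # Start from the most recent snapshot, if there is one.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.data = json.load(f)
        # Replay every change journalled since that snapshot.
        if os.path.exists(self.journal_path):
            with open(self.journal_path, encoding="utf-8") as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key, value):
        # Record the change durably first, then apply it in memory.
        self._journal.write(json.dumps([key, value]) + "\n")
        self._journal.flush()
        os.fsync(self._journal.fileno())
        self.data[key] = value

    def checkpoint(self):
        # Write the snapshot to a temporary file, then swap it into place.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.snapshot_path)
        # The snapshot now covers everything in the journal, so empty it.
        self._journal.close()
        self._journal = open(self.journal_path, "w", encoding="utf-8")

    def close(self):
        self._journal.close()
```

In this sketch every `put` pays for a disk sync, which is the cost of durability, while `checkpoint` can be called as rarely as the operator likes: fewer checkpoints mean less competition for the data but a longer journal to replay after a restart.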

Database management software designed for in-memory applications (IMDBS) can also enable hybrid systems, where some tables within a database are designated in-memory while the rest lives on disk. This goes beyond caching, yet is possible even where it wouldn’t be feasible to keep the entire database in RAM.
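The hybrid idea can be approximated with SQLite, which is not a dedicated IMDBS but does allow an in-memory database to be attached alongside an on-disk one in a single connection. The sketch below uses that as a stand-in; the table names, the `hot` schema alias, and the `open_hybrid` helper are hypothetical.

```python
import sqlite3


def open_hybrid(disk_path):
    """Open an on-disk database with an in-memory database attached as 'hot'."""
    conn = sqlite3.connect(disk_path)
    # ':memory:' asks SQLite for a database that exists only in RAM.
    conn.execute("ATTACH DATABASE ':memory:' AS hot")
    # Durable table: stored in the file at disk_path.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS main.orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    # Volatile table: recreated empty on every start.
    conn.execute(
        "CREATE TABLE hot.sessions (customer TEXT PRIMARY KEY, last_seen TEXT)"
    )
    return conn


# A single query can join the in-memory table with the on-disk one.
JOIN_QUERY = (
    "SELECT o.customer, o.total, s.last_seen "
    "FROM main.orders AS o JOIN hot.sessions AS s ON s.customer = o.customer"
)
```

Rows in `main.orders` survive a restart, while `hot.sessions` starts empty each time the connection is opened, which mirrors the split between durable and in-memory tables described above.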

Databases can also be compressed to make the most of the available RAM capacity, especially in column-oriented systems, which store tables as collections of columns rather than rows. Most compression techniques work best when adjacent data is of the same type (binary, string, integer, etc.), and table columns are nearly always uniform in type. While compression introduces computational overheads, columnar storage is well suited to complex queries of very large data sets, which is why big data and data science users are interested in it.
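One way to see why columns compress well is run-length encoding, which replaces a run of repeated values with a single value and a count. It is only one of several techniques a columnar system might use, and the tiny table below is made-up sample data.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


# Made-up sample table: (country, status) per row.
rows = [
    ("UK", "active"),
    ("UK", "active"),
    ("UK", "inactive"),
    ("US", "inactive"),
    ("US", "inactive"),
    ("US", "active"),
]

# Row-oriented layout: values of different columns are interleaved.
row_major = [value for row in rows for value in row]

# Column-oriented layout: each column is stored contiguously.
country, status = zip(*rows)

print(len(rle_encode(row_major)), "runs in row order")
print(len(rle_encode(country)), "runs in the country column")
print(len(rle_encode(status)), "runs in the status column")
```

Stored row by row, no two neighbouring values in this sample match, so run-length encoding saves nothing. Stored column by column, the same data collapses into a handful of runs.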

At the largest scales, companies like Google have moved to RAM to enable vast searches to be carried out at acceptable speeds. Challenges remain, though, in making very large amounts of memory available to a single task, because only so much RAM can be connected to one motherboard, and sharing memory between machines brings additional latencies.

Listing image by Blake Patterson