TL;DR: use TokuDB. Always. It massively reduces storage requirements and is massively faster than InnoDB, even on low end hardware. My tables were compressed to just 12% original size and that’s fairly typical.

A very simple structure:



CREATE TABLE test (

date DATE NOT NULL,

site VARCHAR(8) NOT NULL,

order_id VARCHAR(64) NOT NULL,

time TIME NOT NULL,

kind ENUM(...) NOT NULL,

account INTEGER,

user INTEGER,

client_price BIGINT,

supplier_id INTEGER,

supplier_cost BIGINT,

client_ref VARCHAR(32),

data JSON,

PRIMARY KEY (date, site, order_id),

INDEX (account, user, client_ref) ) DEFAULT CHARSET utf8

PARTITION BY RANGE (TO_DAYS(date)) ( ...



This system is using the RDBMS as a document store; the order items are being flattened into a document store (the ‘data’ field).

The ETL is upserting order events as they are processed, using multi-row INSERT INTO … ON DUPLICATE KEY UPDATE …

And I’ve been running performance tests with some real data. 27M rows of real data, all on the same day (so all in the same partition). 37M upsert statements inserting or updating those 27M rows, issued in batches of 1K-2K per multi-row statement. This is enough rows to be able to get a grasp of upsert performance and extrapolate storage requirements.

The test box is an AWS t2.medium instance. This is a very weak VM with 2 (virtual) CPUs and 4GB RAM. Not normal DB fare, so performance beware.

The 27M rows take 10.48GB uncompressed. It takes InnoDB 10 hours to load all the data.

In the old days, MySQL InnoDB offered ‘row_compression’. This compresses individual rows. This reduces disk requirements to 5.73GB (45% space saved). So it halves the disk space! Sadly, the test now took 15 hours instead of 10 hours.

More recently, MySQL introduced ‘table compression’. (MariaDB also added it, calling it ‘page compression’; the CREATE TABLE syntax is infuriatingly slightly different).

Underneath, InnoDB is organized as ‘pages’. This table compression intercepts these page reads and writes and, as pages are written to disk, they are individually compressed and written as a ‘sparse’ file.

As these InnoDB pages are larger than the file system pages, there is the opportunity for a ‘hole’ to be ‘punched’ in each InnoDB page. Imagine an InnoDB page is 12KB, and the file system page size is 4KB. The page may compress down to 1KB, meaning it only needs the first file system page (4KB), and 8KB would be saved. This example illustrates how wasteful this approach can be; there can still be a lot of slack in each page. The total file-size as reported by ls is unchanged, but the actual used disk space (as shown by e.g. ls -s) is reduced. Page compression compresses this table to 4.07GB (61% savings). The bad news is that performance sinks awfully! I aborted the run after 32 hours, as upsert speed had dropped to under 100 orders/sec. I’ve seen this on MariaDB RDS and on my little t2.medium. Its atrocious!

TokuDB is an alternative storage engine to InnoDB. It is nicely packaged by Percona so its very straightforward to install.

TokuDB compresses databases with zlib by default. It compresses pages, but doesn’t store them as sparse files; instead, the total file size you see on disk is the total file size.

TokuDB with zlib compresses to 1.45GB (86% savings)! This is the same compression algorithm as InnoDB page compression and operating at much the same logical page level, but conferring much bigger savings because each ‘page’ is not rounded up to a multiple of the file system page size.

Because InnoDB puts the whole table (or, as in this case, partition) into a single file you cannot see how much of the storage is for the primary rows and how much is for secondary indexes. TokuDB, however, has separate files for each index. So in this case there are two files for the partition, and I can see that the 27M rows takes 1.02GB and the by-client index takes 0.44GB.

TokuDB allows you to specify other compression algorithms. The weakest (thus fastest) is ‘snappy’, which is an LZ77 compressor without entropy coding. It compresses to 2.31GB (78% savings; same ratio rows to index).

There is also QuickLZ (between snappy and zlib in CPU requirements; 1.95GB; 81% savings) and LZMA (stronger than zlib; 1.23GB; 88% savings!).

LZMA is a clear winner here. What’s fascinating about the TokuDB though is not just its space-saving but also its performance: it takes just 2 hours to run my ETL and choice of compression algorithm has no noticeable effect on that!

Now the dire uncompressed InnoDB performance is perhaps the classic “InnoDB is not good with databases that don’t fit into RAM”. I was using Percona packages so the defaults were not as mad as they used to be out-of-the-box with other linux sources, but I did up it to 2GB and a big log. Configuring InnoDB optimally is like reading tea leaves and no amount of doing it makes me confident I’ve hit the sweat spot for any particular workload. This upsert workload seems to be particularly stressful. The t2.medium has two (virtual) CPUs, and as upsert performance dropped I saw ‘top’ reporting load average over 2.0 even though ETL was waiting on the DB. Not good.

However the dire compressed InnoDB performance I saw repeated on a MariaDB RDS m4.2xlarge with the same data-set. There’s something wrong with either InnoDB page compression, or with EBS spare file performance!

On the other hand, TokuDB shows how to do things properly! The out-of-the-box defaults are very sane (it doesn’t use O_DIRECT; instead, it takes 50% to store uncompressed pages, and lets the OS cache the compressed pages in the normal file buffer cache. Inspired!). And the performance is just so stable! On the t2.medium, which is a piddling little box, I saw 5000 orders/sec sustained throughout the entire run.

I think I’ve heard that TokuDB is marginally slower than (uncompressed) InnoDB on small tables. And perhaps most users have small tables. I don’t think there’s actually that measurable a gap when everything is tiny. But the moment you go over a million rows or contemplate partitioning then I think TokuDB is the obvious choice. In fact it would make a very good default.

Oh, and for kicks, try a COUNT(*) on a big InnoDB table. And then go make a coffee! TokuDB … snap! Not based on meta-data like MyIASM did neither; just so much faster.

TokuDB redefines what a ‘small’ database is. It moves the goalposts. It means I can build bigger systems without scaling sideways. I don’t get why everyone still uses InnoDB ;)

"share"