Why do B-trees need “Tricks” to work?

Marko Mäkelä recently posted a couple of “tips and tricks” you can use to improve InnoDB performance. Tips and tricks. A general purpose relational database like MySQL shouldn’t need “tips and tricks” to perform well, and I lay the blame on design choices that were made in the early ’70s: the B-tree data structure underlying all OldSQL databases. B-trees were designed for machines that had very different performance characteristics than the machines of today. Hardware has changed, but B-trees are the same. Tips and Tricks are an attempt to make up the difference.

So B-tree implementers — InnoDB, Oracle, MS SQL Server — are fighting an uphill battle; they’re fighting the future. B-trees just aren’t meant to cope with high-bandwidth, slow-seek-time storage systems, because they perform unnecessary disk seeks. In fact, they aren’t meant to cope with lower-bandwidth, fast-seek-time storage systems (like SSDs), because they waste bandwidth. Hardware trends move computers further and further from B-trees’ sweet spot. In 1972, they were a great idea, but a lot has changed.

Don’t get me wrong. InnoDB is a great piece of software. In my own testing, its B-tree implementation outperforms the big enterprise implementations, though it lacks some features like hot indexing and hot column addition and deletion.

So the problem is OldSQL B-trees, which have two big shortcomings. These are both illustrated very well in the tips and tricks posting. They are slow to index, and they fragment. Let’s look at Marko’s points one by one:

Marko’s first point is that “Data Dictionary Language (DDL) operations have traditionally been slow in MySQL”. The slow part of DDL is mostly ALTER TABLE commands. Hot Column Addition and Deletion and Hot Indexing, recently released in TokuDB 5.0, brings the downtime of these schema changes from hours to seconds.

Marko’s second point is that insertions, deletions and updates “can be slow [with InnoDB] even despite the change buffering in MySQL 5.5.” Fast insertions, deletions and updates have been a hallmark of TokuDB’s Fractal Tree Indexes since they were introduced.

Marko’s next point is that if you populate an InnoDB index out of order, it will fragment. His proposed solution involves dropping and adding indexes, which means table downtime. The best way to deal with fragmentation is to not fragment your tables in the first place. So what’s the solution? You guessed it: TokuDB tables don’t fragment by design, so this particular chunk of maintenance is not needed.

Finally, Marko discusses fill factors in B-trees, and the impact of packed versus unpacked nodes on future modifications. TokuDB is based on cache-oblivious analysis, which means that you don’t need to worry about fill factors, and it means that there are very few parameters to set. In fact, we recommend that you leave the parameters set to the defaults.



Treats, not tricks

In short, TokuDB’s Fractal Tree indexes are fast to update (20x-80x faster than InnoDB) and never fragment. Hot schema changes, fast insertions, no fragmentation, and no tuning. That’s what happens when your solution matches your hardware.

We have sessions coming up at both O’Reilly and Collaborate 11 conferences next week, as well as a booth at O’Reilly. Please come by and tell us your “tips and tricks” (and band-aids and duct tape) for working around old database structures, and we’ll work to get you on a better path with Tokutek’s NewSQL alternative.