The InnoDB storage engine has done wonders for MySQL users that needed higher concurrency than MyISAM could provide for demanding web applications. And the automatic crash recovery is a real bonus too.

But InnoDB's performance (in terms of concurrency, not really raw speed) comes at a cost: disk space. The technique for achieving this, multiversion concurrency control, can chew up a lot of space. In fact, that Wikipedia article says:

The obvious drawback to this system is the cost of storing multiple versions of objects in the database. On the other hand reads are never blocked, which can be important for workloads mostly involving reading values from the database.

Indeed.

Imagine a set of database tables will tens of millions of rows and a non-trivial amount of churn (new records coming in and old ones being expired or removed all the time). You might see this in something like a large classifieds site, for example.

Furthermore imagine that you're using master-slave replication and the majority of reads hit the slaves. And some of those slaves are specifically used for longer running queries. It turns out that the combination of versioning, heavy churn, and long running queries can lead to a substantial difference in the size of a given InnoDB data file (.ibd) on disk.

Just how much of a difference are we talking about? Easily a factor of 4-5x or more. And when you're dealing with hundreds of gigabytes, that starts to add up!

It's no secret that InnoDB isn't the best choice for data warehouse looking applications. But the disk bloat, fragmentation, and ongoing degradation in performance may be an argument for having some slaves that keep the same data in MyISAM tables.

I know, I know. I can do the ALTER TABLE trick to make InnoDB shrink the table by copying all the rows to a new one, but that does take time. Using InnoDB is definitely not a use it and forget about it choice--but what database engine is, really?.

Looking at the documentation for the InnoDB plug-in, I expect to see a real reduction in I/O when using the new indexes and compression on a data set like this. (Others sure have.) But I don't yet have a sense of how stable it is.

Anyone out there in blog-land have much experience with it?

Posted by jzawodn at August 12, 2008 01:05 PM