The new release of Maatkit has a useful feature in mk-index-usage to help you determine how indexes are used in more flexible ways. The default report just prints out ALTER statements for removing unused indexes, which is nice, but it’s often helpful to ask more sophisticated questions about index usage. I’ll use this blog’s queries and indexes as an example.

The new feature lets you store the index usage into tables in a database, so you can query them with SQL. I logged all queries to the blog for a little while, and then secure-copied the query log to my laptop, which is across the continent. I used a variation on MySQL Sandbox to set up a little throw-away MySQL instance — that took ten seconds. And then I set up an SSH tunnel from my laptop to the MySQL Performance Blog server — another 30 seconds.

Now I’m set up: I can crunch the log locally, make it connect over the SSH tunnel to query EXPLAIN and find out the index usage on the remote server, and store the results in my MySQL sandbox instance. The new option is –save-results-database. The full mk-index-usage command looks like this:

<pre>mk-index-usage -h 127.0.0.1 -P 9999 -p XXXX slow_query.log \ --save-results-database h=127.0.0.1,P=12345,u=msandbox,p=msandbox,D=index_usage \ --create-save-results-database 1 2 3 < pre > mk - index - usage - h 127.0.0.1 - P 9999 - p XXXX slow_query .log \ -- save - results - database h = 127.0.0.1 , P = 12345 , u = msandbox , p = msandbox , D = index _ usage \ -- create - save - results - database

After that finishes, the ‘index_usage’ database contains several tables:

<pre> mysql> show tables; +-----------------------+ | Tables_in_index_usage | +-----------------------+ | index_alternatives | | index_usage | | indexes | | queries | | tables | +-----------------------+ 1 2 3 4 5 6 7 8 9 10 11 < pre > mysql > show tables ; + -- -- -- -- -- -- -- -- -- -- -- - + | Tables_in_index_usage | + -- -- -- -- -- -- -- -- -- -- -- - + | index_alternatives | | index_usage | | indexes | | queries | | tables | + -- -- -- -- -- -- -- -- -- -- -- - +

Now let’s run some queries! From the documentation, you can copy-paste the examples. Let’s start with this one: which queries sometimes use different indexes, and what fraction of the time is each index chosen?

<pre>SELECT iu.query_id, CONCAT_WS('.', iu.db, iu.tbl, iu.idx) AS idx, variations, iu.cnt, iu.cnt / total_cnt * 100 AS pct FROM index_usage AS iu INNER JOIN ( SELECT query_id, db, tbl, SUM(cnt) AS total_cnt, COUNT(*) AS variations FROM index_usage GROUP BY query_id, db, tbl HAVING COUNT(*) > 1 ) AS qv USING(query_id, db, tbl); +----------------------+--------------------------------------------------------------+------------+-----+---------+ | query_id | idx | variations | cnt | pct | +----------------------+--------------------------------------------------------------+------------+-----+---------+ | 1262633894049058504 | mpb_wordpress.wp_posts.post_status | 2 | 35 | 70.0000 | | 1262633894049058504 | mpb_wordpress.wp_posts.type_status_date | 2 | 15 | 30.0000 | | 7675136724153707161 | mpb_wordpress.wp_comments.comment_approved | 2 | 18 | 2.4129 | | 7675136724153707161 | mpb_wordpress.wp_comments.comment_post_ID | 2 | 728 | 97.5871 | ... snip ... 14 rows in set (0.00 sec) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 < pre > SELECT iu .query_id , CONCAT_WS ( '.' , iu .db , iu .tbl , iu .idx ) AS idx , variations , iu .cnt , iu .cnt / total_cnt * 100 AS pct FROM index_usage AS iu INNER JOIN ( SELECT query_id , db , tbl , SUM ( cnt ) AS total_cnt , COUNT ( * ) AS variations FROM index_usage GROUP BY query_id , db , tbl HAVING COUNT ( * ) > 1 ) AS qv USING ( query_id , db , tbl ) ; + -- -- -- -- -- -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + -- -- -- -- -- -- + -- -- - + -- -- -- -- - + | query_id | idx | variations | cnt | pct | + -- -- -- -- -- -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + -- -- -- -- -- -- + -- -- - + -- -- -- -- - + | 1262633894049058504 | mpb_wordpress .wp_posts .post_status | 2 | 35 | 70.0000 | | 1262633894049058504 | mpb_wordpress .wp_posts .type_status_date | 2 | 15 | 30.0000 | | 7675136724153707161 | mpb_wordpress .wp_comments .comment_approved | 2 | 18 | 2.4129 | | 7675136724153707161 | mpb_wordpress .wp_comments .comment_post_ID | 2 | 728 | 97.5871 | . . . snip . . . 14 rows in set ( 0.00 sec )

Here’s how to read this — the first row says that query 1262633894049058504 has two variations. One variation uses the mpb_wordpress.wp_posts.post_status index, 70% of the time in fact. The next variation is in row 2, the same query, which uses an index on post_status the other 30% of the time. If you look a few rows down, you can see that query 7675136724153707161 has a much more skewed index usage: 2.4% of the time it uses one index, and the rest of the time it uses another. These kinds of variations in query execution plans are important. You don’t want a small fraction of queries to have very different performance, in general — you want consistent performance. Maybe the performance IS consistent, we don’t know that from looking here, but we know that there is something worth looking into.

What is query 7675136724153707161, anyway?

<pre>mysql> select * from queries where query_id = 7675136724153707161\G *************************** 1. row *************************** query_id: 7675136724153707161 fingerprint: select * from wp_comments where comment_post_id = ? and comment_type not regexp ? and comment_approved = ? sample: SELECT * FROM wp_comments WHERE comment_post_ID = 2257 AND comment_type NOT REGEXP '^(trackback|pingback)$' AND comment_approved = '1' 1 row in set (0.00 sec) 1 2 3 4 5 6 7 < pre > mysql > select * from queries where query_id = 7675136724153707161 \ G * * * * * * * * * * * * * * * * * * * * * * * * * * * 1. row * * * * * * * * * * * * * * * * * * * * * * * * * * * query_id : 7675136724153707161 fingerprint : select * from wp_comments where comment_post_id = ? and comment_type not regexp ? and comment_approved = ? sample : SELECT * FROM wp_comments WHERE comment_post_ID = 2257 AND comment_type NOT REGEXP '^(trackback|pingback)$' AND comment_approved = '1' 1 row in set ( 0.00 sec )

How about another question: which indexes have lots of alternatives, i.e. are chosen instead of other indexes, and for what queries? This time I’ll add LIMIT 2 to the query, because there are lots of them:

<pre>mysql> SELECT CONCAT_WS('.', db, tbl, idx) AS idx, -> GROUP_CONCAT(alt_idx) AS alternatives, -> GROUP_CONCAT(DISTINCT query_id) AS queries, SUM(cnt) AS cnt -> FROM index_alternatives -> GROUP BY db, tbl, idx -> HAVING COUNT(*) > 1 limit 2; +------------------------------------+---------------------------------------------------+------------------------------------------+------+ | idx | alternatives | queries | cnt | +------------------------------------+---------------------------------------------------+------------------------------------------+------+ | mpb_forum.f.PRIMARY | fud26_forum_i_c,fud26_forum_i_c,fud26_forum_i_lpi | 6095451542512376951,11680437198542055892 | 20 | | mpb_forum.fud26_msg.fud26_msg_i_ta | PRIMARY,fud26_msg_i_a | 5971938384822841613 | 2 | +------------------------------------+---------------------------------------------------+------------------------------------------+------+ 1 2 3 4 5 6 7 8 9 10 11 12 < pre > mysql > SELECT CONCAT_WS ( '.' , db , tbl , idx ) AS idx , -> GROUP_CONCAT ( alt_idx ) AS alternatives , -> GROUP_CONCAT ( DISTINCT query_id ) AS queries , SUM ( cnt ) AS cnt -> FROM index_alternatives -> GROUP BY db , tbl , idx -> HAVING COUNT ( * ) > 1 limit 2 ; + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + -- -- -- + | idx | alternatives | queries | cnt | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + -- -- -- + | mpb_forum .f .PRIMARY | fud26_forum_i_c , fud26_forum_i_c , fud26_forum_i_lpi | 6095451542512376951 , 11680437198542055892 | 20 | | mpb_forum .fud26_msg .fud26_msg_i_ta | PRIMARY , fud26_msg_i_a | 5971938384822841613 | 2 | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + -- -- -- +

So the primary key on a table in the forum database is favored over a couple of other indexes, for queries 6095451542512376951 and 11680437198542055892.

I think you can see how this goes — you can query these tables any way you want. The documentation includes a number of other canned queries you can run. I found a few things that I want to improve about the canned queries while writing this blog post. If you have suggestions, please post in the comments or file issues at the Maatkit bug tracker. And enjoy learning how your indexes are used!