What better way to learn the basics of MySQL Performance Troubleshooting, than by observing a MySQL expert troubleshoot a nasty MySQL 5.7 performance bug, that a team of DBAs were chasing for weeks before giving up and calling in an external consultant for help.

The bug was causing roughly 2 minute long lock-ups for all MySQL write queries at seemingly random 20-30 hour intervals and it took me (the external consultant) only 4 hours to uncover the issue (and a couple more to find and showcase a solution).

Once upon a time…

Wait. Before we start with the story, I would like to share a couple of basic principles I obey when troubleshooting. A lot of this applies to troubleshooting other types of systems too, but since you’re here for MySQL, let’s focus on that.

F

irst of all, as you may already know from my earlier posts, I am a big fan of a systematic approach (and you’ll see me use it later as the story unfolds), however, that’s not what I usually start with when troubleshooting a new system. The reason is that over the last 12 years I have developed a pretty good intuition, and more often than not, I can find a root cause quite easily before I get to the digging part. In other words, it saves a lot of time.

An alternative to systematic approach is what’s often called the “trial and error” approach. I DO NOT recommend to use it when fixing problems, but when looking for what could be an obvious root cause, it works really well most of the time. For me anyway.

Besides the speed factor, another reason I like the “trial and error” approach in the discovery phase is that you get to know the environment pretty well while fishing around.

S

econd, regardless of how many other engineers have already looked at the problem, I always start at the beginning. Never assume others have “checked everything”, even though that’s what they will always say (because from their point of view, they DID check everything). In fact, the less you know about what “has already been checked”, the better. So, always start with the basics and double check even the “obvious” things that have been checked already.

I mean, if they had checked everything, they should have found the cause already, right? So if the problem is still there, clearly something hasn’t been checked. And it could be something obvious to you (but not them).

T

hird, if you can’t find the source of the problem within first hour or so, switch to the systematic approach. It is really important you don’t spend too much time fishing around I am guilty myself of sometimes wasting hours before I wake up and realise that if I couldn’t find anything obvious within the first hour, another 3 hours are unlikely to change that.

So now what I often do is set a timer for an hour and then once the alarm goes off, I switch to the systematic approach.

The 3 Principles in a nutshell 1. Start with the basics

2. Avoid assumptions

3. Use systematic approach



Check These Things First

I’m at my client’s office 9am sharp. As usual. It’s the first time I meet Gary (not a real name) – chief of the office. Even though he’s a manager of over 40 people in this IT company for many years now, he’s really tech-savvy, so he introduces me to the problem himself.

According to Gary, the problem is that once in a while, every 20-30 hours one of their MySQL servers gets stuck – virtually all queries stop working – and then either MySQL resumes proper operation itself, or they restart it and after that it continues operating properly. And, as usually, he tells me that his engineers have already checked it all and they couldn’t find anything obvious.

We spend another 10 minutes talking about his business, about Gary’s management principles and other topics I find fascinating. Finally he shows me around the office, lists nearby restaurants (assuming I’ll stick around here for at least few days to come) and then leaves me to work on the problem.

I start with a mini audit by checking the most common faults, as a result of which MySQL could be acting out:

1. Simple configuration issue. This is really common. I would sometimes find things like query cache set to a gigabyte or two, gigantic sort buffer sizes, flush=1 enabled and configured at flush_time=30 interval and what not. You can find a good MySQL configuration primer here. The general rule is – be suspicious about anything you don’t understand or anything that’s not at its default setting.

Nothing was out of scope on this system.

2. Saturated resources. This is kind of obvious, except that if the issue is intermittent, you have to look back at when the issue was occurring to know if the resources were saturated then. It’s not always possible, but in this case customer had VividCortex running, so I could easily go back and drill down a bit. Yet, I found that hardware was way under-utilised, so I had to look further.

3. Query doing collateral damage. Most common in this category, in my experience, is some query that does a huge sort of, say, 10GB of data, which causes extremely intensive write activity and saturates both RAID controller’s write back cache and all of the disks. Often such a query would consume all resources, but collateral damage can be done in other ways too. For example, an UPDATE that locks all the records in a table would be using one CPU core or one disk spindle at any given moment, yet it could prevent hundreds of other queries from running.

That said, a quick query review showed only a handful of queries that were somewhat intrusive, so this was clearly not leading anywhere. On the other hand, I would actually have to look at the queries while things are stuck. And while I had VividCortex to look at it, nothing really stood out.

4. Another process on the same machine doing collateral damage. This is not an easy one to check (unless you are already monitoring the system with one second granularity), because you can’t know how that other process would be called out, but I do check cron to see if the time of the issue occurrence and cron jobs line up to some degree. It isn’t the case, so I ditch the idea at this stage.

5. Malfunctioning hardware. For that, all I do is check dmesg, OS logs and hardware logs (e.g. ESM log if it’s a DELL machine). CPU and memory failures are rarely soft – usually server or processes just start crashing. Disk issues, on the other hand, do often cause various performance issues. One common issue used to be BBU battery learning cycle, but nowadays servers often have either two batteries or Super Capacitor backed non-volatile cache. But I’ve also seen drives with bad sectors causing stalls.

In this case, however, I was on a virtual machine, so I couldn’t dig deep enough. I decided to return to this later if I notice that IO requests are taking longer than they should.

6. Another process sharing same LUN on SAN or other shared resource in a virtual environment doing damage. If possible, it’s best to check this on the Host or SAN itself rather than Guest, otherwise pay close attention to the disk response times and/or CPU %steal time if you suspect resources are being stolen.

7. Locking issues. That’s certainly not least, but it’s not something you can check easily, unless you already have some data collected for it.

I didn’t, so after about one hour I gave up looking for low hanging fruits and decided it’s time to tackle this systematically. Also, I am guessing VividCortex would have shown it if it was something trivial, but it didn’t either. So it was time to move further.

Collect The Right Data, at The Right Time

By now, I know quite a lot about the system. Apparently, it is a FreeBSD 10.1 running on ZFS – quite an unusual setup for MySQL. And even though I know this won’t perform as well as Linux on ext4 or xfs, it still can’t be (rather, it’s unlikely that it’s) the cause of intermittent query pile-ups.

Actually this is where pitfalls of trial and error approach show up very clearly. Based on my experience that tells me MySQL runs significantly worse on ZFS, I could make an assumption that FreeBSD+ZFS is the root cause of all this and tell Gary we should try moving everything to Linux. It would take a lot of time to do that and in the end we would get a slightly better performance, but the real problem would still be there.

Anyway, it was time to set up a script to monitor the server and collect some data next time the issue occurs.

I know exactly what to look for – continuous increase in Threads_running, because I could clearly see a query pile-up when the issue occurred.

And I know exactly the tool for the task – pt-stalk. I use pt-stalk since day one and I love it. (In fact, I used it even when it was an aspersa stalk, and even prior to that, when it was just a 5 line bash script)

Here’s what pt-stalk does in a nutshell: it monitors the server for a certain condition to occur and when that condition occurs, pt-stalk starts collecting some data for you to review later. The default configuration is actually good enough for most cases, so starting pt-stalk can be as simple as:

# wget percona.com/get/pt-stalk # chmod +x pt-stalk # ./pt-stalk --daemonize

By default pt-stalk fires collectors whenever Threads_running reaches 25 and stays at (or beyond) 25 for more than 5 one second iterations. And once collectors start, they will be collecting data for 30 seconds. Afterwards, pt-stalk will sleep for 5 minutes before resuming the checks (and potentially firing another collector if the Threads_running is still above the threshold).

For my needs, that’s good enough.

I test pt-stalk with ./pt-stalk --no-stalk --iterations 1 real quick to make sure it’s collecting the data properly and then I start it as a daemon so I can come back to it later.

BTW, in case you’re wondering how intrusive pt-stalk is, it’s very UNintrusive. By default, it collects only data that can be collected with virtually no performance impact, things like vmstat, diskstats (using cat against /proc/diskstat), innodb status, processlist etc. You can optionally enable additional collectors that will have a greater performance impact, but you can often figure things out without them.

Once I’m done setting up pt-stalk, we have a quick chat with Gary, decide to have lunch together and after a really nice conversation (and a fine Black Angus steak), we split up for the weekend (It’s Friday afterall!).

How to Analyse The Data

Monday 9am I’m back at the office, eagerly waiting to login and check the results. I can’t hide my excitement, so Gary hands me the access and leaves me alone to party.

Do I find anything? Oh boy, find things I do.

pt-stalk caught the issue 3 times during the weekend – Friday 7:34pm, Saturday 2:16pm and Sunday 7:34pm. There were also a few false positives, where MySQL did stall too, but only for 7-8 seconds. To find the samples I’m interested in, I use a simple grep command that I use every time:

# cd /var/lib/pt-stalk # grep Threads_running *-mysqladmin 2015_11_06_15_50_05-mysqladmin:| Threads_running | 38 | 2015_11_06_15_50_05-mysqladmin:| Threads_running | 11 | 2015_11_06_15_50_05-mysqladmin:| Threads_running | 11 | <-- not interested 2015_11_06_15_50_05-mysqladmin:| Threads_running | 9 | 2015_11_06_15_50_05-mysqladmin:| Threads_running | 8 | ... 2015_11_06_19_30_07-mysqladmin:| Threads_running | 41 | 2015_11_06_19_30_07-mysqladmin:| Threads_running | 14 | <-- not interested 2015_11_06_19_30_07-mysqladmin:| Threads_running | 13 | 2015_11_06_19_30_07-mysqladmin:| Threads_running | 13 | 2015_11_06_19_30_07-mysqladmin:| Threads_running | 13 | 2015_11_06_19_30_07-mysqladmin:| Threads_running | 12 | ... 2015_11_06_19_34_14-mysqladmin:| Threads_running | 25 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 25 | <-- interested! 2015_11_06_19_34_14-mysqladmin:| Threads_running | 24 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 25 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 26 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 28 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 30 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 31 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 32 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 34 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 33 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 42 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 43 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 47 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 48 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 49 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 51 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 50 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 54 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 56 | 2015_11_06_19_34_14-mysqladmin:| Threads_running | 60 | ... 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 # cd /var/lib/pt-stalk # grep Threads_running *-mysqladmin 2015_11_06_15_50_05 - mysqladmin : | Threads_running | 38 | 2015_11_06_15_50_05 - mysqladmin : | Threads_running | 11 | 2015_11_06_15_50_05 - mysqladmin : | Threads_running | 11 | < -- not interested 2015_11_06_15_50_05 - mysqladmin : | Threads_running | 9 | 2015_11_06_15_50_05 - mysqladmin : | Threads_running | 8 | . . . 2015_11_06_19_30_07 - mysqladmin : | Threads_running | 41 | 2015_11_06_19_30_07 - mysqladmin : | Threads_running | 14 | < -- not interested 2015_11_06_19_30_07 - mysqladmin : | Threads_running | 13 | 2015_11_06_19_30_07 - mysqladmin : | Threads_running | 13 | 2015_11_06_19_30_07 - mysqladmin : | Threads_running | 13 | 2015_11_06_19_30_07 - mysqladmin : | Threads_running | 12 | . . . 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 25 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 25 | < -- interested ! 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 24 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 25 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 26 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 28 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 30 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 31 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 32 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 34 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 33 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 42 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 43 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 47 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 48 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 49 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 51 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 50 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 54 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 56 | 2015_11_06_19_34_14 - mysqladmin : | Threads_running | 60 | . . .

So now I have all the 2015_11_06_19_34_14-* files that I’m interested in. I take a quick look at vmstat, but that doesn’t look interesting – the workload is really lightweight:

procs memory page disks faults cpu r b w avm fre flt re pi po fr sr mf0 mf1 in sy cs us sy id 11 3 2 79169316 5215336 103 0 0 0 74 40 0 0 105 93 127 3 1 97 2 0 2 79189016 5219056 13268 0 1 0 13221 10131 43 39 729 41015 5858 8 4 88 1 0 2 79189016 5218964 5155 0 1 0 5981 10131 57 66 697 8437 5694 5 1 94 1 0 2 79189016 5218908 5183 0 1 0 6008 10131 53 53 546 8844 5079 5 1 94 1 0 2 79189016 5218876 5216 0 1 0 6011 10131 75 69 1250 12485 9551 5 1 95 1 0 2 79189016 5218804 6042 0 1 0 6875 10131 64 59 989 11066 7645 6 1 93 1 2 3 4 5 6 7 8 procs memory page disks faults cpu r b w avm fre flt re pi po fr sr mf0 mf1 in sy cs us sy id 11 3 2 79169316 5215336 103 0 0 0 74 40 0 0 105 93 127 3 1 97 2 0 2 79189016 5219056 13268 0 1 0 13221 10131 43 39 729 41015 5858 8 4 88 1 0 2 79189016 5218964 5155 0 1 0 5981 10131 57 66 697 8437 5694 5 1 94 1 0 2 79189016 5218908 5183 0 1 0 6008 10131 53 53 546 8844 5079 5 1 94 1 0 2 79189016 5218876 5216 0 1 0 6011 10131 75 69 1250 12485 9551 5 1 95 1 0 2 79189016 5218804 6042 0 1 0 6875 10131 64 59 989 11066 7645 6 1 93

Processlist also doesn’t reveal anything other than the fact that there’s a number of queries waiting for something. But none of the queries seem likely to cause any collateral damage, or damage at all. Most of them are in a pretty ambiguous “query end” state, with few exceptions in a “sending data”, “init” and “FULLTEXT initialization” states.

Then I switch to InnoDB status sample and that’s where things get interesting.

The Root Cause!

Here’s the interesting bits of the 2015_11_06_19_34_14-innodbstatus2 file I was now staring at (this file is captured at the end of the 30s collecting session) :

*************************** 1. row *************************** Type: InnoDB Name: Status: ===================================== 2015-11-06 19:34:45 152bab8400 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 31 seconds ... ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 390394009 --Thread 90488293376 has waited at fts0que.cc line 1287 for 32.00 seconds the semaphore: X-lock on RW-latch at 0x152695fc68 created in file fts0fts.cc line 645 a writer (thread id 90491812864) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file not yet reserved line 0 Last time write locked in file /wrkdirs/usr/ports/databases/mysql56-server/work/mysql-5.6.21/storage/innobase/fts/fts0fts.cc line 4453 --Thread 90488292352 has waited at fts0que.cc line 1287 for 20.00 seconds the semaphore: X-lock on RW-latch at 0x152695fc68 created in file fts0fts.cc line 645 a writer (thread id 90491812864) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file not yet reserved line 0 Last time write locked in file /wrkdirs/usr/ports/databases/mysql56-server/work/mysql-5.6.21/storage/innobase/fts/fts0fts.cc line 4453 --Thread 90491826176 has waited at fts0que.cc line 1287 for 12.00 seconds the semaphore: X-lock on RW-latch at 0x152695fc68 created in file fts0fts.cc line 645 a writer (thread id 90491812864) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file not yet reserved line 0 Last time write locked in file /wrkdirs/usr/ports/databases/mysql56-server/work/mysql-5.6.21/storage/innobase/fts/fts0fts.cc line 4453 ... ------------ TRANSACTIONS ------------ ... ---TRANSACTION 16574016757, not started mysql tables in use 3, locked 0 MySQL thread id 801773106, OS thread handle 0x1511bbb000, query id 12961650003 appX 10.x.x.x user FULLTEXT initialization SELECT `st`.* FROM `search_table` AS `st` LEFT JOIN ... INNER JOIN ... WHERE ... AND (MATCH(st.text_field) AGAINST ('search string*' IN BOOLEAN MODE)) ... ---TRANSACTION 16574017350, ACTIVE (PREPARED) 5 sec mysql tables in use 1, locked 1 1 lock struct(s), heap size 360, 0 row lock(s), undo log entries 1 MySQL thread id 801773333, OS thread handle 0x1511b8c400, query id 12961651211 appX 10.x.x.x user query end INSERT INTO `non_search_table` ... VALUES ... TABLE LOCK table `schema`.`non_search_table` trx id 16574017350 lock mode IX ... ---TRANSACTION 16574014110, ACTIVE 33 sec doing SYNC index 4 lock struct(s), heap size 1184, 0 row lock(s), undo log entries 13813 TABLE LOCK table "schema"."FTS_0000000000001483_000000000000356e_INDEX_1" trx id 16574014110 lock mode IX TABLE LOCK table "schema"."FTS_0000000000001483_000000000000356e_INDEX_2" trx id 16574014110 lock mode IX TABLE LOCK table "schema"."FTS_0000000000001483_000000000000356e_INDEX_3" trx id 16574014110 lock mode IX TABLE LOCK table "schema"."FTS_0000000000001483_000000000000356e_INDEX_4" trx id 16574014110 lock mode IX ---TRANSACTION 16574014109, ACTIVE (PREPARED) 33 sec committing mysql tables in use 1, locked 1 1 lock struct(s), heap size 360, 0 row lock(s), undo log entries 1 MySQL thread id 801772471, OS thread handle 0x1511bb7c00, query id 12961646253 appX 10.x.x.x user query end INSERT INTO search_table SET id = '11734713', text_field = 'new text' ON DUPLICATE KEY UPDATE text_field = 'new text' TABLE LOCK table `schema`.`search_table` trx id 16574014109 lock mode IX 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 * * * * * * * * * * * * * * * * * * * * * * * * * * * 1. row * * * * * * * * * * * * * * * * * * * * * * * * * * * Type : InnoDB Name : Status : === === === === === === === === === === === === = 2015 - 11 - 06 19 : 34 : 45 152bab8400 INNODB MONITOR OUTPUT === === === === === === === === === === === === = Per second averages calculated from the last 31 seconds . . . -- -- -- -- -- SEMAPHORES -- -- -- -- -- OS WAIT ARRAY INFO : reservation count 390394009 -- Thread 90488293376 has waited at fts0que . cc line 1287 for 32.00 seconds the semaphore : X - lock on RW - latch at 0x152695fc68 created in file fts0fts . cc line 645 a writer ( thread id 90491812864 ) has reserved it in mode exclusive number of readers 0 , waiters flag 1 , lock_word : 0 Last time read locked in file not yet reserved line 0 Last time write locked in file / wrkdirs / usr / ports / databases / mysql56 - server / work / mysql - 5.6.21 / storage / innobase / fts / fts0fts . cc line 4453 -- Thread 90488292352 has waited at fts0que . cc line 1287 for 20.00 seconds the semaphore : X - lock on RW - latch at 0x152695fc68 created in file fts0fts . cc line 645 a writer ( thread id 90491812864 ) has reserved it in mode exclusive number of readers 0 , waiters flag 1 , lock_word : 0 Last time read locked in file not yet reserved line 0 Last time write locked in file / wrkdirs / usr / ports / databases / mysql56 - server / work / mysql - 5.6.21 / storage / innobase / fts / fts0fts . cc line 4453 -- Thread 90491826176 has waited at fts0que . cc line 1287 for 12.00 seconds the semaphore : X - lock on RW - latch at 0x152695fc68 created in file fts0fts . cc line 645 a writer ( thread id 90491812864 ) has reserved it in mode exclusive number of readers 0 , waiters flag 1 , lock_word : 0 Last time read locked in file not yet reserved line 0 Last time write locked in file / wrkdirs / usr / ports / databases / mysql56 - server / work / mysql - 5.6.21 / storage / innobase / fts / fts0fts . cc line 4453 . . . -- -- -- -- -- -- TRANSACTIONS -- -- -- -- -- -- . . . -- - TRANSACTION 16574016757 , not started mysql tables in use 3 , locked 0 MySQL thread id 801773106 , OS thread handle 0x1511bbb000 , query id 12961650003 appX 10.x.x.x user FULLTEXT initialization SELECT ` st ` . * FROM ` search_table ` AS ` st ` LEFT JOIN . . . INNER JOIN . . . WHERE . . . AND ( MATCH ( st . text_field ) AGAINST ( 'search string*' IN BOOLEAN MODE ) ) . . . -- - TRANSACTION 16574017350 , ACTIVE ( PREPARED ) 5 sec mysql tables in use 1 , locked 1 1 lock struct ( s ) , heap size 360 , 0 row lock ( s ) , undo log entries 1 MySQL thread id 801773333 , OS thread handle 0x1511b8c400 , query id 12961651211 appX 10.x.x.x user query end INSERT INTO ` non_search_table ` . . . VALUES . . . TABLE LOCK table ` schema ` . ` non_search_table ` trx id 16574017350 lock mode IX . . . -- - TRANSACTION 16574014110 , ACTIVE 33 sec doing SYNC index 4 lock struct ( s ) , heap size 1184 , 0 row lock ( s ) , undo log entries 13813 TABLE LOCK table "schema" . "FTS_0000000000001483_000000000000356e_INDEX_1" trx id 16574014110 lock mode IX TABLE LOCK table "schema" . "FTS_0000000000001483_000000000000356e_INDEX_2" trx id 16574014110 lock mode IX TABLE LOCK table "schema" . "FTS_0000000000001483_000000000000356e_INDEX_3" trx id 16574014110 lock mode IX TABLE LOCK table "schema" . "FTS_0000000000001483_000000000000356e_INDEX_4" trx id 16574014110 lock mode IX -- - TRANSACTION 16574014109 , ACTIVE ( PREPARED ) 33 sec committing mysql tables in use 1 , locked 1 1 lock struct ( s ) , heap size 360 , 0 row lock ( s ) , undo log entries 1 MySQL thread id 801772471 , OS thread handle 0x1511bb7c00 , query id 12961646253 appX 10.x.x.x user query end INSERT INTO search_table SET id = '11734713' , text_field = 'new text' ON DUPLICATE KEY UPDATE text_field = 'new text' TABLE LOCK table ` schema ` . ` search_table ` trx id 16574014109 lock mode IX

Now we have something to work with. Apparently there’s this internal transaction #16574014110 that’s doing a full text search (FTS) index sync (which is already something quite unexpected, given how long it’s taking), but what’s more interesting is that not only reads and writes into search_table table are blocked, but also a number of completely unrelated write queries (which I didn’t list here for brevity).

I download source files of MySQL 5.6.21, so I can poke around the code. I open the storage/innobase/fts/fts0fts.cc file and go to the line 4453 (based on the lock waits in the SEMAPHORES section) and find rw_lock_x_lock(&cache->lock) – this is the mutex that FTS queries (both inserts and selects) are waiting on.

In other words, it appears that InnoDB FTS index has a serialization issue and further code review confirms that – no queries can use the search table while the index is being synced. And index is being synced every time the FTS cache fills up. Bad news if you were considering using FTS on InnoDB.

I find that one user has already filed a similar bug, so I just add a few comments from myself to help MySQL fix it.

But the really interesting question is – why does that impact other queries, if the mutex is on a FTS cache lock? And how can we fix that? I have a quick discussion with the customer at this point and he confirms that indeed his biggest concern is that completely unrelated queries are getting stuck and that’s exactly what he wants to have fixed.

Hence, I need to figure out a way to reproduce this behaviour, so I can take a backtrace of MySQL and find the exact spot where other queries are halting. Luckily, in this particular case this is much easier than I thought.

Reproducing the issue

In order to reproduce the issue, I figure, it should be enough to make a copy of this table (which was just under 1GB in size), then run a script that keeps modifying the data and at the same time create another table and do inserts into that table.

So I ask the customer to spin up a server instance I can use for these tests, I launch MySQL 5.6.28 there, make a copy of the table into a file using mysqldump --single-transaction and start loading it on the new instance. As the data is loading, I am monitoring the innodb status, using watch to see if the SYNC is occurring at this stage as well:

# watch 'mysql -e "show engine innodb status\G" | grep SYNC'

And surely soon enough I notice SYNC occasionally taking 1s, then 2s and shortly – 9-10s.

Interesting.

Maybe I don’t even need to wait for the table to load. So I create a dummy table and start inserting records into it.

Yep. Inserts are taking anywhere from 1ms to 10s, depending on whether the SYNC is happening or not. So I start mysqlslap to get more inserts waiting:

# mysqlslap -a --auto-generate-sql-load-type=write --commit=3 --concurrency=10 --create-schema=mysqlslap --iterations=10

I can easily see these transactions waiting in different stages – doing COMMIT, initialising, doing actual inserts etc. So without further ado, I install gdb and pt-pmp to get a few backtraces:

# apt-get install gdb (yes, my test system is Ubuntu, not FreeBSD) # wget percona.com/get/pt-pmp # chmod +x pt-pmp # ./pt-pmp -k mysql-backtrace.$(date +%s)

And in just a few seconds, there’s my lucky strike:

/* Here's the mysqlslap threads that are stuck: */ 9 pthread_cond_wait,inline_mysql_cond_wait(mysql_thread.h:1162),Stage_manager::enroll_for(mysql_thread.h:1162),MYSQL_BIN_LOG::change_stage(binlog.cc:6783),MYSQL_BIN_LOG::ordered_commit(binlog.cc:7187),MYSQL_BIN_LOG::commit(binlog.cc:6500),ha_commit_trans(handler.cc:1436),trans_commit(transaction.cc:228),mysql_execute_command(sql_parse.cc:4280),mysql_parse(sql_parse.cc:6386),dispatch_command(sql_parse.cc:1340),do_command(sql_parse.cc:1037),do_handle_one_connection(sql_connect.cc:982),handle_one_connection(sql_connect.cc:898),pfs_spawn_thread(pfs.cc:1860),start_thread,clone /* And the internal FTS SYNC thread: */ 1 fsync,os_file_fsync(os0file.cc:2225),os_file_flush_func(os0file.cc:2225),fil_flush(os0file.ic:352),buf_dblwr_write_single_page(buf0dblwr.cc:1164),buf_flush_write_block_low(buf0flu.cc:959),buf_flush_page(buf0flu.cc:959),buf_flush_single_page_from_LRU(buf0flu.cc:1992),buf_LRU_get_free_block(buf0lru.cc:1367),buf_page_create(buf0buf.cc:3866),fsp_page_create(fsp0fsp.cc:1296),fseg_alloc_free_page_low(fsp0fsp.cc:2570),fseg_alloc_free_page_general(fsp0fsp.cc:2639),btr_page_alloc(btr0btr.cc:1141),btr_page_split_and_insert(btr0btr.cc:1141),btr_cur_pessimistic_insert(btr0cur.cc:1604),row_ins_clust_index_entry_low(row0ins.cc:2492),row_ins_clust_index_entry(row0ins.cc:2890),row_ins(row0ins.cc:2967),row_ins_step(row0ins.cc:2967),que_thr_step(que0que.cc:1059),que_run_threads_low(que0que.cc:1059),que_run_threads(que0que.cc:1059),fts_eval_sql(fts0sql.cc:286),fts_write_node(fts0fts.cc:3877),fts_sync_write_words(fts0fts.cc:4014),fts_sync_index(fts0fts.cc:4014),fts_sync(fts0fts.cc:4014),fts_add_doc_by_id(fts0fts.cc:3565),fts_add(fts0fts.cc:3565),fts_modify(fts0fts.cc:3028),fts_commit_table(fts0fts.cc:3028),fts_commit(fts0fts.cc:3028),trx_commit_low(trx0trx.cc:1352),trx_commit(trx0trx.cc:1416),trx_commit_for_mysql(trx0trx.cc:1416),innobase_commit_low(ha_innodb.cc:3400),innobase_commit(ha_innodb.cc:3400),ha_commit_low(handler.cc:1493),MYSQL_BIN_LOG::process_commit_stage_queue(binlog.cc:6678),MYSQL_BIN_LOG::ordered_commit(binlog.cc:7198),MYSQL_BIN_LOG::commit(binlog.cc:6500),ha_commit_trans(handler.cc:1436),trans_commit_stmt(transaction.cc:434),mysql_execute_command(sql_parse.cc:5037),mysql_parse(sql_parse.cc:6386),dispatch_command(sql_parse.cc:1340),do_command(sql_parse.cc:1037),do_handle_one_connection(sql_connect.cc:982),handle_one_connection(sql_connect.cc:898),pfs_spawn_thread(pfs.cc:1860),start_thread,clone 1 2 3 4 5 /* Here's the mysqlslap threads that are stuck: */ 9 pthread_cond_wait , inline_mysql_cond_wait ( mysql_thread . h : 1162 ) , Stage_manager :: enroll_for ( mysql_thread . h : 1162 ) , MYSQL_BIN_LOG :: change_stage ( binlog . cc : 6783 ) , MYSQL_BIN_LOG :: ordered_commit ( binlog . cc : 7187 ) , MYSQL_BIN_LOG :: commit ( binlog . cc : 6500 ) , ha_commit_trans ( handler . cc : 1436 ) , trans_commit ( transaction . cc : 228 ) , mysql_execute_command ( sql_parse . cc : 4280 ) , mysql_parse ( sql_parse . cc : 6386 ) , dispatch_command ( sql_parse . cc : 1340 ) , do_command ( sql_parse . cc : 1037 ) , do_handle_one_connection ( sql_connect . cc : 982 ) , handle_one_connection ( sql_connect . cc : 898 ) , pfs_spawn_thread ( pfs . cc : 1860 ) , start_thread , clone /* And the internal FTS SYNC thread: */ 1 fsync , os_file_fsync ( os0file . cc : 2225 ) , os_file_flush_func ( os0file . cc : 2225 ) , fil_flush ( os0file . ic : 352 ) , buf_dblwr_write_single_page ( buf0dblwr . cc : 1164 ) , buf_flush_write_block_low ( buf0flu . cc : 959 ) , buf_flush_page ( buf0flu . cc : 959 ) , buf_flush_single_page_from_LRU ( buf0flu . cc : 1992 ) , buf_LRU_get_free_block ( buf0lru . cc : 1367 ) , buf_page_create ( buf0buf . cc : 3866 ) , fsp_page_create ( fsp0fsp . cc : 1296 ) , fseg_alloc_free_page_low ( fsp0fsp . cc : 2570 ) , fseg_alloc_free_page_general ( fsp0fsp . cc : 2639 ) , btr_page_alloc ( btr0btr . cc : 1141 ) , btr_page_split_and_insert ( btr0btr . cc : 1141 ) , btr_cur_pessimistic_insert ( btr0cur . cc : 1604 ) , row_ins_clust_index_entry_low ( row0ins . cc : 2492 ) , row_ins_clust_index_entry ( row0ins . cc : 2890 ) , row_ins ( row0ins . cc : 2967 ) , row_ins_step ( row0ins . cc : 2967 ) , que_thr_step ( que0que . cc : 1059 ) , que_run_threads_low ( que0que . cc : 1059 ) , que_run_threads ( que0que . cc : 1059 ) , fts_eval_sql ( fts0sql . cc : 286 ) , fts_write_node ( fts0fts . cc : 3877 ) , fts_sync_write_words ( fts0fts . cc : 4014 ) , fts_sync_index ( fts0fts . cc : 4014 ) , fts_sync ( fts0fts . cc : 4014 ) , fts_add_doc_by_id ( fts0fts . cc : 3565 ) , fts_add ( fts0fts . cc : 3565 ) , fts_modify ( fts0fts . cc : 3028 ) , fts_commit_table ( fts0fts . cc : 3028 ) , fts_commit ( fts0fts . cc : 3028 ) , trx_commit_low ( trx0trx . cc : 1352 ) , trx_commit ( trx0trx . cc : 1416 ) , trx_commit_for_mysql ( trx0trx . cc : 1416 ) , innobase_commit_low ( ha_innodb . cc : 3400 ) , innobase_commit ( ha_innodb . cc : 3400 ) , ha_commit_low ( handler . cc : 1493 ) , MYSQL_BIN_LOG :: process_commit_stage_queue ( binlog . cc : 6678 ) , MYSQL_BIN_LOG :: ordered_commit ( binlog . cc : 7198 ) , MYSQL_BIN_LOG :: commit ( binlog . cc : 6500 ) , ha_commit_trans ( handler . cc : 1436 ) , trans_commit_stmt ( transaction . cc : 434 ) , mysql_execute_command ( sql_parse . cc : 5037 ) , mysql_parse ( sql_parse . cc : 6386 ) , dispatch_command ( sql_parse . cc : 1340 ) , do_command ( sql_parse . cc : 1037 ) , do_handle_one_connection ( sql_connect . cc : 982 ) , handle_one_connection ( sql_connect . cc : 898 ) , pfs_spawn_thread ( pfs . cc : 1860 ) , start_thread , clone

See full pmp output, full gdb backtrace.

So there in the sql/binlog.cc:6783 lies the answer, why these threads are waiting:

int MYSQL_BIN_LOG::ordered_commit(THD *thd, bool all, bool skip_commit) { ... if (opt_binlog_order_commits && (sync_error == 0 || binlog_error_action != ABORT_SERVER)) { if (change_stage(thd, Stage_manager::COMMIT_STAGE, final_queue, leave_mutex_before_commit_stage, &LOCK_commit)) ... 1 2 3 4 5 6 7 8 9 10 int MYSQL_BIN_LOG :: ordered_commit ( THD * thd , bool all , bool skip_commit ) { . . . if ( opt_binlog_order_commits && ( sync_error == 0 || binlog_error_action != ABORT_SERVER ) ) { if ( change_stage ( thd , Stage_manager :: COMMIT_STAGE , final_queue , leave_mutex_before_commit_stage , & LOCK_commit ) ) . . .

In reality these queries are waiting in the MYSQL_BIN_LOG::change_stage function, but one interesting observation here in the ordered_commit function is that you can control whether you want commits to be ordered or not. Here’s what MySQL manual says about it:

If this variable is enabled (the default), transactions are committed in the same order they are written to the binary log. If disabled, transactions may be committed in parallel. In some cases, disabling this variable might produce a performance increment.

I decide then and there to test this out. Variable is dynamic, so I SET global binlog_order_commits=0; while the table is still being loaded and find that my inserts are no longer stalling!

Then I spend some time trying to understand the pitfalls of disabling binlog_order_commits and find none, if you’re not interested in parallel replication replicating commits in the exact same order.

Finally, even though I have already compared FTS code between MySQL 5.6 and 5.7 (and found it hasn’t changed much when it comes to index sync), I try to reproduce the same behaviour on MySQL 5.7.10 and unfortunately, behaviour is exactly the same, so I conclude that there’s no reason to upgrade and disabling the binlog_order_commits is the best customer can do to at this point.

With Gary’s approval, I implement the change in production and after a few days I came in for a final check. Issue is gone. The job is done.

Final thoughts

If you found this crash course useful, I’m glad. If not, leave a comment and I’ll do my best to help you solve your problem. Actually, do leave a comment even if you did find it useful :). I have more of these practical guides coming soon, so do subscribe below as an encouragement.

Chances are you found this blog post because you are also suffering from the same issue. If that’s the case, then besides disabling the binlog_order_commits, the next best thing you can do is press “Affects me” button here – that increases chances of having this fixed sooner rather than later.

Finally, if you’re considering using InnoDB Full Text Search for some serious project, don’t. Apparently it’s too early for that – use Sphinx search, Solr, Elasticsearch or whatever works well for you, but not InnoDB. Not yet at least.

Share this Post