I’ve been running a benchmark today on my old test box with conventional hard drives (no raid with BBU) and noticed something unusual in the CPU utilization statistics being reported.

The benchmark was run like this:

sysbench --num-threads=64 --max-requests=0 --max-time=600000 --report-interval=10 --test=oltp --db-driver=mysql --oltp-dist-type=special --oltp-table-size=1000000 --mysql-user=root --mysql-password=password run 1 sysbench -- num - threads = 64 -- max - requests = 0 -- max - time = 600000 -- report - interval = 10 -- test = oltp -- db - driver = mysql -- oltp - dist - type = special -- oltp - table - size = 1000000 -- mysql - user = root -- mysql - password = password run

Which means: create 64 threads and hammer the database with queries as quickly as possible. As the test was run on the localhost I would expect the benchmark to completely saturate the system – being either using CPU or being blocked on IO nature of this benchmark so it does not spend a lot on database locks, especially as this system has just 2 cores.

Looking at VMSTAT however I noticed this:

[root@smt1 mysql]# vmstat 10 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 13 0 0 880156 40060 2140824 0 0 12 750 895 1045 32 8 54 6 0 0 0 0 877116 40060 2141312 0 0 0 1783 2185 23112 44 10 41 5 0 15 0 0 872648 40068 2141960 0 0 0 1747 2204 25743 41 11 46 2 0 0 0 0 868056 40068 2142604 0 0 0 1803 2164 26224 40 11 44 5 0 17 1 0 863216 40068 2143160 0 0 0 1875 1948 23020 36 9 50 5 0 0 0 0 858384 40168 2143656 0 0 0 1063 1855 21116 32 9 45 14 0 23 0 0 855848 40176 2144232 0 0 0 1755 2036 23181 36 10 48 6 0 49 0 0 851248 40184 2144648 0 0 0 1679 2313 22832 45 10 40 5 0 10 0 0 846292 40192 2145248 0 0 0 1911 1980 23185 36 9 50 4 0 0 0 0 844260 40196 2145868 0 0 0 1757 2152 26387 39 11 45 5 0 0 3 0 839296 40196 2146520 0 0 0 1439 2104 25096 38 10 50 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 [ root @ smt1 mysql ] # vmstat 10 procs -- -- -- -- -- - memory -- -- -- -- -- -- - swap -- -- -- - io -- -- -- system -- -- -- - cpu -- -- - r b swpd free buff cache si so bi bo in cs us sy id wa st 13 0 0 880156 40060 2140824 0 0 12 750 895 1045 32 8 54 6 0 0 0 0 877116 40060 2141312 0 0 0 1783 2185 23112 44 10 41 5 0 15 0 0 872648 40068 2141960 0 0 0 1747 2204 25743 41 11 46 2 0 0 0 0 868056 40068 2142604 0 0 0 1803 2164 26224 40 11 44 5 0 17 1 0 863216 40068 2143160 0 0 0 1875 1948 23020 36 9 50 5 0 0 0 0 858384 40168 2143656 0 0 0 1063 1855 21116 32 9 45 14 0 23 0 0 855848 40176 2144232 0 0 0 1755 2036 23181 36 10 48 6 0 49 0 0 851248 40184 2144648 0 0 0 1679 2313 22832 45 10 40 5 0 10 0 0 846292 40192 2145248 0 0 0 1911 1980 23185 36 9 50 4 0 0 0 0 844260 40196 2145868 0 0 0 1757 2152 26387 39 11 45 5 0 0 3 0 839296 40196 2146520 0 0 0 1439 2104 25096 38 10 50 1 0

Looking at last few columns we have IO wait typically being in single digit percents where the “idle” CPU floats in 40-50% range. You can also see it visually in the graph from Percona Cloud Tools



So the system should be completely saturated with the load but instead we have 50% of CPU idle and iowait is typically in single digits?

Running SHOW PROCESSLIST can give us some good idea in this case:

mysql> show processlist; +-----+---------------+-----------------+--------+---------+------+----------------+--------------------------------------------------------------------+-----------+---------------+-----------+ | Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read | +-----+---------------+-----------------+--------+---------+------+----------------+--------------------------------------------------------------------+-----------+---------------+-----------+ | 1 | root | localhost:39624 | NULL | Sleep | 0 | | NULL | 0 | 0 | 0 | | ... | 32 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | | 33 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | | 34 | root | localhost | sbtest | Execute | 0 | Sorting result | SELECT c from sbtest where id between 365260 and 365359 order by c | 0 | 0 | 0 | | 35 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | | 36 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | | 37 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | | 38 | root | localhost | sbtest | Execute | 0 | Writing to net | DELETE from sbtest where id=496460 | 0 | 1 | 1 | | 39 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | ... | 89 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | | 90 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | | 91 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | | 268 | root | localhost | NULL | Query | 0 | NULL | show processlist | 0 | 0 | 0 | +-----+---------------+-----------------+--------+---------+------+----------------+--------------------------------------------------------------------+-----------+---------------+-----------+ 68 rows in set (0.00 sec) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 mysql > show processlist ; + -- -- - + -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- - + -- -- -- -- + -- -- -- -- - + -- -- -- + -- -- -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + -- -- -- -- -- - + -- -- -- -- -- -- -- - + -- -- -- -- -- - + | Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read | + -- -- - + -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- - + -- -- -- -- + -- -- -- -- - + -- -- -- + -- -- -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + -- -- -- -- -- - + -- -- -- -- -- -- -- - + -- -- -- -- -- - + | 1 | root | localhost : 39624 | NULL | Sleep | 0 | | NULL | 0 | 0 | 0 | | . . . | 32 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | | 33 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | | 34 | root | localhost | sbtest | Execute | 0 | Sorting result | SELECT c from sbtest where id between 365260 and 365359 order by c | 0 | 0 | 0 | | 35 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | | 36 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | | 37 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | | 38 | root | localhost | sbtest | Execute | 0 | Writing to net | DELETE from sbtest where id = 496460 | 0 | 1 | 1 | | 39 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | . . . | 89 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | | 90 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | | 91 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 | | 268 | root | localhost | NULL | Query | 0 | NULL | show processlist | 0 | 0 | 0 | + -- -- - + -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- - + -- -- -- -- + -- -- -- -- - + -- -- -- + -- -- -- -- -- -- -- -- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + -- -- -- -- -- - + -- -- -- -- -- -- -- - + -- -- -- -- -- - + 68 rows in set ( 0.00 sec )

So most of the queries are stalling in COMMIT stage. Checking setting we can see:

mysql> select @@innodb_flush_log_at_trx_commit; +----------------------------------+ | @@innodb_flush_log_at_trx_commit | +----------------------------------+ | 1 | +----------------------------------+ 1 row in set (0.00 sec) 1 2 3 4 5 6 7 mysql > select @ @ innodb_flush_log_at_trx_commit ; + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | @ @ innodb_flush_log_at_trx_commit | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | 1 | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 1 row in set ( 0.00 sec )

Which means Percona Server is running in the fully ACID mode, which with slow hard drives without BBU means commit is causing syncing data to the disk which is very expensive.

So we figured out what is really happening… yet if we’re really waiting on disk a lot why iowait does not show that?

To answer that lets see how IOwait is really computed.

IOwait is type of Idle time (and in earlier Linux versions it wasincluded in Idle) – when we have the process/thread which would be runable if it would not be waiting on IO we consider this iowait, if it is not runable because of something else (waiting on mutex, network etc) it is considered idle.

Writes to Innodb log file are serialized – so only one process will be doing log write (and sync) while all others waiting to be able to commit will be waiting on some form of synchronization mechanism to be notified when the commit is complete as part of group commit.

What is the take away? Take iowait% numbers from vmstat/iostat with grain of salt. It is possible there is much more to gain than faster IO subsystem (or proper configuration) than iowait numbers show. Consider the iowait%+idle% of the CPU as “cpu available” which could be potentially put to the good use with improving IO subsystem and minimizing contention

To illustrate that lets take a look at the CPU graph again – now as I’ve set innodb_flush_log_at_trx_commit=2

As we can see from the graph we have reclaimed much more than 6-7% of idle time we had – the user CPU usage increased to 70% and idle+iowait is now 10-15%; the throughput of the system has increased about 60% – much more than “eliminating iowait” would allow.