The other day I was working on an issue where one of the slaves was showing unexpected lag. Interestingly, with only the IO thread running, the slave was doing significantly more disk IO than could be accounted for by the rate at which the IO thread was fetching binary log events from the master.

I found this out by polling SHOW SLAVE STATUS and monitoring how the value of Read_Master_Log_Pos changed over time, then comparing that rate to the actual IO being done by the server, as measured with the pt-diskstats tool from the excellent Percona Toolkit. Note that before doing this analysis I had stopped the slave SQL thread and made sure there were no dirty InnoDB pages; otherwise the results would have been skewed.
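The arithmetic behind the comparison is simple. Below is a minimal sketch; the function name and the sample values are hypothetical, and in practice the position samples would come from repeatedly polling SHOW SLAVE STATUS:

```python
def binlog_fetch_rate(samples):
    """Estimate the replication IO thread's fetch rate in bytes/sec from
    (timestamp, Read_Master_Log_Pos) pairs polled via SHOW SLAVE STATUS.

    Assumes the master's binary log did not rotate between the samples,
    since a rotation resets the position and would make the delta
    meaningless."""
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    return (p1 - p0) / (t1 - t0)

# Two hypothetical samples taken 10 seconds apart: the position advanced
# by 1 MiB, so the IO thread fetched roughly 105 KB/s from the master.
rate = binlog_fetch_rate([(0.0, 4239), (10.0, 4239 + 1048576)])
print(round(rate))
```

If the disk write rate reported by pt-diskstats is far above this figure while the SQL thread is stopped, the extra IO is coming from somewhere other than relaying binlog events, which is exactly what pointed me at the error log.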

A closer inspection showed that this had to do with the error log filling up with messages suggesting that the slave IO thread was disconnecting and reconnecting repeatedly.

Below are the kinds of error messages that were being generated and were filling up the error log:

2014-01-18 03:33:46 22921 [Note] Slave: received end packet from server, apparent master shutdown:
2014-01-18 03:33:46 22921 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000001' at position 4239
2014-01-18 03:33:46 22921 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2014-01-18 03:33:46 22921 [Note] Slave: received end packet from server, apparent master shutdown:
2014-01-18 03:33:46 22921 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000001' at position 4239
2014-01-18 03:33:46 22921 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.

These error messages are quite vague and do not hint at the actual problem at all; they merely point to the disconnection and reconnection of the IO thread. They were being written at a rate close to 20MB/s, which was producing unnecessary IO load.

Importance of a Unique MySQL 5.6 Server UUID

But anyway, coming back to the problem: the output of SHOW SLAVE HOSTS on the master revealed what was going on:

master [localhost] {msandbox} (test) > show slave hosts;
+-----------+----------+-------+-----------+--------------------------------------+
| Server_id | Host     | Port  | Master_id | Slave_UUID                           |
+-----------+----------+-------+-----------+--------------------------------------+
|       101 | SBslave1 | 19681 |         1 | 6c27ed6d-7ee1-11e3-be39-6c626d957cff |
+-----------+----------+-------+-----------+--------------------------------------+
1 row in set (0.00 sec)

master [localhost] {msandbox} (test) > show slave hosts;
+-----------+----------+-------+-----------+--------------------------------------+
| Server_id | Host     | Port  | Master_id | Slave_UUID                           |
+-----------+----------+-------+-----------+--------------------------------------+
|       102 | SBslave2 | 19682 |         1 | 6c27ed6d-7ee1-11e3-be39-6c626d957cff |
+-----------+----------+-------+-----------+--------------------------------------+
1 row in set (0.00 sec)

The slave host reported for the UUID “6c27ed6d-7ee1-11e3-be39-6c626d957cff” was fluctuating between “SBslave1” and “SBslave2”. So the actual problem was that two slaves were running with the same server UUID, and that was confusing the master. This was caused by the new slave “SBslave2” being cloned from the slave “SBslave1”. The cloning process ended up copying the file “auto.cnf”, which lives in the MySQL data directory and stores the server’s UUID, used to uniquely identify a server. This is a new feature in MySQL 5.6 and is explained very well in the MySQL manual.

So essentially, when a server starts up, it reads its UUID value from the auto.cnf file. If the auto.cnf file is not present or the UUID value cannot be read, the server generates a new UUID. When a new UUID is generated, you should see a message similar to the following in the MySQL error log:
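For reference, auto.cnf is a tiny option file with a single [auto] section; the sketch below uses the UUID from this example, but each server generates its own:

```ini
# $datadir/auto.cnf -- written by mysqld on first startup (MySQL 5.6+)
[auto]
server-uuid=6c27ed6d-7ee1-11e3-be39-6c626d957cff
```

This also suggests the fix for a cloned slave: with the cloned server stopped, delete its auto.cnf so that a fresh UUID is generated on the next startup.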

2014-01-18 03:44:43 27365 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 7c6858fa-7fea-11e3-84fa-6c626d957cff.

Conclusion

So the important lesson is to remember not to reuse the auto.cnf file when cloning a server: the MySQL 5.6 server UUID is an important identifier of a particular server, and the master relies on it to tell slaves apart. I also believe that MySQL should produce a different error message, one more pertinent to the actual problem; the master could, in theory, send a different type of event to a slave when it notices a duplicate UUID.