Fixing The Problem

When following the above-mentioned steps, the result turned out to be the one mentioned above, then you can try out the following steps, to fix your character-set encoding problem.

Note: Before trying out the steps in your production database, run the mentioned steps on your Local database, then on a replica/backup of the production database and then finally your production database but after taking a backup of the production database first.

At the end of the article, I would cite the sources that I went through to fix the database character-set issue at Gridle.

Disconnect all the applications that connect to your database, while you’re performing a fix for you surely don’t want any new inserts or updates in the database during that phase since it would simply result in wrong data being inserted. Also, take a backup of the database, in case if things go wrong. Make everything UTF-8 everywhere. Update mysqld , mysql and client settings in the /etc/mysql/*.cnf file as follows:[1]

[client]

default-character-set = utf8mb4



[mysql]

default-character-set = utf8mb4



[mysqld]

character_set_server=utf8mb4

collation_server=utf8mb4_unicode_ci

Note: If you are using mysql version 5.6 or lower, also add the following settings in the mysqld section [2]

#The following should be set if you are using mysql version 5.6 or lower

innodb_file_format=barracuda

innodb_file_per_table=1

innodb_large_prefix=1

Note: If on an AWS RDS, the following settings have to be made in a parameter group, which would then be applied on an RDS instance [3]

[mysqld]

character_set_client: utf8mb4

character_set_database: utf8mb4

character_set_results: utf8mb4

character_set_connection: utf8mb4

character_set_server: utf8mb4 collation_connection: utf8mb4_unicode_ci

collation_server: utf8mb4_unicode_ci innodb_file_format=barracuda

innodb_file_per_table=1

innodb_large_prefix=1

3. Restart mysql service.

4. Connect to mysql using the mysql-client CLI:

mysql -h host -u username -p

In the mysql command:

The -h option is used to specify the host where the database resides. It defaults to localhost ( 127.0.0.1 ) if it is not specified.

option is used to specify the host where the database resides. It defaults to ( ) if it is not specified. The -u and -p options specify the username and password of the database user respectively.

and options specify the username and password of the database user respectively. For more information: Checkout the Mysql official doc for mysql.

5. Execute the following query: [1]

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';

The output should be as follows:

+--------------------------+--------------------+

| Variable_name | Value |

+--------------------------+--------------------+

| character_set_client | utf8mb4 |

| character_set_connection | utf8mb4 |

| character_set_database | utf8mb4 |

| character_set_filesystem | binary |

| character_set_results | utf8mb4 |

| character_set_server | utf8mb4 |

| character_set_system | utf8 |

| collation_connection | utf8mb4_general_ci |

| collation_database | utf8mb4_unicode_ci |

| collation_server | utf8mb4_unicode_ci |

+--------------------------+--------------------+

10 rows in set (0.05 sec)

6. Convert your Latin1 collated Database and associated tables to UTF-8.

For this part, actually there are two possible ways:

Method 1 : Run individual queries on each table you have, altering the collation and character-set of the database, followed by the table, followed by each column in the table, where collation is latin1_swedish_ci and/or character-set is latin1 . This method is useful, if you are sure, that your database table fields with type varchar, char, text etc. do not contain many non-english characters. This method is fast, but on a personal note, I am not sure how intact the data would turn out to be after completing the steps when performed on a large database containing millions of entries as Gridle does. Go for this method, if your database is small and any abnormalities could be easily detected. The detailed explanation of this method is out of the scope of this article. To know more about it, read what Wordpress Codex has to say. [4]

: Run individual queries on each table you have, altering the collation and character-set of the database, followed by the table, followed by each column in the table, where collation is and/or character-set is . This method is useful, if you are sure, that your database table fields with type varchar, char, text etc. do not contain many non-english characters. This method is fast, but on a personal note, I am not sure how intact the data would turn out to be after completing the steps when performed on a large database containing millions of entries as Gridle does. Go for this method, if your database is small and any abnormalities could be easily detected. The detailed explanation of this method is out of the scope of this article. To know more about it, read what Wordpress Codex has to say. [4] Method 2: I find this method more reliable, although it is 100% more time consuming than Method 1. This method involves taking a dump of the current database (database, character-set, table structure and table data) using the mysqldump CLI tool, updating the character-set and the collation in the dump file and finally re-inserting the fixed SQL dump back, replacing the original database and its content(associated tables and data). When following Method 2, make sure that any active applications connecting to the applications are DISCONNECTED.[5]

7. Create a database dump

mysqldump --verbose -h host -u username -p --add-drop-database --opt --skip-set-charset --default-character-set=latin1 --skip-extended-insert --databases databasename > dump_file.sql

In the mysqldump command:

The --verbose option is used to enable the mysqldump command to print out every action that it performs.

option is used to enable the command to print out every action that it performs. The --opt command specifies a list of other default arguments for mysqldump. Read here for more info.

command specifies a list of other default arguments for mysqldump. Read here for more info. The --skip-set-charset and -- default-char-set=latin1 options prevent MySQL from taking the already Latin1 collated table and “helpfully” converting it to any other character set for you. This should ensure that your mysqldump is really in the Latin1 character encoding scheme.

and options prevent MySQL from taking the already Latin1 collated table and “helpfully” converting it to any other character set for you. This should ensure that your is really in the Latin1 character encoding scheme. The --skip-extended-insert option forces mysqldump to put each INSERT command in the dump on its own line. This will make the dump take much longer to re-import, however, in my experimentation, adding this option was enough to prevent the dump from having syntax errors in anywhere. And in any case, should the re-import fail for any reason, having each row’s data on its own line really helps to be able to zero-in on which rows are causing you problems (and gives you easier options to workaround the problem rows).

option forces mysqldump to put each INSERT command in the dump on its own line. This will make the dump take much longer to re-import, however, in my experimentation, adding this option was enough to prevent the dump from having syntax errors in anywhere. And in any case, should the re-import fail for any reason, having each row’s data on its own line really helps to be able to zero-in on which rows are causing you problems (and gives you easier options to workaround the problem rows). The --databases allow dumping multiple databases. Furthermore, it also adds a CREATE and USE database command in the dump, which would allow us to change the collation and character-set of the Database as well.

allow dumping multiple databases. Furthermore, it also adds a and database command in the dump, which would allow us to change the collation and character-set of the Database as well. The dump_file.sql is the file to which the mysqldump command would write the contents of the database dump. Not specifying an output file would simply lead the database dump to be printed in the terminal.

is the file to which the command would write the contents of the database dump. For more information: Checkout the Mysql official doc for mysqldump.

8. Replace all latin1 instances with utf8mb4

On the dump file, execute the following command

perl -i -pe 's/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci/' dump_file.sql

This command replaces all instances of DEFAULT CHARACTER SET latin1 with DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci. This is used to fix up the database’s default charset and collation.[6]

perl -i -pe 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8mb4 COLLATE utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC/' dump_file.sql

This command replaces all instances of DEFAULT CHARSET=latin1 with DEFAULT CHARSET=utf8mb4. This converts all tables from latin1 to utf8mb4. [6]

9. Restore the database dump

mysql --verbose -h host -u username -p < dump_file.sql

10. Repair the tables for any problems that you have faced or would face.[7]

mysqlcheck --verbose -h host -u root -p --auto-repair --optimize --all-databases

In the mysqlcheck command:

The --all-databases option informs the mysqlcheck command to check all the databases stored in MySQL

option informs the command to check all the databases stored in MySQL The --auto-repair option informs mysqlcheck to automatically fix a table, if the table, when checked is found to be corrupted or has errors.

option informs to automatically fix a table, if the table, when checked is found to be corrupted or has errors. The --optimize option informs mysqlcheck to optimize the checked tables

option informs to optimize the checked tables The above options enable mysqlcheck to run the CHECK TABLE , REPAIR TABLE , ANALYZE TABLE and OPTIMIZE TABLE queries in a convenient way for the user.

to run the , , and queries in a convenient way for the user. For more information: Checkout the Mysql official doc for mysqlcheck.

11. From within your running application, Execute the following query: [1]

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';

The output should be as follows:

+--------------------------+--------------------+

| Variable_name | Value |

+--------------------------+--------------------+

| character_set_client | utf8mb4 |

| character_set_connection | utf8mb4 |

| character_set_database | utf8mb4 |

| character_set_filesystem | binary |

| character_set_results | utf8mb4 |

| character_set_server | utf8mb4 |

| character_set_system | utf8 |

| collation_connection | utf8mb4_general_ci |

| collation_database | utf8mb4_unicode_ci |

| collation_server | utf8mb4_unicode_ci |

+--------------------------+--------------------+

10 rows in set (0.05 sec)

If the results are different, i.e showing latin1 any VARIABLE_NAME, then in the database configuration file of your application, set the encoding type to utf8mb4 or during application boot, execute the following query: