This document describes how to convert your MySQL database (DB) from the Latin-1 character set to the Unicode format (UTF-8). This document is adapted from https://docs.moodle.org/32/en/Converting_your_MySQL_database_to_UTF8#Converting_an_empty_database .
The CloudBees CD (CloudBees Flow) server requires UTF-8 for better multilingual support. It requires that you store all your data in UTF-8 and case-insensitive collation (utf8_general_ci).
A test case to convert a CHARACTER SET latin1 COLLATE latin1_swedish_ci
database to CHARACTER SET utf8 COLLATE utf8_general_ci
is at the end of this document. The test case files are in the testcase_convert_latin1_charset_to_utf8_and_ci_collation.zip
attachment.
Changing the Default MySQL Character Set to UTF-8
You must change MySQL to use UTF-8 as its character set and change your DB to UTF-8. The procedures below also cover making the UTF-8 DB versions using mysqldump
.
-
Edit the
client
section of/etc/my.cnf
(the MySQL configuration file) to read:
[client] ... .... ... character-set-server=utf8 ....
-
Edit the
mysqld
section in/etc/my.cnf
to read:
[mysqld] ... ... character-set-server=utf8 collation-server=utf8_general_ci ... ...
After changing your default character set to UTF-8, you will use mysqldump
to restore your DB with the --skip-character-set
parameter to restore it with your new default UTF-8 character set.
The default-character-set and default-collation options were deprecated in MySQL 5.5.43. If you’re using MySQL version 5.5.43 or newer, you must remove these options from your my.cnf or my.ini configuration files. Otherwise, you may see the following error message:
|
Operation exception: org.hibernate.exception.GenericJDBCException: could not execute batch org.hibernate.exception.GenericJDBCException: could not execute batch at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:54) ...... Caused by: java.sql.BatchUpdateException: Incorrect string value: '\x96d "|"...' for column 'command_clob' at row 1
Converting an Empty Database
If you created your DB schema and your DB is still empty, complete the following steps:
-
Log in to your MySQL instance.
-
Run the following query in your DB to convert it to UTF-8:
ALTER DATABASE mydatabasename charset=utf8;
Converting a Database Containing Tables
If your CloudBees CD (CloudBees Flow) server already has a populated MySQL DB and you are getting an error, use the following procedures to convert your database.
Linux and MacOS
The following steps show how to create a DB dump, edit the DB dump so that the correct charset and collation are used, and then restore the new DB.
-
Open a terminal window and move to a temporary directory.
-
Dump out the DB:
mysqldump -u username -p password -c -e --default-character-set=utf8 --single-transaction --skip-set-charset --add-drop-database -B dbname > dump.sql
The arguments in this command clean up the character sets and provide a dump that will not cause problems if you are moving the DB to a different DB server or need to restore the DB on a reverted system. For complete descriptions of these arguments, see https://dev.mysql.com/doc/refman/5.7/en/mysqldump.html.
When you run this command, a DB dump is generated into dump.sql
.
-
Next, create a backup of
dump.sql
by entering:
cp dump.sql dump-fixed.sql
You will modify dump-fixed.sql
and will keep dump.sql
as a backup.
-
Edit the dump file and correct the incorrect character that have been used. You can use any search-and-replace editor or program such as VIM:
vim dump-fixed.sql :%s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci/ :%s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/ :wq
The first command modifies the DB’s default character set and collation by replacing all instances of DEFAULT CHARACTER SET latin1
with DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci
. The second command converts all tables from Latin-1 to UTF-8 by replacing all instances of DEFAULT CHARSET=latin1
with DEFAULT CHARSET=utf8
. The third command saves the file and exits the editor.
-
Restore the DB over the top of the existing DB:
mysql -u username -p password < dump-fixed.sql
Examples
mysqldump -u username -p password -c -e --default-character-set=utf8 --single-transaction --skip-set-charset --add-drop-database -B dbname > dump.sql cp dump.sql dump-fixed.sql vim dump-fixed.sql :%s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci/ :%s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/ :wq mysql -u username -p password < dump-fixed.sql
or alternatively using sed
:
# $1-dbusername $2-password $3-dbname mysqldump -u$1 -p$2 -c -e --default-character-set=utf8 --single-transaction --skip-set-charset --add-drop-database -B $3 > dump.sql sed -i.bak -e 's/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci/' -e 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/' dump.sql mysql -u$1 -p$2 < dump.sql
Windows
The following steps show how to create a DB dump, edit it so that the correct character set (utf8
) and collation (utf8_general_ci
) are used, and then restore the new DB . For this, first download Super Sed (Win32 executable, zipped).
-
Open a command window and move to a temporary folder.
-
Enter:
\bin\mysqldump -u [username] -p [password] -c -e --default-character-set=utf8 --single-transaction --skip-set-charset --add-drop-database -B [dbname] -r dbdump.sql ssed -e "s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci/" dbdump.sql | ssed -e "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/" >dbdump_w.sql [your mysql dir]\bin\mysql -u [username] -p[password] [dbname] < dbdump_w.sql
For complete descriptions of these arguments, see https://dev.mysql.com/doc/refman/5.7/en/mysqldump.html.
If Special Characters Do Not Import Correctly
Under certain circumstances, when you restore a UTF-8-encoded MySQL dump, international special characters (which you can see correctly using, for example, the VIM editor), might not import correctly. In such cases, you may want to try the following under a Linux or UNIX system:
-
Log in to MySQL.
-
Create a DB with UTF-8 encoding.
-
Import your dump by using the MySQL
source
command:
# cd /folder_where_your_dump_is/ # mysql -u your_user -p > create database yourdb charset=utf8; > use yourdb; > SET NAMES 'utf8'; > source db_dump.sql > quit;
Test Case to Convert a CHARACTER SET latin1 COLLATE latin1_swedish_ci database to CHARACTER SET utf8 COLLATE utf8_general_ci
This test case converts a CHARACTER SET latin1 COLLATE latin1_swedish_ci
DB to CHARACTER SET utf8 COLLATE utf8_general_ci
. The test case files are in the testcase_convert_latin1_charset_to_utf8_and_ci_collation.zip
attachment.
-
Log in to your MySQL instance.
-
Create a DB that contains
DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci
:
CREATE DATABASE latin1_test_db DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;
-
Switch to that DB:
USE latin1_test_db;
-
Check that the collation in the DB is
latin1
:
show variables like '%collation%';
The output should appear as follows:
collation_connection utf8_general_ci collation_database latin1_swedish_ci collation_server utf8_general_ci
The collation_database latin1_swedish_ci
line shows the collation.
-
Create a table with
CHARSET=latin1
:
CREATE TABLE `DATABASECHANGELOG` ( `ID` varchar(255) NOT NULL, `AUTHOR` varchar(255) NOT NULL, `FILENAME` varchar(255) NOT NULL, `DATEEXECUTED` datetime NOT NULL, `ORDEREXECUTED` int(11) NOT NULL, `EXECTYPE` varchar(10) NOT NULL, `MD5SUM` varchar(35) DEFAULT NULL, `DESCRIPTION` varchar(255) DEFAULT NULL, `COMMENTS` varchar(255) DEFAULT NULL, `TAG` varchar(255) DEFAULT NULL, `LIQUIBASE` varchar(20) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-
From an OS command line, enter the following command to export the schema:
mysql --host=localhost --port=3306 --user=root --password=password --html --execute "select '-----' ;SELECT default_character_set_name, default_collation_name FROM information_schema.SCHEMATA WHERE schema_name = 'latin1_test_db';select '-----'; SELECT T.table_name, T.table_collation, CCSA.character_set_name FROM information_schema.TABLES T, information_schema.COLLATION_CHARACTER_SET_APPLICABILITY CCSA WHERE CCSA.collation_name = T.table_collation AND T.table_schema = 'latin1_test_db'; select '-----'; SELECT table_name, column_name, character_set_name, collation_name FROM information_schema.COLUMNS WHERE character_set_name != 'NULL' AND table_schema = 'latin1_test_db'; select '-----' ;" > 2_schema_latin1_test_db.html
7 Check the CHARACTER SET
and collation for the DB, table, and columns.
-
Insert test data to make sure the table is not empty:
insert into databasechangelog(id,author,filename,dateexecuted,orderexecuted,exectype) values('1','suresh','a.txt',sysdate(),1,'xml')
-
From an OS command line, enter the following command to export the dump as mentioned above:
mysqldump --host=localhost --port=3306 --user=root --password=password -c -e --default-character-set=utf8 --single-transaction --skip-set-charset --add-drop-database -B latin1_test_db -r 3_latin1_test_dbdump.sql
-
From an OS command line, enter the following command to fix the character set and collation in the exported file to
utf8
:
ssed -e "s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci/" 3_latin1_test_dbdump.sql | ssed -e "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/" > 4_latin1_test_dbdump_now_utf8.sql
-
From an OS command line, enter the following command to import the new dump, which now has the
utf8
CHARACTER SET
and collation for the DB, table, and columns:
mysql --host=localhost --port=3306 --user=root --password=password --database=latin1_test_db < 4_latin1_test_dbdump_now_utf8.sql
-
Exit the DB and then reconnect and check that the collation in the DB is
utf8
:
show variables like '%collation%';
The output should appear as follows:
collation_connection utf8_general_ci collation_database utf8_general_ci collation_server utf8_general_ci
The collation_database utf8_general_ci
line shows the collation.
-
From an OS command line, enter the following command to export the schema:
mysql --host=localhost --port=3306 --user=root --password=password --html --execute "select '-----' ;SELECT default_character_set_name, default_collation_name FROM information_schema.SCHEMATA WHERE schema_name = 'latin1_test_db';select '-----'; SELECT T.table_name, T.table_collation, CCSA.character_set_name FROM information_schema.TABLES T, information_schema.COLLATION_CHARACTER_SET_APPLICABILITY CCSA WHERE CCSA.collation_name = T.table_collation AND T.table_schema = 'latin1_test_db'; select '-----'; SELECT table_name, column_name, character_set_name, collation_name FROM information_schema.COLUMNS WHERE character_set_name != 'NULL' AND table_schema = 'latin1_test_db'; select '-----' ;" > 5_schema_latin1_test_db_now_utf8.html
-
Check that the
CHARACTER SET
and collation for the DB, table, and columns are nowutf8
and case-insensitive (*_ci
) collation.