I have a string with some special characters that I am trying to put in a database, namely a player name from the chess games of François-André Danican Philidor. When I try to insert this name into my MySQL database using DBI, I get this error...
HAND CHECK: Fran�ois Andr� Philidor||NN||1||0||Fran�ois Andr� Philidor||NN||
DBD::mysql::st execute failed: Incorrect string value: '\xE7ois A...' for column 'white_player' at row 1 at chessgames.pl line 110, <GEN0> line 2360.
SQL Error: Incorrect string value: '\xE7ois A...' for column 'white_player' at row 1
Fran�ois Andr� Philidor||NN||1||0||Fran�ois Andr� Philidor||NN||
MySQL is having trouble understanding the special characters in the name, specifically the ç and the é. The first thing that helped was to add the following to my script...
use utf8; #some names have utf8 characters
binmode(STDOUT, ':utf8');
These commands changed the output so that when Perl printed the name with special characters, it printed the special characters properly. But MySQL still did not understand the special characters. Output changed slightly to this...
HAND CHECK: François André Philidor||NN||1||0||François André Philidor||NN||
DBD::mysql::st execute failed: Incorrect string value: '\xE7ois A...' for column 'white_player' at row 1 at chessgames.pl line 110, <GEN0> line 2360.
SQL Error: Incorrect string value: '\xE7ois A...' for column 'white_player' at row 1
Fran�ois Andr� Philidor||NN||1||0||Fran�ois Andr� Philidor||NN||
My manual check was working, but the MySQL query was still returning an error. I tried some of the other solutions listed above...
$dbh->do('SET NAMES utf8');
This solution did not work for me and produced the same errors.
$dbh->{'mysql_enable_utf8'} = 1;
This solution also did not work. A different question, "How to store unicode in MySQL?", suggested declaring the column like this when creating the table in SQL.
white_player VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
This did not work either. I decided to look at the Unicode section in perldoc. I read through
$ perldoc perlunicode
Which pointed to
$ perldoc perlunitut
Which in turn pointed to
$ perldoc perlpacktut
Which is where I found the solution that worked for me. Here is the relevant text from perldoc perlpacktut under the Unicode section...
Please note: in the general case, you're better off using
"Encode::decode('UTF-8', $utf)" to decode a UTF-8 encoded byte string to
a Perl Unicode string, and "Encode::encode('UTF-8', $str)" to encode a
Perl Unicode string to UTF-8 bytes. These functions provide means of
handling invalid byte sequences and generally have a friendlier
interface.
Encoding (as a verb) is the conversion from *text* to *binary*.
Decoding is the conversion from *binary* to *text*.
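To make the text/bytes distinction concrete, here is a minimal sketch of a round trip through Encode. The sample name is only an illustration and the variable names are my own; the point is that the same name has a different length as text than as UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A Perl text string: ç is the single code point U+00E7.
my $text = "Fran\x{E7}ois";

# Encoding turns text into bytes; in UTF-8, ç becomes the two bytes C3 A7.
my $bytes = encode('UTF-8', $text);

# Decoding turns those bytes back into text.
my $round_trip = decode('UTF-8', $bytes);

printf "text length:  %d characters\n", length $text;
printf "bytes length: %d bytes\n",      length $bytes;
printf "round trip ok: %s\n", $round_trip eq $text ? 'yes' : 'no';
```

The text has 8 characters and the encoded form has 9 bytes, because only the ç needs two bytes in UTF-8.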
I added the following command to my code
use Encode qw(encode decode); #suggestion from perldoc perlpacktut
$whiteplayer = decode("UTF-8",$pgn->white);
However, this also didn't work: the database still showed Fran�ois Andr� Danican Philidor instead of François André Danican Philidor. I eventually found this answer
execute failed: Incorrect string value: '\xE4rvine...' with mariadb and perl DBD
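There I saw that the encoding was not UTF-8 but ISO-8859-1, also known as latin-1. That matches the error message: \xE7 is ç as a single ISO-8859-1 byte, whereas UTF-8 would encode ç as the two bytes \xC3\xA7. I changed my decode statement to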
$whiteplayer = decode("iso-8859-1",$pgn->white);
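Here is a small sketch of why the encoding name matters. The $raw string is a hypothetical stand-in for what $pgn->white returned in my case (I am assuming it held the single byte E7 for ç, as the MySQL error suggests). By default decode replaces bytes it cannot interpret with U+FFFD, which would explain the � that kept showing up; passing Encode::FB_CROAK makes the failure visible instead.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for $pgn->white: raw bytes with ç as the byte E7.
my $raw = "Fran\xE7ois";

# Strict UTF-8 decoding rejects the lone E7 byte. FB_CROAK makes decode die
# instead of substituting U+FFFD. decode may modify its argument when a
# CHECK value is given, so work on a copy.
my $copy    = $raw;
my $as_utf8 = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
print "UTF-8 decode: ", (defined $as_utf8 ? "ok" : "failed"), "\n";

# ISO-8859-1 maps each byte to the code point with the same number,
# so E7 becomes U+00E7 (ç).
my $as_latin1 = decode('iso-8859-1', $raw);
printf "latin-1 decode gives U+%04X at index 4\n", ord substr($as_latin1, 4, 1);
```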
and finally it was working! Getting there was long and non-obvious, so I will sum up the solution as follows.
- include use utf8; at the top of the script
- include binmode(STDOUT, ':utf8');
- do a little reading, namely perldoc perlunicode, perldoc perlunitut, and perldoc perlpacktut
- find the correct encoding of the text you want to enter into the database. It is often UTF-8, but check to make sure; mine turned out to be iso-8859-1. Note: be careful with the encoding name in the decode function. Encode treats "UTF-8" as strict UTF-8 and "utf8" as Perl's more lenient variant, so prefer "UTF-8".
- include use Encode qw(encode decode); and decode the string with special characters using something like $newstring=decode("UTF-8",$oldstring); with "UTF-8" replaced by the encoding you found
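If you do not know the encoding in advance, the steps above can be sketched as a helper that tries strict UTF-8 first and falls back to ISO-8859-1. This is not what my script does (I hard-coded iso-8859-1); decode_input is a hypothetical name, and the fallback is a heuristic that assumes your input is one of those two encodings.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: try strict UTF-8 first, then fall back to ISO-8859-1.
# The fallback cannot fail because every byte is valid in ISO-8859-1.
sub decode_input {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode with a CHECK value may modify its argument
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return defined $text ? $text : decode('iso-8859-1', $bytes);
}

binmode(STDOUT, ':utf8');
for my $raw ("Fran\xE7ois", "Fran\xC3\xA7ois") {
    my $name = decode_input($raw);
    print "$name (", length($name), " characters)\n";
}
```

Both inputs come out as the same 8-character text string. The heuristic can guess wrong for ISO-8859-1 input that happens to be valid UTF-8, so a known encoding is always preferable.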
Here is some code to check whether your string contains non-ASCII characters...
while ( $string =~ /([^\x00-\x7f])/g ){
print "string: $string contains non-ASCII character: $1\n"
}
print "\n";
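A variation on the check above that also reports the code point of each non-ASCII character can help identify the encoding of raw input. This is a sketch; non_ascii_codepoints is a hypothetical helper name and the sample string is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return one "U+XXXX" label per non-ASCII character.
sub non_ascii_codepoints {
    my ($string) = @_;
    return map { sprintf('U+%04X', ord($_)) } $string =~ /([^\x00-\x7f])/g;
}

my @found = non_ascii_codepoints("Fran\xE7ois Andr\xE9 Philidor");
print "non-ASCII: @found\n";
```

Run on undecoded input, a single U+00E7 for ç points to ISO-8859-1 bytes, while the pair U+00C3 U+00A7 points to UTF-8 bytes that have not been decoded yet.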
Both utf8 and Encode ship with Perl itself, so there is normally nothing to install. If you need a newer Encode, and as long as CPAN is configured correctly, you can update it by running
$ cpan
cpan[1]> install Encode
To summarize: use utf8; and binmode(STDOUT, ':utf8'); make Perl output UTF-8 characters correctly. Before inserting a string into the database, make sure you run use Encode qw(encode decode); and $newstring=decode("UTF-8",$oldstring); on it, with "UTF-8" replaced by the encoding your input actually uses (iso-8859-1 in my case). This is currently working for me. Neither the accepted solution nor any of the other solutions on either page worked for me. The database itself didn't require any configuration.