I'm not sure if it's a bug or I'm doing something wrong:

I read the data with

open my $fh, "<:encoding(iso-latin1)", $file or die "Failed to open $file: $!";

$file is definitely in iso-latin1.

Then I have a MySQL table defined with

ENGINE=InnoDB AUTO_INCREMENT=53072 DEFAULT CHARSET=latin1

I check the connection settings:

my $sth = $dbh->prepare("show variables");
$sth->execute;

Which gives

character_set_client, latin1
character_set_connection, latin1
character_set_database, latin1
character_set_filesystem, binary
character_set_results, latin1
character_set_server, latin1
character_set_system, utf8

So to me everything should be fine:

  • Table is iso-latin1
  • Data was iso-latin1 (and should now be in Perl's internal character format)
  • Connection info shows the right settings
  • Output to STDOUT (terminal is iso-latin1) is correct

But the data in the table is plain UTF-8 (most probably Perl's internal format in this case).

Did I miss something, or is this maybe a bug in DBI/DBD::mysql?

1 Answer

My guess would be that you're right and this data is in Perl's internal character format. The sequence goes like this.

  • Data in input file stored as Latin-1 bytes
  • Data read from input file and auto-converted to Perl characters because of the encoding option on your open statement
  • Data sent to MySQL as Perl characters, i.e. as the UTF-8 bytes of Perl's internal representation
  • MySQL, expecting Latin-1 on this connection, doesn't know those bytes are UTF-8 and stores each one as a separate Latin-1 character
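A minimal sketch of the first three steps that runs without a database. It assumes an in-memory filehandle as a stand-in for the Latin-1 file and "café" as an example value; iso-8859-1 is the canonical Encode name for the same encoding as iso-latin1. The last step is not reproduced, since it depends on how DBD::mysql hands the string to MySQL.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Stand-in for the Latin-1 file: "café" as Latin-1 bytes (é = 0xE9).
my $file_bytes = "caf\xE9";

# Step 2: reading through :encoding decodes the bytes into Perl characters.
open my $fh, '<:encoding(iso-8859-1)', \$file_bytes
    or die "Failed to open in-memory file: $!";
my $text = <$fh>;
close $fh;

# The decoded string carries the UTF8 flag, so its internal buffer is UTF-8.
printf "UTF8 flag on:   %s\n", utf8::is_utf8($text) ? 'yes' : 'no';

# Step 3: the UTF-8 byte sequence of those characters, i.e. what a driver
# that ignores the flag would pass on to a Latin-1 connection.
my $internal = $text;
utf8::encode($internal);
printf "internal bytes: %s\n", join ' ', map { sprintf '%02X', ord } split //, $internal;

# The missing step: encoding back to Latin-1 restores the original file bytes.
my $latin1 = encode('iso-8859-1', $text);
printf "latin1 bytes:   %s\n", join ' ', map { sprintf '%02X', ord } split //, $latin1;
```

The internal bytes come out as `63 61 66 C3 A9`, the UTF-8 form of "é" that ends up in the table, while the re-encoded Latin-1 bytes are `63 61 66 E9`, which is what the connection actually expects.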

The step you're missing is to encode your Perl characters back into Latin-1 before sending them to the database. The obvious solution is to call encode('iso-8859-1', $string) (from the Encode module) on every value you send to the database. It would be nice if there were some kind of auto-encode option, but I can't find one.
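One way to apply that fix consistently is a small helper that encodes every bind value just before execute. This is a sketch: `to_latin1`, the table and the column names are hypothetical, and the DBI call appears only in a comment because it needs a live handle. It passes FB_CROAK so characters with no Latin-1 equivalent raise an error instead of silently becoming '?'.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn Perl character strings into Latin-1 byte strings
# before they are handed to DBI. undef (SQL NULL) is passed through unchanged.
# FB_CROAK makes encode() die on characters outside Latin-1; LEAVE_SRC stops
# encode() from modifying the caller's strings, which a CHECK value otherwise allows.
sub to_latin1 {
    return map {
        defined $_
            ? encode('iso-8859-1', $_, Encode::FB_CROAK() | Encode::LEAVE_SRC())
            : undef
    } @_;
}

my @row   = ("caf\x{E9}", undef, 'plain ascii');
my @bytes = to_latin1(@row);

# With a real handle this would look like (hypothetical table and columns):
#   my $sth = $dbh->prepare('INSERT INTO my_table (name, note, other) VALUES (?, ?, ?)');
#   $sth->execute(to_latin1(@row));
```

Keeping the encoding in one place at the DBI boundary means the rest of the program can keep working with decoded Perl characters.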

Of course, if your data is all going to be Latin-1, then you could consider just ignoring any decoding/encoding issues: open the file without the :encoding layer and pass the bytes straight through. It should all just work without that complication.
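For the no-decoding route, a sketch of reading the same bytes without an :encoding layer. It again uses an in-memory handle as a stand-in for the file and assumes no `use open` pragma or PERL_UNICODE setting adds a default layer.

```perl
use strict;
use warnings;

# Stand-in for the Latin-1 file contents ("café").
my $file_bytes = "caf\xE9";

# No :encoding layer: the line is read as raw bytes and never decoded.
open my $fh, '<', \$file_bytes or die "Failed to open in-memory file: $!";
my $line = <$fh>;
close $fh;

# "é" is still the single byte 0xE9 with no UTF8 flag, so a Latin-1
# connection receives exactly what was in the file.
printf "UTF8 flag on: %s, bytes: %s\n",
    utf8::is_utf8($line) ? 'yes' : 'no',
    join ' ', map { sprintf '%02X', ord } split //, $line;
```

The trade-off is that Perl then treats the data as bytes, so character-aware operations (case mapping, regex classes beyond ASCII) follow byte semantics unless you decode later.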

2 Comments

I understand that encoding would probably solve the issue, but shouldn't Perl handle this automatically? I mean, there is no ambiguity here: Perl knows that the database expects ISO Latin-1 bytes and not UTF-8 characters.
Why do you think that Perl knows that?
