A piece of software is producing UTF-8 files, but it writes content to them that isn't proper Unicode. I can't change that software and have to take the output as it is. I don't know if this will show up here correctly, but a German umlaut "ä" appears in the file as "ä".
If I open the file in Notepad++, it tells me the file is encoded as UTF-8 (without BOM). Now, if I choose "Convert to ANSI" in Notepad++ and then switch the file encoding back to UTF-8 (without converting), the German umlauts in the file are correct. How can I achieve exactly the same behaviour in Perl? Whatever I have tried so far only made the umlaut mess worse.
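If I understand it right, those Notepad++ steps amount to a byte-level round trip: take the characters the file decodes to, write them back out as single-byte "ANSI" octets, and reinterpret those octets as UTF-8. A minimal sketch of that in Perl, assuming the file really contains double-encoded UTF-8 and that "ANSI" here means Windows-1252:

use Encode qw(decode encode);

open my $in, '<:raw', 'test.csv' or die "Cannot open test.csv: $!";
my $octets = do { local $/; <$in> };   # slurp the raw bytes exactly as they are on disk
close $in;

my $mojibake = decode('UTF-8', $octets);      # what Notepad++ shows: "Ã¤" instead of "ä"
my $bytes    = encode('cp1252', $mojibake);   # "Convert to ANSI": back to single-byte octets
my $text     = decode('UTF-8', $bytes);       # read those octets as UTF-8 again: "ä"

If the mangling went through ISO-8859-1 rather than Windows-1252, 'iso-8859-1' would be the name to use in the encode() call.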
To reproduce, create a UTF-8 encoded file and write this content to it: Männer Schüler Vögel SüÃ
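For completeness, here is a small sketch that produces such a test file (it double-encodes the sample text on purpose, so the file shows the same mojibake; the name test.csv matches the script below):

use utf8;
use Encode qw(decode encode);

open my $out, '>:raw', 'test.csv' or die "Cannot write test.csv: $!";
my $sample = "Männer Schüler Vögel Süß";
# encode once, misread the octets as cp1252, encode again: double-encoded UTF-8
print {$out} encode('UTF-8', decode('cp1252', encode('UTF-8', $sample))), "\n";
close $out;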
Then, in a UTF-8 MySQL database, create a table with a VARCHAR field using the utf8_unicode_ci collation. Now, use this script:
use utf8;
use DBI;
use Encode;

if (open FILE, "test.csv") {
    my $db = DBI->connect(
        'DBI:mysql:your_db;host=127.0.0.1;mysql_compression=1', 'root', 'Yourpass',
        { PrintError => 1 }
    );
    my $sql = qq{SET NAMES 'utf8';};
    $db->do($sql);
    while (my $line = <FILE>) {
        my $sth = $db->prepare("INSERT IGNORE INTO testtable (testline) VALUES (?);");
        $sth->execute($line);
    }
}
The exact contents of the file get written to the database. But the output I expect in the database is with proper German umlauts:
Männer Schüler Vögel Süß
So, how can I convert that correctly?
Move the prepare outside of the while loop. The way you do it, the prepare is expensive and will be executed for every line of your file. See the outline section of the DBI documentation for more info.

Then call utf8::decode($line); as the first statement of your while loop (utf8::decode works on $line in place, so don't assign its return value back to $line).
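Applied to the script above, the suggested changes might look like this (a sketch of this advice, not tested against the original file):

my $sth = $db->prepare("INSERT IGNORE INTO testtable (testline) VALUES (?);");   # prepared once, reused for every line
while (my $line = <FILE>) {
    utf8::decode($line);   # turn the UTF-8 octets read from the file into Perl characters, in place
    $sth->execute($line);
}

If the file really contains double-encoded UTF-8, the Encode round trip sketched near the top of the question can be applied to $line at the same point instead.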