Thank you Hugh! Your email was helpful as I am looking for other backup tools that could be better.
I figured it out in the end. The problem was my unfamiliarity with MySQL and my quick assumption that the encoding was UTF8. The charset of the source MySQL database was in fact latin1, not UTF8; only the client connection charset was UTF8. Once I changed the destination server encoding to latin1 and created the dump with mysqldump using --default-character-set=latin1, the restore worked for all characters.
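For anyone who wants to script this, here is a minimal sketch of the dump and restore invocations in Python. It only builds the command lines and does not run them. The database name, user name and file name (wikidb, wikiuser, wikidb.sql) are placeholders I made up, and latin1 is the charset that applied in my case; check what your own tables use first.

```python
import shlex

# Assumption: the source tables are latin1, as they were in my case.
CHARSET = "latin1"


def dump_command(database, user, outfile, charset=CHARSET):
    """Build a mysqldump command that reads the data using the given charset."""
    return [
        "mysqldump",
        f"--default-character-set={charset}",
        f"--result-file={outfile}",
        "-u", user,
        "-p",
        database,
    ]


def restore_command(database, user, charset=CHARSET):
    """Build a mysql client command that loads a dump using the same charset."""
    return [
        "mysql",
        f"--default-character-set={charset}",
        "-u", user,
        "-p",
        database,
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(shlex.join(dump_command("wikidb", "wikiuser", "wikidb.sql")))
    print(shlex.join(restore_command("wikidb", "wikiuser")) + " < wikidb.sql")
```

The point is simply that the same charset is passed on both sides, so no conversion happens between dump and restore.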
_____
From: Webmaster Comunitatea Română [mailto:webmaster@comunitatea-romana.com]
Sent: Saturday, July 29, 2006 12:01 AM
To: mediawiki-l@Wikimedia.org; webmaster@comunitatea-romana.com
Subject: Database backup and restore corrupts page titles with special characters
Importance: High
Hi, I am testing my backup and restore procedure and to my surprise I discovered that the page titles that contain special characters (in my case special Romanian characters) are being corrupted.
I performed a backup of the database (using phpMyAdmin and also mysqldump) and then a restore, and all the article titles that contained special Romanian characters were corrupted. Titles without special characters were not affected, and the bodies of all articles, WITH or WITHOUT special characters, were not affected. I'm running MediaWiki 1.6.3; the source system is Linux with MySQL 4.1.19, the destination system is Windows with MySQL 4.1.20. Other people have run into this issue without an OS change being involved, so I don't believe the issue is caused by the different source and destination OS.
MySQL server collation is UTF-8, the database collation is latin1_swedish_ci, and columns are latin1_bin. Using MySQLHotCopy is not an option because my wiki is hosted at an ISP, where MySQLHotCopy doesn't work.
So far I unsuccessfully tried the following:
* downgraded one of the MySQL installations so that the machines run 4.1.19 and 4.1.20 (before, the Windows machine was running MySQL 5.*).
* mysqldump with --default-character-set=latin1
* updated the generated SQL file, replacing 'corrupted sequences' with the correct characters, for instance Äƒ with ă, ÅŸ with ş, .... That didn't work either.
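To show where such corrupted sequences come from, here is a small Python sketch. It assumes the garbling happens when UTF-8 bytes are read as a single-byte Western charset; I use Python's cp1252 codec as a stand-in for MySQL's latin1, which is my assumption and not something confirmed for this setup. The function names are made up for the illustration.

```python
def mojibake(text):
    """Encode text as UTF-8, then misread those bytes as cp1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled):
    """Reverse the misreading: recover the bytes, then decode them as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    for ch in ("ă", "ş"):
        bad = mojibake(ch)
        print(repr(ch), "->", repr(bad), "->", repr(repair(bad)))
```

Each Romanian character is two bytes in UTF-8, so it turns into two unrelated characters when every byte is treated as its own character. Note that a few byte values have no cp1252 mapping, so this round trip can raise an error for some other characters.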
Here are some links where the same or similar issues are mentioned:
http://meta.wikimedia.org/w/index.php?title=Talk:SpecialDeleteOldRevisions&section=4#Side_effects_on_Page_Titles.3F
http://meta.wikimedia.org/w/index.php?title=Talk:SpecialDeleteOldRevisions%C...
http://sourceforge.net/project/shownotes.php?release_id=322146 under READ THIS FIRST, TOO: MySQL 4.1 AND 5.0: "The mysqldump backup generator now applies an automatic conversion to UTF-8, which may irretrievably corrupt your data. Pass the -charset option with the original default charset (eg 'latin1') to skip the conversion."
http://ez.no/community/forum/install_configuration/mysqldump_charset_problem
Any ideas or suggestions are more than welcome!
Adrian