Thank you, Hugh! Your email was helpful, as I am looking for other backup tools that might
work better.
I figured it out in the end. The problem was my unfamiliarity with MySQL: I was too quick
to assume that the encoding was UTF-8.
The charset of the source MySQL database was in fact latin1, not UTF-8; only the client
connection charset was UTF-8. Once I changed the destination server encoding to latin1 and
created the dump with mysqldump using --default-character-set=latin1, the restore worked
for all characters.
_____
From: Webmaster Comunitatea Română [mailto:webmaster@comunitatea-romana.com]
Sent: Saturday, July 29, 2006 12:01 AM
To: mediawiki-l(a)Wikimedia.org; webmaster(a)comunitatea-romana.com
Subject: Database backup and restore corrupts page titles with special characters
Importance: High
Hi,
I am testing my backup and restore procedure and to my surprise I discovered that the page
titles that contain special characters (in my case special Romanian characters) are being
corrupted.
I performed a backup of the database (using phpMyAdmin and also mysqldump) and then a
restore and all the article titles that contained special Romanian characters were
corrupted. Titles without special characters were not affected, and the bodies of all
articles, WITH or WITHOUT special characters, were not affected. I'm running MediaWiki
1.6.3; the source system is Linux with MySQL 4.1.19, and the destination system is Windows
with MySQL 4.1.20. Other people have run into this issue before without an OS change being
involved, so honestly I don't believe the issue is caused by the different source and
destination OS.
MySQL server collation is UTF-8, the database collation is latin1_swedish_ci, and columns
are latin1_bin.
Using MySQLHotCopy is not an option because my wiki is hosted by an ISP, where
MySQLHotCopy doesn't work.
So far I have unsuccessfully tried the following:
* downgraded one of the MySQL installations (so that the machines now have 4.1.19 and
4.1.20; before, the Windows machine was running MySQL 5.*)
* ran mysqldump with --default-character-set=latin1
* updated the generated SQL file, replacing the 'corrupted sequences' with the
correct characters, for instance Äƒ with ă, ÅŸ with ş, .... That didn't work either.
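For readers unfamiliar with this kind of corruption, here is a minimal Python sketch of how sequences such as ÅŸ can arise: the UTF-8 bytes of a character are read back as if they were latin1. It assumes that MySQL's latin1 behaves like Windows-1252 (Python's cp1252 codec) for these bytes; the function names are made up for illustration and are not part of MySQL or MediaWiki. It only illustrates the byte-level effect and is not a description of what mysqldump does internally.

```python
# Illustration only: how UTF-8 bytes look when they are misread as latin1.
# Assumption: MySQL's latin1 matches Windows-1252 (cp1252) for these bytes.

def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as cp1252."""
    return text.encode("utf-8").decode("cp1252")


def undo_misread(garbled: str) -> str:
    """Reverse the misreading: back to the raw bytes, then decode as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


a_breve = "\u0103"    # Romanian a with breve
s_cedilla = "\u015f"  # s with cedilla, as used in the message above

for ch in (a_breve, s_cedilla):
    garbled = misread_utf8_as_latin1(ch)
    # Show the original character, its garbled form, and the recovered form.
    print(ch, "->", garbled, "->", undo_misread(garbled))
```

The reverse step only works while the garbled text is still intact; bytes that cp1252 leaves undefined would raise an error in this sketch.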
Here are some links where same or similar issues are being mentioned:
<http://meta.wikimedia.org/w/index.php?title=Talk:SpecialDeleteOldRevisions&section=4#Side_effects_on_Page_Titles.3F>
<http://sourceforge.net/project/shownotes.php?release_id=322146>
under READ THIS FIRST, TOO: MySQL 4.1 AND 5.0,
The mysqldump backup generator now applies an automatic conversion to
UTF-8, which may irretrivably corrupt your data. Pass the -charset option
with the original default charset (eg 'latin1') to skip the conversion.
<http://ez.no/community/forum/install_configuration/mysqldump_charset_problem>
Any ideas or suggestions are more than welcome!
Adrian