[Mediawiki-l] Database backup and restore corrupts page titles with special characters

Webmaster Comunitatea Română webmaster at comunitatea-romana.com
Sat Jul 29 07:00:35 UTC 2006


Hi,
I am testing my backup and restore procedure and, to my surprise, discovered that page titles containing special characters (in my case special Romanian characters) are corrupted.

I backed up the database (with phpMyAdmin and also with mysqldump) and then restored it, and all the article titles that contained special Romanian characters were corrupted. Titles without special characters were not affected, and the body of every article, WITH or WITHOUT special characters, was not affected. I'm running MediaWiki 1.6.3; the source system is Linux with MySQL 4.1.19 and the destination system is Windows with MySQL 4.1.20. Other people have run into this issue without an OS change being involved, so I don't believe it is caused by the different source and destination OS.

MySQL server collation is UTF-8, the database collation is latin1_swedish_ci, and columns are latin1_bin. 
Using mysqlhotcopy is not an option because my wiki is hosted by an ISP, where mysqlhotcopy doesn't work.

So far I have tried the following, without success:
*	downgraded the MySQL installation on the Windows machine from 5.* to 4.1.20, so that the two machines now run 4.1.19 and 4.1.20
*	ran mysqldump with --default-character-set=latin1
*	edited the generated SQL file and replaced the 'corrupted sequences' with the correct characters, for instance Äƒ with ă, ÅŸ with ş, .... That didn't work either.
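
The 'corrupted sequences' in the last item have the shape you get when UTF-8 bytes are read back as latin1 (MySQL's latin1 is effectively Windows-1252). Below is a minimal Python sketch of that assumption. The function names are made up for illustration, and this is not necessarily what mysqldump does internally:

```python
def to_mojibake(text: str) -> str:
    """Encode text as UTF-8, then misread those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def from_mojibake(garbled: str) -> str:
    """Undo the misreading: recover the original bytes, then decode as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    for ch in ["ă", "ş", "ţ"]:
        garbled = to_mojibake(ch)
        # Show the character, its garbled form, and the round trip back.
        print(ch, "->", garbled, "->", from_mojibake(garbled))
```

If the titles in the restored database look like the output of to_mojibake, the data was probably converted once too often somewhere between dump and restore, rather than being lost.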

Here are some links where same or similar issues are being mentioned:
http://meta.wikimedia.org/w/index.php?title=Talk:SpecialDeleteOldRevisions&section=4#Side_effects_on_Page_Titles.3F

http://sourceforge.net/project/shownotes.php?release_id=322146
under "READ THIS FIRST, TOO: MySQL 4.1 AND 5.0":
The mysqldump backup generator now applies an automatic conversion to
UTF-8, which may irretrievably corrupt your data. Pass the -charset option
with the original default charset (eg 'latin1') to skip the conversion.
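
With plain mysqldump, the comparable switch is --default-character-set, and the mysql client accepts the same option on restore. The sketch below only builds the two command lines so that the dump and the restore use the same client character set. The database name and file name are hypothetical, credentials are left out, and it is not confirmed that this fixes the problem described in this message:

```python
from typing import List


def dump_command(database: str, outfile: str, charset: str = "latin1") -> List[str]:
    """Build a mysqldump command line that writes the dump with the given client charset."""
    return [
        "mysqldump",
        f"--default-character-set={charset}",
        f"--result-file={outfile}",
        database,
    ]


def restore_command(database: str, charset: str = "latin1") -> List[str]:
    """Build a mysql command line; the SQL dump is expected on standard input."""
    return ["mysql", f"--default-character-set={charset}", database]


if __name__ == "__main__":
    # 'wikidb' and 'wikidb.sql' are placeholder names, not taken from the post.
    print(" ".join(dump_command("wikidb", "wikidb.sql")))
    print(" ".join(restore_command("wikidb")), "< wikidb.sql")
```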

http://ez.no/community/forum/install_configuration/mysqldump_charset_problem

Any ideas or suggestions are more than welcome!
Adrian


