[Mediawiki-l] MySQL 4.1 & MediaWiki backup corruption warning

Brion Vibber brion at pobox.com
Thu Nov 18 09:30:00 UTC 2004


Just a note of warning for those of you using MySQL 4.1: the new
character set options can cause mysqldump to write corrupted data into
backups, which then can't be restored without data loss.*

This may affect some Unicode text, and it can certainly corrupt
compressed old revision text (stored with the $wgCompressRevisions
option) beyond recovery. If you're using MySQL 4.1, you should examine
and test your backup dumps to make sure they can be restored and used
successfully.
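
One way to test a dump is to restore it into a scratch database and
compare the raw bytes of each text row against the live copy. The
sketch below only shows the comparison step; how the rows are fetched
is left out, and the names fingerprint and changed_rows, as well as the
id-to-bytes mapping, are made up for illustration.

```python
import hashlib


def fingerprint(rows):
    """Map each row id to the SHA-1 hex digest of its raw bytes.

    `rows` is a hypothetical {row_id: bytes} mapping; fetching it from
    the live or restored database is not shown here.
    """
    return {row_id: hashlib.sha1(blob).hexdigest()
            for row_id, blob in rows.items()}


def changed_rows(live, restored):
    """Return the sorted ids whose bytes differ or are missing after restore."""
    live_fp = fingerprint(live)
    restored_fp = fingerprint(restored)
    return sorted(row_id for row_id, digest in live_fp.items()
                  if restored_fp.get(row_id) != digest)


if __name__ == "__main__":
    live = {1: b"plain text", 2: b"\x78\x81\x90 compressed-looking bytes"}
    # Row 2 as it might look after bytes 0x81 and 0x90 became '?'.
    restored = {1: b"plain text", 2: b"\x78?? compressed-looking bytes"}
    print(changed_rows(live, restored))
```

Comparing digests rather than full values keeps the list from the live
server small enough to save next to the dump itself.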

Passing an option like --default-character-set=latin1 may stop
mysqldump from trying to 'convert' (and thus corrupt) your data. (If
your server is not set to the defaults, this may or may not be the
correct value for you.) Hopefully a future MediaWiki will play nicer
with the new character set settings, but for now it follows the
practice used with older versions of MySQL, which had (and still have)
no way to correctly indicate the charset used in a particular
database, table, or field.
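
For anyone scripting their backups, here is a minimal sketch of passing
that option from Python. The database name "wikidb", the output path,
and the helper names are hypothetical; login options are omitted; and
latin1 is only the right value if your server uses the defaults.

```python
import subprocess


def build_dump_command(database, charset="latin1"):
    """Build a mysqldump argument list that pins the client charset.

    `charset` is an assumption: latin1 matches a default server setup,
    but may be wrong for a server configured differently.
    """
    return ["mysqldump", f"--default-character-set={charset}", database]


def dump_to_file(database, path, charset="latin1"):
    """Run mysqldump and write its output to `path` as raw bytes."""
    # Binary mode, so Python does no re-encoding of its own.
    with open(path, "wb") as out:
        subprocess.run(build_dump_command(database, charset),
                       stdout=out, check=True)


if __name__ == "__main__":
    # "wikidb" is a placeholder database name.
    print(" ".join(build_dump_command("wikidb")))
```

Calling dump_to_file("wikidb", "wikidb.sql") would then produce the
dump, assuming mysqldump is on the PATH and can log in without further
options.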

* Specifically, a default "latin-1" to UTF-8 conversion silently
corrupts all bytes with the values 0x81, 0x8d, 0x8f, 0x90, or 0x9d by
turning them into literal question marks. (MySQL's latin1 is based on
Windows cp1252, which assigns no character to these five values.) The
question marks cannot be returned to their original byte values when
the data is re-imported.
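
The effect can be imitated in a few lines of Python. This is only a
simulation, not what mysqldump does internally: it assumes Python's
cp1252 codec is a fair stand-in for MySQL's latin1, and the function
names are made up for illustration.

```python
def undefined_cp1252_bytes():
    """List the byte values to which cp1252 assigns no character."""
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            bad.append(value)
    return bad


def lossy_round_trip(data):
    """Imitate dump (to UTF-8) and restore, with '?' for unmappable bytes."""
    bad = set(undefined_cp1252_bytes())
    # Dump step: each byte becomes a character, or '?' if it has none.
    text = "".join("?" if b in bad else bytes([b]).decode("cp1252")
                   for b in data)
    dumped = text.encode("utf-8")
    # Restore step: UTF-8 back to single bytes.
    return dumped.decode("utf-8").encode("cp1252")


if __name__ == "__main__":
    print([hex(b) for b in undefined_cp1252_bytes()])
    print(lossy_round_trip(b"\x80\x81abc\xff"))
```

Bytes that do have a character, such as 0x80 and 0xff here, survive the
round trip; 0x81 comes back as 0x3f ('?'), which is indistinguishable
from a question mark that was in the data all along.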

-- brion vibber (brion @ pobox.com)