[Mediawiki-l] The best way to backup a Mediawiki ?

Frames Project frames at lct.jussieu.fr
Sun May 6 09:18:42 UTC 2007


Brion Vibber wrote:
> Sylvain Machefert wrote:
>   
>> Hi Brion,
>> What is strange is that only the titles are affected, not the content of
>> the pages. Is that normal?
>>     
>
> Yes -- the page text is in a binary BLOB field, which will not undergo
> the bogus lossy conversion.
>
> To summarize, the problem is roughly:
>
> * MediaWiki assumes that MySQL will preserve data that is put into it
> * MySQL sometimes corrupts the data
>
> In a little more detail:
>
> * Due to the limitations of MySQL's Unicode support, by default we
> continue to treat MySQL fields as binary and store pure UTF-8 Unicode in
> them, although MySQL may have them listed as Latin-1 depending on your
> server's defaults.
>
> * The mysqldump backup program by default in 4.1 and later applies a
> conversion of non-binary fields to UTF-8, with a marker to have them
> appropriately converted back when read in.
>
> * This conversion is lossy -- it treats Latin-1 as the Windows-1252 code
> page, which is an extension of ISO 8859-1 with additional characters in
> the 128-159 range which in ISO 8859 and Unicode is supposed to contain
> non-printing control characters. Four of the code points in this range
> are not assigned in Windows-1252, and so cannot be converted to UTF-8
> Unicode -- these characters are silently corrupted into "?" characters
> during the conversion if they appear.
>
> * The UTF-8 encoding of Unicode uses the byte values which correspond to
> those four non-convertible characters.
>
> * As a result, UTF-8 text in a Latin-1 field may be corrupted, as some
> characters are destroyed in the conversion back and forth.
>
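
The round trip described in the points above can be reproduced outside MySQL. This is only a minimal sketch, not MySQL's actual conversion code: it assumes Python's cp1252 codec is a fair stand-in for the Windows-1252 handling described, and the function name roundtrip is made up for illustration.

```python
def roundtrip(text):
    """Simulate dumping and restoring UTF-8 text held in a 'Latin-1' field."""
    # What MediaWiki actually stores: raw UTF-8 bytes.
    stored = text.encode("utf-8")
    # Dump step: the bytes are interpreted as Windows-1252. Bytes with no
    # assigned character cannot be mapped and become U+FFFD here.
    as_unicode = stored.decode("cp1252", errors="replace")
    # Restore step: converting back, U+FFFD has no Windows-1252 byte,
    # so it is written out as "?".
    return as_unicode.encode("cp1252", errors="replace")


if __name__ == "__main__":
    for title in ("é", "Á"):
        before = title.encode("utf-8")
        after = roundtrip(title)
        status = "ok" if before == after else "CORRUPTED"
        print(repr(title), before, "->", after, status)
```

Under these assumptions "é" (UTF-8 bytes C3 A9) survives, while "Á" (UTF-8 bytes C3 81) comes back with its second byte replaced by "?", because 0x81 is one of the unassigned code points.
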
> Use the --default-character-set=latin1 option on mysqldump when creating your
> database dumps to avoid this lossy conversion. (And/or find another way
> to dump/copy databases or another equivalent option to avoid the
> unnecessary conversion.)
>
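
For anyone scripting their backups, the advice above could be wrapped roughly as follows. This is a sketch under assumptions: the mysqldump option is spelled --default-character-set, and the database name, user name and output file are hypothetical placeholders, as is the helper name build_dump_command. The command is only built and printed, not executed.

```python
import shlex


def build_dump_command(database, user, outfile):
    """Return the mysqldump argument list for a dump that skips the conversion."""
    return [
        "mysqldump",
        # Tell mysqldump to treat the connection as latin1, so the
        # "Latin-1" fields are written out byte for byte.
        "--default-character-set=latin1",
        "--user=" + user,
        # No value given: mysqldump prompts for the password.
        "--password",
        "--result-file=" + outfile,
        database,
    ]


if __name__ == "__main__":
    # "wikidb", "wikiuser" and "wikidb.sql" are placeholder names.
    print(shlex.join(build_dump_command("wikidb", "wikiuser", "wikidb.sql")))
```

Building the arguments as a list keeps quoting out of the picture if the list is later handed to subprocess.run.
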
> Since it appears that your hosting provider did this for you, you may
> need to ask them to redo it. Alternatively, you may be able to rig up a
> statistical fix based on which characters are being corrupted, though
> I'm not sure how easy that would be.
>
> -- brion vibber (brion @ wikimedia.org)
>
>   

Thank you Brion for these explanations.
I now understand why I had problems using mysqldump.

It seems that few people are aware of this dangerous behaviour of MySQL
and mysqldump.
Is there any documentation on the best way to back up a MediaWiki and its
database, other than the message above?

Francois Colonna


