I recently dumped (well, someone else dumped it) the data from a
MediaWiki database stored in a MySQL v3.23.58 server and imported it
into a v4.0.21 server.
On the new instance of the wiki, running on MySQL v4.0.21, I noticed
that the text on some pages wasn't displaying properly, and that on
editing such a page the 'bad' data was replaced with a number of
question marks. Without saving the edit, I checked and found that the
characters in the database were garbled, and that the garbling was
already present in the dump file (so it was probably introduced by the
dump process). I temporarily do not have access to the original site,
so I can't verify the exact original data, but it was served correctly
there just before that server was taken off-line and the dump produced.
Is this a known issue and is there a way to prevent/correct it? I know
that MySQL v4.0.x is recommended for MediaWiki but the reasons given
seem to be related to performance.
I saw a note in the list archive about similar issues with MySQL, but
I'm not certain this is exactly the same issue, since the dump didn't
actually turn the characters into question marks, and I can't positively
identify which characters from the original database were corrupted.
I think one of them was 0xe8 or 0xe9 (è or é, if those display
correctly in this email -- &egrave; or &eacute; in HTML encoding) but,
being a hopeless English-speaking monoglot, I don't know for sure which
would have been used.
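
For what it's worth, here is a minimal Python sketch of how the bytes
at a suspect position could be checked once they are pulled out of the
dump file. It only assumes that the candidates are Latin-1 or UTF-8,
which I have not confirmed for this wiki; the helper name `describe`
is made up for illustration.

```python
# Hypothetical helper: report whether a byte string is valid UTF-8 and,
# if not, what it would look like when read as Latin-1 (ISO-8859-1).
# Assumption: the data is one of these two encodings.
def describe(raw: bytes) -> str:
    try:
        text = raw.decode("utf-8")
        return f"valid UTF-8: {text!r}"
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0x00-0xff to a character, so this
        # decode cannot fail.
        return f"not UTF-8; as Latin-1: {raw.decode('latin-1')!r}"


# A lone 0xe8 or 0xe9 byte is not valid UTF-8 on its own.
print(describe(b"\xe8"))
print(describe(b"\xe9"))

# The two-byte UTF-8 encoding of e-acute.
print(describe(b"\xc3\xa9"))

# The same two bytes misread as Latin-1 show up as two characters,
# which is one common form of "garbling".
print(repr(b"\xc3\xa9".decode("latin-1")))
```

If the garbled spots in the dump turn out to be two-byte sequences
like the last case, that would suggest an encoding mismatch rather
than lost data, but that is a guess until the original can be checked.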