Ian Smith wrote: [snip]
The table spec says:
CREATE TABLE `mywiki_text` ( ... ) ENGINE=MyISAM AUTO_INCREMENT=18452 DEFAULT CHARSET=latin1
You can load the current revision of a particular page and save it to a file [snip]
(You can use maintenance/eval.php to run code within the MediaWiki framework from the command line.)
Sweet! I've done that, and this is the offending section:
20 52 75 6e 20 74 79 70 65 20 e2 80 9c 67 70 65  > Run type ...gpe<
64 69 74 2e 6d 73 63 e2 80 3f 20 61 6e 64 20 70  >dit.msc..? and p<
The bad sequence (after "gpedit.msc") is "e2 80 3f": the same as what I got with my hex dump in the code.
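As a rough sketch of how such a sequence can be located without reading the hex dump by eye, the following scans a byte string for runs that are not valid UTF-8. The function name find_invalid_utf8 is made up for illustration and is not part of MediaWiki; the sample bytes are the second row of the dump above.

```python
def find_invalid_utf8(data: bytes):
    """Return a list of (offset, bad_bytes) for each undecodable run in data."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # If the remainder decodes cleanly, there is nothing more to report.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice, so shift by pos.
            start = pos + err.start
            end = pos + err.end
            problems.append((start, data[start:end]))
            pos = end
    return problems


# Second row of the hex dump: "dit.msc", the bad sequence, then " and p".
sample = bytes.fromhex("64 69 74 2e 6d 73 63 e2 80 3f 20 61 6e 64 20 70")

for offset, bad in find_invalid_utf8(sample):
    print(offset, " ".join(f"{b:02x}" for b in bad))
```

The first reported offset points at the "e2" that starts the truncated three-byte sequence, right after "dit.msc".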
Ok, can you confirm whether you have dumped this database from another MySQL instance (for instance with mysqldump or phpmyadmin) and loaded it into the current one?
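In that case, it's possible that your data was corrupted during this transfer. The corruption is caused by a two-way conversion from Windows-1252 (which MySQL calls "latin1") to UTF-8 and back. Unlike a simple conversion from ISO 8859-1 (the real Latin-1) to UTF-8 and back, this irrecoverably destroys the five byte values in the 0x80-0x9f range that have no assigned character in Windows-1252 (0x81, 0x8d, 0x8f, 0x90 and 0x9d). In your dump, the closing quote after "gpedit.msc" was presumably "e2 80 9d"; its last byte, 0x9d, is one of the unassigned values and has been replaced with 0x3f, a literal "?".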
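The effect can be reproduced in a few lines of Python. This is only a sketch: it assumes Python's "cp1252" and "latin-1" codecs behave like the conversions applied during the dump and reload, which may not match MySQL's own tables exactly, and the trailing "e2 80 9d" in the sample is the presumed original closing quote, not something taken from the dump.

```python
# Assumption: Python's "cp1252" and "latin-1" codecs stand in for the
# charset conversions done during dump and reload; MySQL may differ in detail.

# UTF-8 bytes for: opening curly quote, "gpedit.msc", closing curly quote.
original = bytes.fromhex("e2 80 9c 67 70 65 64 69 74 2e 6d 73 63 e2 80 9d")


def round_trip(raw: bytes, charset: str) -> bytes:
    """Read raw bytes as `charset`, convert to UTF-8, then convert back."""
    as_utf8 = raw.decode(charset, errors="replace").encode("utf-8")
    return as_utf8.decode("utf-8").encode(charset, errors="replace")


via_cp1252 = round_trip(original, "cp1252")
via_latin1 = round_trip(original, "latin-1")

# The cp1252 round trip turns the unassigned byte 0x9d into 0x3f ("?").
print(" ".join(f"{b:02x}" for b in via_cp1252))

# The ISO 8859-1 round trip maps every byte to a character and back.
print(via_latin1 == original)
```

Under these assumptions the opening quote "e2 80 9c" survives, because 0x9c is an assigned character in Windows-1252, while the closing quote comes back as "e2 80 3f", matching the bad sequence in the dump.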
To prevent the corruption, use the --default-character-set=latin1 option when dumping the original database with mysqldump. This stops mysqldump from applying a spurious encoding conversion to the raw data.
-- brion vibber (brion @ wikimedia.org)