-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Ian Smith wrote:
[snip]
The table spec says:
CREATE TABLE `mywiki_text` (
...
) ENGINE=MyISAM AUTO_INCREMENT=18452 DEFAULT CHARSET=latin1
You can load the current revision of a particular page and save it to a
file [snip] (You can use maintenance/eval.php to run code within the
MediaWiki framework from the command line.)
Sweet! I've done that, and this is the offending section:
20 52 75 6e 20 74 79 70 65 20 e2 80 9c 67 70 65 > Run type ...gpe<
64 69 74 2e 6d 73 63 e2 80 3f 20 61 6e 64 20 70 >dit.msc..? and p<
The bad sequence (after "gpedit.msc") is "e2 80 3f": the same as what I
got with my hex dump in the code.
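To check how widespread the damage is, the same three bytes can be
searched for in a saved copy of the page text. The following is only a
rough sketch: the function name is made up for illustration, and the
sample bytes are simply the 32 bytes from the hex dump above. A match is
a hint rather than proof, since "e2 80 3f" could in principle occur for
other reasons.

```python
# Bytes left behind when the last byte of a three-byte UTF-8 sequence
# starting with e2 80 has been replaced by "?" (0x3f).
DAMAGED = bytes.fromhex("e2 80 3f")

# The 32 bytes shown in the hex dump above.
SAMPLE = bytes.fromhex(
    "20 52 75 6e 20 74 79 70 65 20 e2 80 9c 67 70 65 "
    "64 69 74 2e 6d 73 63 e2 80 3f 20 61 6e 64 20 70"
)


def find_damaged_sequences(data: bytes) -> list:
    """Return the byte offsets at which the damaged sequence starts."""
    offsets = []
    position = data.find(DAMAGED)
    while position != -1:
        offsets.append(position)
        position = data.find(DAMAGED, position + 1)
    return offsets


if __name__ == "__main__":
    for offset in find_damaged_sequences(SAMPLE):
        print(f"damaged sequence at byte offset {offset}")
```

Run against a whole dumped page instead of SAMPLE, this gives a quick
count of how many characters were hit.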
Ok, can you confirm whether you have dumped this database from another
MySQL instance (for instance with mysqldump or phpmyadmin) and loaded it
into the current one?
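In that case, it's possible that your data was corrupted during this
transfer. The corruption is caused by a two-way conversion from
Windows-1252 (which is what MySQL calls "latin1") to UTF-8 and back.
Unlike a simple conversion from ISO 8859-1 (the real Latin-1) to UTF-8
and back, this will irrecoverably destroy the five byte values in the
0x80-0x9f range which do not have assigned characters in Windows-1252:
0x81, 0x8d, 0x8f, 0x90 and 0x9d. In your dump the opening quote
(e2 80 9c) survived, but the closing quote, presumably e2 80 9d, lost
its final byte to a "?" (0x3f).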
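The effect can be reproduced outside MySQL. This is a minimal sketch
using Python's own "cp1252" and "latin-1" codecs as stand-ins for the
conversions involved; it illustrates the mechanism and is not a
description of MySQL's actual code path. The helper names are invented
for the example.

```python
def round_trip(raw: bytes, codec: str) -> bytes:
    """Treat raw bytes as `codec`, convert to UTF-8, then convert back.

    Bytes with no assigned character are replaced on the way in and
    come back out as "?" (0x3f).
    """
    text = raw.decode(codec, errors="replace")
    as_utf8 = text.encode("utf-8")
    return as_utf8.decode("utf-8").encode(codec, errors="replace")


def unassigned_cp1252_bytes() -> list:
    """List the byte values in 0x80-0x9f that cp1252 cannot decode."""
    missing = []
    for value in range(0x80, 0xA0):
        try:
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            missing.append(value)
    return missing


if __name__ == "__main__":
    closing_quote = bytes.fromhex("e2 80 9d")  # UTF-8 for U+201D
    opening_quote = bytes.fromhex("e2 80 9c")  # UTF-8 for U+201C

    print(round_trip(closing_quote, "cp1252").hex(" "))   # last byte lost
    print(round_trip(opening_quote, "cp1252").hex(" "))   # unchanged
    print(round_trip(closing_quote, "latin-1").hex(" "))  # unchanged
    print([hex(value) for value in unassigned_cp1252_bytes()])
```

The opening quote survives because 0x9c has an assigned character in
Windows-1252, while 0x9d does not.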
To prevent the corruption, use the --default-character-set=latin1 option
while dumping the original database with mysqldump. This stops mysqldump
from applying a spurious encoding conversion to the raw data.
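As a sketch of what such a dump invocation might look like when scripted,
the snippet below only assembles the argument list. mysqldump spells the
option --default-character-set, and --result-file is its option for
writing straight to a file. The database name "wikidb" and the output
path are placeholders, and connection options (user, password, host) are
left out.

```python
from typing import List


def build_dump_command(database: str, outfile: str) -> List[str]:
    """Assemble a mysqldump command that leaves latin1 data unconverted."""
    return [
        "mysqldump",
        "--default-character-set=latin1",  # dump the bytes as stored
        f"--result-file={outfile}",        # write directly to this file
        database,
    ]


if __name__ == "__main__":
    # "wikidb" and "wikidb.sql" are hypothetical names.
    command = build_dump_command("wikidb", "wikidb.sql")
    print(" ".join(command))
    # To actually run it: subprocess.run(command, check=True)
```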
- -- brion vibber (brion @ wikimedia.org)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGAEp0wRnhpk1wk44RAj8RAJ0TMjO5Hk/wAsukuQWPBi49CQImJQCfRStJ
sujXhijcpAVMeptD/4VoGQ0=
=w7fK
-----END PGP SIGNATURE-----