OK, can you confirm whether you dumped this database from another MySQL instance (for instance with mysqldump or phpMyAdmin) and loaded it into the current one?
Aha!!! Yes, that is indeed what I did. I was upgrading Apache, PHP and MySQL, so I cloned the old Wiki to the new setup on the same machine, then ran the MediaWiki upgrade.
In that case, it's possible that your data was corrupted during this transfer. The corruption is caused by the two-way conversion from Windows-1252 (which MySQL calls latin1) to UTF-8 and back. Unlike a simple conversion from ISO 8859-1 (Latin-1) to UTF-8 and back, this irrecoverably destroys the five byte values in the 0x80-0x9f range that have no assigned character in Windows-1252 (0x81, 0x8d, 0x8f, 0x90 and 0x9d).
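For anyone who wants to see the effect in isolation, here is a minimal sketch. It uses Python's built-in `cp1252` and `latin-1` codecs as a stand-in for the conversion, so it illustrates the principle rather than MySQL's actual conversion code. The helper names and the sample character are my own, chosen only for illustration.

```python
def undefined_cp1252_bytes() -> list:
    """Return the byte values in 0x80-0x9f that Python's cp1252 codec cannot decode."""
    missing = []
    for value in range(0x80, 0xA0):
        try:
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            missing.append(value)
    return missing


def round_trip(raw: bytes, codec: str) -> bytes:
    """Decode raw bytes with `codec`, then encode them back.

    Anything the codec cannot map is replaced instead of raising,
    which imitates a lossy conversion step.
    """
    text = raw.decode(codec, errors="replace")
    return text.encode(codec, errors="replace")


# "Á" is stored as the two UTF-8 bytes 0xC3 0x81; 0x81 is undefined in Windows-1252.
raw = "Á".encode("utf-8")

print([hex(b) for b in undefined_cp1252_bytes()])
print("ISO 8859-1 round trip intact:  ", round_trip(raw, "latin-1") == raw)
print("Windows-1252 round trip intact:", round_trip(raw, "cp1252") == raw)
```

The ISO 8859-1 round trip returns the original bytes because every byte value maps to a character. The Windows-1252 round trip loses the 0x81 byte, and once it has been replaced there is no way to tell what it used to be.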
OK, that makes perfect sense! Sorry, I guess I assumed that the mysqldump / restore -- particularly on the same machine -- would preserve the data. Clearly not.
To prevent the corruption, use the --default-character-set=latin1 option while dumping the original database with mysqldump. This stops mysqldump from corrupting your data by applying a false encoding conversion to the raw data.
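As a rough sketch of what that invocation could look like, the snippet below only builds and prints the command line; it does not run it. The mysqldump option is spelled `--default-character-set`. The database name `wikidb`, the output file `wikidb.sql` and the helper `build_dump_command` are placeholders I made up, and you would normally add your own connection options (user, password, host).

```python
import shlex


def build_dump_command(database: str, out_file: str) -> list:
    """Build a mysqldump argument list that dumps `database` as latin1."""
    return [
        "mysqldump",
        "--default-character-set=latin1",  # dump using latin1 as the connection charset
        f"--result-file={out_file}",       # write the dump to this file
        database,
    ]


# Placeholder names; substitute your own database and file.
command = build_dump_command("wikidb", "wikidb.sql")
print(shlex.join(command))
```

Building the arguments as a list keeps each option separate, so the same list can later be handed to `subprocess.run` without any shell quoting problems.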
Sounds like excellent advice for people following in my footsteps... too late for me, but thanks for getting to the bottom of this, anyhow.
Maybe a note about this should be added to the wiki-moving instructions:
http://www.mediawiki.org/wiki/Manual:Moving_a_wiki
Ian