A lot of the character-encoding stuff in the present code is a mess too. I understand, and can handle, all the details between server and browser, but two things whose quirks I don't know are PHP and MySQL, so this is my attempt to pick the brains of those who have already run into those problems:
(1) Is MySQL 8-bit clean? If I store a chunk of 8-bit bytes in a text field, will I get them back unmolested, or will MySQL try to be "helpful" and fuck them up? If the latter, what are the limits on what can be stored in a text field, and where are they documented?
(2) Are PHP strings 8-bit clean? I'd be amazed if they weren't, considering how much of PHP is modelled on Perl.
(3) Is the PHP on wikipedia.com compiled with the "iconv" library (an optional extension), and does PHP use it as documented?
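Whatever the answers to (1) and (2), "8-bit clean" can be checked empirically with a byte round-trip test: write every value from 0x00 to 0xFF, read it back, and compare. The sketch below assumes a hypothetical `roundtrip` function standing in for "store in the text field, then fetch it again"; it does not talk to MySQL or PHP. The "helpful" converter in the usage lines is a simulated charset conversion for illustration, not a claim about what MySQL actually does.

```python
from typing import Callable


def all_bytes() -> bytes:
    # Every possible 8-bit value, 0x00 through 0xFF, in order.
    return bytes(range(256))


def find_mangled(roundtrip: Callable[[bytes], bytes]) -> list[int]:
    """Return the byte values that do not survive a store-and-fetch unchanged.

    `roundtrip` is a hypothetical stand-in for writing to a text field and
    reading it back; plug in real database code to test a real setup.
    """
    sample = all_bytes()
    result = roundtrip(sample)
    if len(result) != len(sample):
        # Bytes were dropped or expanded (e.g. by re-encoding), so positions
        # no longer line up: treat every value as suspect.
        return list(sample)
    return [b for b, r in zip(sample, result) if b != r]


def helpful_converter(data: bytes) -> bytes:
    # Simulated "helpful" layer: reads bytes as Latin-1, then forces them to
    # ASCII, replacing anything above 0x7F with "?".
    return data.decode("latin-1").encode("ascii", "replace")


clean = find_mangled(lambda d: d)            # identity: a perfectly clean store
lossy = find_mangled(helpful_converter)      # high-bit bytes come back as "?"
print(clean)                                 # []
print(lossy[0], lossy[-1], len(lossy))       # 128 255 128
```

Comparing byte by byte, rather than only checking equality, shows *which* values get mangled, which is usually the clue to what conversion is happening in between.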