Brion Vibber wrote:
Add a quick output filter to Special:Export; there's a particular character one's supposed to use for invalid chars (check the Unicode specs).
Here's the one: U+FFFD REPLACEMENT CHARACTER, "used to replace an incoming character whose value is unknown or unrepresentable in Unicode".
In UTF-8 that should be "\xEF\xBF\xBD".
Note that LanguageUtf8.php contains a regexp for checking whether a string is valid UTF-8; you may find it useful.
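
For illustration only, here is a minimal sketch of such a filter in Python. It is not the MediaWiki code: the names is_valid_utf8 and scrub_utf8 are hypothetical, and the validity check relies on Python's built-in decoder rather than the regexp in LanguageUtf8.php. It only shows the idea of swapping each invalid byte sequence for U+FFFD before output.

```python
# U+FFFD REPLACEMENT CHARACTER encoded as UTF-8: b"\xef\xbf\xbd"
REPLACEMENT = "\ufffd".encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def scrub_utf8(data: bytes) -> bytes:
    """Replace invalid UTF-8 sequences with U+FFFD (hypothetical helper).

    Decoding with errors="replace" substitutes U+FFFD for anything the
    decoder rejects; re-encoding yields well-formed UTF-8 bytes.
    """
    return data.decode("utf-8", errors="replace").encode("utf-8")


if __name__ == "__main__":
    raw = b"abc\xffdef"  # 0xFF can never appear in valid UTF-8
    print(is_valid_utf8(raw))
    print(scrub_utf8(raw))
```

Already-valid input passes through unchanged, so a filter like this could be applied to all exported text without a separate validity check first.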
-- brion vibber (brion @ pobox.com)