Brion Vibber wrote:
> Ævar Arnfjörð Bjarmason wrote:
>> Without having actually looked at the code: it should be using the truncate() function from the language class. However, the Language.php version of that function is not Unicode-aware, so things like this will keep happening until bug 2069 is solved (http://bugzilla.wikimedia.org/show_bug.cgi?id=2069).
>
> I don't understand this claim. LanguageUtf8's truncate() *is* already UTF-8-aware; 2069 is only a code-layout issue and does not affect functionality.
>
> If there's a bug here, it comes from failing to call the function in the first place and letting the database crop the field.
Right. To put it another way: LanguageUtf8 is the base class for every language class except LanguageLatin1. $wgLang->truncate() will always use the correct encoding for the wiki; it's only if you call Language::truncate() directly that you'll run into trouble.
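To make the distinction concrete, here is a minimal PHP sketch. The classes `SketchLanguage` and `SketchLanguageUtf8` are hypothetical stand-ins, not MediaWiki's real Language/LanguageUtf8 classes, and `truncate()` is simplified to take a string and a byte limit. The base version cuts at a raw byte count, much like a database cropping a field, and can leave half a character behind; the subclass override backs off any partial character. Recent PHP versions reject calling a non-static method as `Class::method()` from outside the class, so the sketch reaches the base implementation through a base-class instance instead.

```php
<?php
// Hypothetical stand-in for the base Language class (simplified signature).
class SketchLanguage {
    // Byte-oriented truncation: fine for single-byte encodings such as
    // Latin-1, but on UTF-8 text it can split a multibyte character.
    public function truncate(string $s, int $maxBytes): string {
        return substr($s, 0, $maxBytes);
    }
}

// Hypothetical stand-in for LanguageUtf8. Assumes valid UTF-8 input.
class SketchLanguageUtf8 extends SketchLanguage {
    public function truncate(string $s, int $maxBytes): string {
        $cut = parent::truncate($s, $maxBytes);
        if (strlen($cut) === strlen($s)) {
            return $cut; // the string already fit; nothing was removed
        }
        // Step back over continuation bytes (10xxxxxx) to the last lead byte.
        $i = strlen($cut) - 1;
        while ($i >= 0 && (ord($cut[$i]) & 0xC0) === 0x80) {
            $i--;
        }
        if ($i < 0) {
            return '';
        }
        // How many bytes the character starting at $i should occupy.
        $lead = ord($cut[$i]);
        if ($lead < 0x80) {
            $need = 1;
        } elseif (($lead & 0xE0) === 0xC0) {
            $need = 2;
        } elseif (($lead & 0xF0) === 0xE0) {
            $need = 3;
        } else {
            $need = 4;
        }
        // Keep the last character only if all of its bytes survived the cut.
        return (strlen($cut) - $i < $need) ? substr($cut, 0, $i) : $cut;
    }
}

// Like $wgLang, this holds an instance of the wiki's language subclass,
// so the call dispatches to the UTF-8-aware override.
$wgLang = new SketchLanguageUtf8();
var_dump($wgLang->truncate("Ævar", 1)); // string(0) ""
var_dump($wgLang->truncate("Ævar", 2)); // string(2) "Æ"

// Using the base-class implementation directly (the analogue of calling
// Language::truncate()) leaves a dangling lead byte of "Æ".
$base = new SketchLanguage();
echo bin2hex($base->truncate("Ævar", 1)), "\n"; // c3
```

Calling through `$wgLang` picks up whatever override the wiki's language class inherits, which is why the instance call is the safe one.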
Note that you can use mb_substr() in MediaWiki if you like: I implemented a simulation of it, using the /./u regex trick, for systems without the mbstring extension.
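The /./u trick works because, with PCRE's `u` modifier, `.` matches one complete UTF-8 character instead of one byte. Below is a minimal sketch of such a fallback, assuming valid UTF-8 input. The name `sketch_mb_substr` is hypothetical (chosen so it cannot clash with the real `mb_substr()` when mbstring is loaded), the extra `s` modifier is this sketch's own choice so that newlines also count as characters, and this is not MediaWiki's actual implementation.

```php
<?php
// Hypothetical mb_substr() fallback built on the /./u idea.
function sketch_mb_substr(string $str, int $start, ?int $length = null): string {
    // With u, '.' matches one whole UTF-8 character; with s, it also
    // matches "\n". $matches[0] becomes an array of single characters.
    preg_match_all('/./us', $str, $matches);
    // array_slice() now counts in characters; negative $start/$length
    // behave like mb_substr(): counted from the end of the string.
    return implode('', array_slice($matches[0], $start, $length));
}

echo sketch_mb_substr("Ævar Arnfjörð", 5, 6), "\n"; // Arnfjö
echo sketch_mb_substr("Ævar", 0, 1), "\n";          // Æ
```

Splitting the whole string into an array costs memory proportional to its length, so this suits a fallback for hosts without mbstring rather than a replacement for it.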
-- Tim Starling