MySQL 5 is scheduled to come out of beta next month, and we're going to be looking at upgrading sometime in the coming months. Among other things we're probably going to want to start making use of the support for Unicode collation, so we can get better sorting and perhaps use it for case-insensitive matching.
There is, however, a compatibility issue: MySQL's Unicode support is limited to the 16-bit character range (the Basic Multilingual Plane, or BMP) in both the ucs2 and utf8 storage modes.
Characters beyond the BMP are relatively rare, but they do occur. Mostly these are ancient or dead scripts, some invented scripts, and a number of rare Han characters which sometimes turn up in Chinese and Japanese.
This won't affect page _contents_; our content is stored in binary blobs and can have any wacky characters we want. But supporting these high characters in page titles, usernames, and the like might require jumping through a lot of hoops.
It would be relatively simple to disable use of titles and usernames with these high characters; to assess possible impact I did a check through all our current wikis and found 99 extant pages:
43 in en.wiktionary.org
31 in got.wikipedia.org
10 in la.wiktionary.org
9 in zh.wikipedia.org
3 in so.wikipedia.org
1 in en.wikibooks.org
1 in ja.wikipedia.org
1 in nl.wikibooks.org
I've put the full list of pages here: http://meta.wikimedia.org/wiki/User:Brion_VIBBER/Unicode_high_chars
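For anyone wanting to reproduce a check like this, or to reject such titles at input time, the core test is simply whether any character lies above U+FFFF. The sketch below is illustrative only: it is not the script used for the survey above, and the function names (has_non_bmp, utf8_has_four_byte_char, find_high_titles) are hypothetical. It assumes titles are available either as Python str values or as raw UTF-8 bytes.

```python
from typing import Iterable, List

# Highest code point in the Basic Multilingual Plane.
BMP_MAX = 0xFFFF


def has_non_bmp(text: str) -> bool:
    """True if any character lies outside the BMP (above U+FFFF)."""
    return any(ord(ch) > BMP_MAX for ch in text)


def utf8_has_four_byte_char(raw: bytes) -> bool:
    """True if well-formed UTF-8 bytes contain a 4-byte sequence.

    In valid UTF-8 only non-BMP characters take 4 bytes, and their lead
    byte is 0xF0-0xF4; continuation bytes are always 0x80-0xBF, so they
    never match this range.
    """
    return any(0xF0 <= b <= 0xF4 for b in raw)


def find_high_titles(titles: Iterable[str]) -> List[str]:
    """Return the titles that a BMP-only column could not store."""
    return [t for t in titles if has_non_bmp(t)]
```

For example, a Gothic title such as "\U00010330\U00010331" would be flagged, while ordinary Chinese text like "\u4e2d\u6587" (all within the BMP) would pass. The byte-level variant is there because a dump read straight from a binary column may be easier to scan without decoding.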
Most of the en.wiktionary entries are individual letters in the Deseret and Shavian alphabets (invented alphabets for English; historical curiosities).
The Gothic alphabet is entirely in the high-character area, but it's a long-dead language and not exactly an active wiki. Perhaps we should just close it down...
Latin Wiktionary contains several Gothic terms...
The Chinese Wikipedia contains several articles that appear legitimate (as far as I can tell) and use high characters; these might have to be moved. The Japanese Wikipedia has one redirect with such a character.
The Somali Wikipedia contains three one-sentence stub pages using the Osmanya script; Omniglot's article on it says this script has not been in use since the adoption of the Latin alphabet in 1972.
English Wikibooks has a user account with a Gothic-script name, which has edited a number of pages about the Gothic language and has a user page.
Dutch Wikibooks has one Gothic-titled redirect.
-- brion vibber (brion @ pobox.com)