Neil Harris wrote: [snip]
> Although these rules sound restrictive, they will actually allow almost any reasonable username through whilst preventing many, many nasty tricks. (Yes, Coptic disunification is a problem, but I know how to fix that special case ;-)
> Although these rules sound hairy, they can actually be reduced to just one range-lookup table, and about thirty lines of code.
Would you mind whipping up a table to work with? (Or a list of regexes; PCRE understands UTF-8 character ranges with the /u modifier.) I've already got a Unicode normalization library in MediaWiki so NFKC is easy.
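
For reference, here is a rough sketch of what the "one range-lookup table plus NFKC" approach could look like. It is written in Python purely for illustration (MediaWiki's own code would differ), and the ranges in ALLOWED_RANGES are made-up placeholders, not the actual table under discussion; the names is_allowed_char and check_username are likewise hypothetical.

```python
import bisect
import unicodedata

# Hypothetical allow-list of inclusive code point ranges, sorted by start.
# Placeholder values for illustration only; the real table is not given here.
ALLOWED_RANGES = [
    (0x0030, 0x0039),  # ASCII digits
    (0x0041, 0x005A),  # ASCII uppercase letters
    (0x0061, 0x007A),  # ASCII lowercase letters
    (0x00C0, 0x00D6),  # Latin-1 letters (skipping the multiplication sign)
    (0x00D8, 0x00F6),  # Latin-1 letters (skipping the division sign)
    (0x00F8, 0x00FF),
    (0x0391, 0x03A1),  # Greek capital letters
    (0x03A3, 0x03A9),
    (0x03B1, 0x03C9),  # Greek small letters
]
_STARTS = [start for start, _end in ALLOWED_RANGES]


def is_allowed_char(ch):
    """Binary-search the range table for a single character."""
    cp = ord(ch)
    # Find the last range whose start is <= cp, then check its end.
    i = bisect.bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= ALLOWED_RANGES[i][1]


def check_username(name):
    """Normalize to NFKC, then require every character to be in the table."""
    normalized = unicodedata.normalize("NFKC", name)
    return bool(normalized) and all(is_allowed_char(c) for c in normalized)
```

Normalizing before the lookup means compatibility forms such as fullwidth letters are folded to their ordinary equivalents first, so the table only has to describe the normalized repertoire.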
-- brion vibber (brion @ pobox.com)