There's an impersonator on dewiki who added NBSP (U+00A0) to another user's username. The impersonator's account seems to be unblockable by normal means as the trailing NBSP at some stage of blocking gets stripped and the legitimate user is blocked.
Is this problem known?
Is there a known workaround?
Regards, Peter Jacobi
Peter Jacobi wrote:
There's an impersonator on dewiki who added NBSP (U+00A0) to another user's username. The impersonator's account seems to be unblockable by normal means as the trailing NBSP at some stage of blocking gets stripped and the legitimate user is blocked.
Is this problem known?
Is there a known workaround?
Regards, Peter Jacobi
Given that this is an arms race between the vandals and the software developers, it might be a good idea to implement the following restrictions on creating new usernames.
Usernames should first be NFKC normalized, and then be restricted to contain only:
SPACE
The printing ASCII punctuation characters
The ASCII digits
MIDDLE DOT (for Catalan)
Combining marks
and script-specific characters from _exactly one_ writing system; namely, codepoints that belong only to a single script, or any subset of one of the following three script combinations:
HAN + BOPOMOFO
HAN + KATAKANA + HIRAGANA
HAN + HANGUL
In addition, characters from the Hangul Jamo and Braille blocks and any characters from any compatibility-character, presentation form, half width, full width, spacing modifier, small, phonetic, superscript, enclosed, symbol, mathematical, musical, and punctuation Unicode blocks, as well as all of the exotic metacharacters and undefined characters, should be explicitly forbidden.
(This is based on the recommendations of UTR #36, with some tweaks, based on a _lot_ of other research into name-spoofing.)
Although these rules sound restrictive, they will actually allow almost any reasonable username through whilst preventing many, many nasty tricks. (Yes, Coptic disunification is a problem, but I know how to fix that special case ;-)
Although these rules sound hairy, they can actually be reduced to just one range-lookup table, and about thirty lines of code.
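As a rough illustration of the range-lookup idea (a minimal Python sketch, not the C code mentioned in this thread): the table below is a small, deliberately incomplete sample of script ranges, and the names SCRIPT_RANGES, ALLOWED_COMBINATIONS, script_of and is_acceptable_username are hypothetical. A real table would be derived from the Unicode script data and would also need the explicit exclusions listed above.

```python
import string
import unicodedata
from bisect import bisect_right

# Illustrative, incomplete sample table: (first, last, script).
# Sorted by first codepoint, non-overlapping. Ranges are approximate.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "LATIN"),
    (0x0061, 0x007A, "LATIN"),
    (0x00C0, 0x00D6, "LATIN"),
    (0x00D8, 0x00F6, "LATIN"),
    (0x00F8, 0x024F, "LATIN"),
    (0x0391, 0x03A1, "GREEK"),
    (0x03A3, 0x03C9, "GREEK"),
    (0x0400, 0x04FF, "CYRILLIC"),
    (0x3041, 0x3096, "HIRAGANA"),
    (0x30A1, 0x30FA, "KATAKANA"),
    (0x3105, 0x312F, "BOPOMOFO"),
    (0x4E00, 0x9FFF, "HAN"),
    (0xAC00, 0xD7A3, "HANGUL"),
]
_STARTS = [first for first, _, _ in SCRIPT_RANGES]

# The three multi-script combinations permitted by the proposal.
ALLOWED_COMBINATIONS = [
    {"HAN", "BOPOMOFO"},
    {"HAN", "KATAKANA", "HIRAGANA"},
    {"HAN", "HANGUL"},
]

# Script-neutral characters: SPACE, ASCII punctuation, ASCII digits, MIDDLE DOT.
COMMON = set(" " + string.punctuation + string.digits + "\u00b7")


def script_of(ch):
    """Return the script name for ch via binary search, or None if not in the table."""
    cp = ord(ch)
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0:
        _, last, script = SCRIPT_RANGES[i]
        if cp <= last:
            return script
    return None


def is_acceptable_username(name):
    """Apply NFKC, then require a single script or one allowed combination."""
    name = unicodedata.normalize("NFKC", name)
    if not name:
        return False  # assumption: empty names are rejected
    scripts = set()
    for ch in name:
        # Combining marks (general category M*) are allowed alongside any script.
        if ch in COMMON or unicodedata.category(ch).startswith("M"):
            continue
        script = script_of(ch)
        if script is None:
            return False  # anything outside the table is forbidden
        scripts.add(script)
    return len(scripts) <= 1 or any(scripts <= combo for combo in ALLOWED_COMBINATIONS)
```

Note that NFKC maps NBSP (U+00A0) to a plain SPACE, so a check like this does not by itself distinguish a name with trailing whitespace from the original; trimming and uniqueness checks on the normalized form would still be needed.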
_That_ should slow the more devious vandals down a bit, and I have more tricks up my sleeve for later.
-- Neil
Neil Harris wrote: [snip]
Although these rules sound restrictive, they will actually allow almost any reasonable username through whilst preventing many, many nasty tricks. (Yes, Coptic disunification is a problem, but I know how to fix that special case ;-)
Although these rules sound hairy, they can actually be reduced to just one range-lookup table, and about thirty lines of code.
Would you mind whipping up a table to work with? (Or a list of regexes; PCRE understands UTF-8 character ranges with the /u modifier.) I've already got a Unicode normalization library in MediaWiki so NFKC is easy.
-- brion vibber (brion @ pobox.com)
Neil Harris wrote:
Given that this is an arms race between the vandals and the software developers, it might be a good idea to implement the following restrictions on creating new usernames.
Usernames should first be NFKC normalized, and then be restricted to contain only:
[snip]
Although these rules sound hairy, they can actually be reduced to just one range-lookup table, and about thirty lines of code.
_That_ should slow the more devious vandals down a bit, and I have more tricks up my sleeve for later.
And, before you ask, yes, I _do_ have some GPL-licenced code for doing this. It's in C, but it's rather straightforward, and should be easily translatable to the language of your choice. If anyone is seriously interested in addressing the issue of bogus and spoofed usernames, I'd be happy to send it to them.
-- Neil
wikitech-l@lists.wikimedia.org