[WikiEN-l] Eliminating homographs in usernames: was: Re: Pruning "dead" accounts (was Re: New York Times article)
Neil Harris
usenet at tonal.clara.co.uk
Tue Jun 20 18:14:16 UTC 2006
Chris Lüer wrote:
> At 09:59 AM 6/19/2006, Guettarda wrote:
>
>
>> Don't forget the accounts which exist to prevent impersonation in one's main
>> language - names with capital I's to mimic L's and Cyrillic characters.
>> Deleting these would just open things up for abuse again.
>>
>
> Is there a reason why user names with weird Unicode characters
> are even allowed? It would seem sensible to limit user names on each
> Wikipedia to the alphabet that is used in that language.
>
> Chl
>
>
>
A suggestion, based on practices used for IDN registration:

Restrict new usernames on the en: Wikipedia to characters from the Latin
alphabet and selected punctuation only (and possibly digits as well).

Before allowing a username to be registered, generate a canonical
comparison form by Unicode normalization, lowercasing, suppression of
punctuation and spaces, and accent-stripping, followed by homograph
canonicalizations such as mapping both digit zero and letter O to the
latter, digit 1 and letter L to the latter, eth to lowercase d, etc.
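A rough sketch of what such a comparison form could look like in Python, assuming NFKD as the normalization form and a homograph table holding only the three mappings named above. The names comparison_form and HOMOGRAPHS are made up for illustration, and a real table would need many more entries:

```python
import unicodedata

# Hypothetical homograph table: only the three mappings named in the post.
# Keys and values are lowercase because the table is applied after lowercasing.
HOMOGRAPHS = str.maketrans({"0": "o", "1": "l", "\u00f0": "d"})  # U+00F0 is eth


def comparison_form(name: str) -> str:
    """Return a canonical form used only for comparing usernames."""
    # NFKD (an assumed choice) splits accented letters into a base letter
    # plus combining marks and folds compatibility variants.
    text = unicodedata.normalize("NFKD", name).casefold()
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        # Drop combining marks (accent-stripping), punctuation and spaces.
        if category.startswith(("M", "P", "Z")) or ch.isspace():
            continue
        kept.append(ch)
    # Homograph canonicalization comes last, on the stripped lowercase text.
    return "".join(kept).translate(HOMOGRAPHS)


print(comparison_form("Chris L\u00fcer"))  # chrisluer
print(comparison_form("Nei1 0. Harris") == comparison_form("neil o harris"))  # True
```

Because lowercasing happens first, a capital I imitating a lowercase l is only caught if "i" is folded to "l" as well. That entry is left out here since it is not in the list above.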

A new username should then only be allowed to be registered if its
comparison form differs from the comparison form of every existing
username. The comparison forms are stored in an indexed table, alongside
the full, uncanonicalized name that actually gets registered.
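One way the registration check might be wired up, sketched with an in-memory SQLite table whose primary key serves as the index on the comparison form. The table layout, the function names and the stand-in simple_form canonicalizer are all assumptions for illustration. This is not how MediaWiki stores usernames.

```python
import sqlite3


def open_registry() -> sqlite3.Connection:
    """Create a hypothetical username table keyed on the comparison form."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE usernames ("
        " comparison_form TEXT PRIMARY KEY,"  # unique index does the lookup
        " registered_name TEXT NOT NULL)"     # the name exactly as typed
    )
    return conn


def register(conn: sqlite3.Connection, name: str, canonicalize) -> bool:
    """Register name unless its comparison form is empty or already taken."""
    form = canonicalize(name)
    if not form:
        return False
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO usernames VALUES (?, ?)", (form, name))
    except sqlite3.IntegrityError:
        return False  # another username already has this comparison form
    return True


def simple_form(name: str) -> str:
    # Stand-in canonicalizer: lowercase and drop whitespace only. A real one
    # would also strip accents and punctuation and fold homographs.
    return "".join(name.casefold().split())


conn = open_registry()
print(register(conn, "Neil Harris", simple_form))  # True
print(register(conn, "neil harris", simple_form))  # False
```

Letting the unique index reject the duplicate avoids a separate lookup before the insert, so two simultaneous registrations of colliding names cannot both succeed.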

Doing this will eliminate the vast majority of simple username
spoofing hacks.

Existing usernames get grandfathered in, of course.
-- Neil