[WikiEN-l] Eliminating homographs in usernames: was: Re: Pruning "dead" accounts (was Re: New York Times article)

20 Jun 2006


      Chris Lüer wrote:
...
At 09:59 AM 6/19/2006, Guettarda wrote:
...
Don't forget the accounts which exist to prevent impersonation in one's main
language - names with capital I's to mimic lowercase l's, and names using Cyrillic characters.
Deleting these would just open things up for abuse again.
Is there a reason why user names with weird Unicode characters 
are even allowed? It would seem sensible to limit user names on each 
Wikipedia to the alphabet that is used in that language.
     Chl 


A suggestion, based on practices used for internationalized domain name (IDN) registration:
Restrict new usernames on the en: Wikipedia to characters from the Latin 
alphabet and selected punctuation only (and possibly digits as well).
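The character restriction could look like the following Python sketch. The proposal does not say which punctuation counts as "selected", so `ALLOWED_PUNCTUATION` (space, hyphen, period, underscore, apostrophe) and the `ALLOW_DIGITS` switch are assumptions, as is the function name `is_allowed_username`. A letter is treated as Latin if its Unicode character name begins with "LATIN ". This admits accented letters such as ü and eth but rejects Cyrillic look-alikes.

```python
import unicodedata

# Hypothetical policy: the exact punctuation set is an assumption.
ALLOWED_PUNCTUATION = frozenset(" -._'")
ALLOW_DIGITS = True  # the proposal says "possibly digits as well"


def is_allowed_username(name: str) -> bool:
    """Return True if every character is a Latin letter, an allowed
    punctuation mark, or (optionally) an ASCII digit."""
    # Compose decomposed accents first, so "u" + combining diaeresis
    # is checked as the single letter "ü".
    name = unicodedata.normalize("NFC", name)
    if not name.strip():
        return False
    for ch in name:
        if ch in ALLOWED_PUNCTUATION:
            continue
        if ALLOW_DIGITS and ch in "0123456789":
            continue
        # Nearly all Latin-script letters have Unicode names starting with
        # "LATIN " (e.g. "LATIN SMALL LETTER U WITH DIAERESIS"); Cyrillic
        # look-alikes start with "CYRILLIC " and are rejected here.
        if ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN "):
            continue
        return False
    return True
```

For example, `is_allowed_username("Chris Lüer")` is accepted. A name that hides a Cyrillic "і" (U+0456) in place of a Latin "i" is refused.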
Before allowing a username to be registered, generate a canonical 
comparison form by Unicode normalization, lowercasing, punctuation and 
space suppression and accent-stripping, followed by homograph 
canonicalizations such as mapping both digit zero and letter O to the 
latter, digit 1 and letter L to the latter, eth to lowercase d, etc.
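Here is a minimal Python sketch of that comparison form. It assumes NFKD as the normalization form (the proposal does not name one). It suppresses characters in the Unicode categories P* (punctuation), Z* (spaces and other separators) and Mn (combining accents, which NFKD splits off from their base letters). The homograph table holds only the three mappings named above. Further entries (the "etc.") are not guessed here. `comparison_form` is a hypothetical name.

```python
import unicodedata

# Only the homograph mappings named in the proposal: 0 -> o, 1 -> l, eth -> d.
HOMOGRAPHS = str.maketrans({"0": "o", "1": "l", "\u00f0": "d"})


def comparison_form(name: str) -> str:
    """Build a canonical key used only for collision checks, never for display."""
    # 1. Unicode normalization (NFKD is an assumption); accents become
    #    separate combining characters.
    s = unicodedata.normalize("NFKD", name)
    # 2. Lowercasing (capital eth also becomes lowercase eth here).
    s = s.lower()
    # 3 and 4. Drop punctuation, spaces/separators and combining accents.
    s = "".join(
        ch for ch in s
        if unicodedata.category(ch)[0] not in "PZ"
        and unicodedata.category(ch) != "Mn"
    )
    # 5. Homograph canonicalization.
    return s.translate(HOMOGRAPHS)
```

With this sketch, `comparison_form("Chris Lüer")` gives `'chrisluer'`. "Wil1iam" and "William" both reduce to `'william'`, and "B0b" and "bob" both reduce to `'bob'`.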
A new username should then be allowed to be registered only if the 
comparison form of the proposed new username differs from the 
comparison form of every existing username (the comparison forms are 
stored in an indexed table, alongside the full, uncanonicalized name 
that actually gets registered).
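The registration check can be sketched as a small Python class in which a dictionary plays the role of the indexed table. It maps each comparison form to the full names stored under it. `UsernameRegistry`, `load_existing`, `try_register` and `names_for` are hypothetical names. The canonicalization function is passed in, and `str.casefold` is only a placeholder default standing in for the full comparison-form pipeline. Existing accounts, which the proposal grandfathers in, are loaded without the uniqueness check, so any collisions among them are kept.

```python
from typing import Callable, Dict, Iterable, List


class UsernameRegistry:
    """Hypothetical in-memory stand-in for the indexed username table."""

    def __init__(self, canonicalize: Callable[[str], str] = str.casefold) -> None:
        # str.casefold is a placeholder for the full comparison-form pipeline.
        self._canonicalize = canonicalize
        # comparison form -> full, uncanonicalized names stored under it
        self._index: Dict[str, List[str]] = {}

    def load_existing(self, names: Iterable[str]) -> None:
        """Grandfather in existing accounts: index them with no uniqueness check."""
        for name in names:
            self._index.setdefault(self._canonicalize(name), []).append(name)

    def try_register(self, name: str) -> bool:
        """Register `name` only if no stored name shares its comparison form."""
        key = self._canonicalize(name)
        if key in self._index:  # indexed lookup, not a scan of all names
            return False
        self._index[key] = [name]
        return True

    def names_for(self, name: str) -> List[str]:
        """Return the stored names whose comparison form matches `name`'s."""
        return list(self._index.get(self._canonicalize(name), []))
```

Usage: after `reg = UsernameRegistry()` and `reg.load_existing(["Guettarda", "GUETTARDA"])`, both old accounts stay indexed. `reg.try_register("guettarda")` returns `False`, `reg.try_register("Neil")` returns `True`, and a later `reg.try_register("NEIL")` returns `False`.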
Doing this will eliminate the vast majority of simple username 
spoofing hacks.
Existing usernames get grandfathered in, of course.
-- Neil
