On Fri, Jun 06, 2003 at 01:07:40PM -0700, Brion Vibber wrote:
Currently, before creating a new user account we do some limited validation on the given name:
- Trim leading and trailing whitespace.
- Check whether it looks like an IP address (four sequences of 1-3 digits with dots between them); if so, reject it.
- Check whether there's a slash character; if so, reject it. [I just added this check; I think it was supposed to be added when we set up partial subpage support for userspace, which conflicts with the slash character in names if remaining problems with the contribs/email sidebar links are fixed. Unless there's some huge objection... there don't appear to be any valid usernames on this pattern. Note also that this check applies only to new names; existing legitimate ones would be grandfathered in.]
- Canonicalize the name (run it through the title canonicalizer and take the version without underscores) and check for an exact existing match. If there is one, reject it.
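To make the four checks above concrete, here is a rough Python sketch (not the actual wiki code). The names `validate_new_username` and `canonicalize` are made up for illustration, and `canonicalize` is only a stand-in for the real title canonicalizer: it assumes underscores become spaces, runs of whitespace collapse, and the first letter is capitalized.

```python
import re

# Four dot-separated groups of 1-3 ASCII digits, per the description above.
IP_LIKE = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")


def canonicalize(name):
    """Hypothetical stand-in for the title canonicalizer (assumed behavior)."""
    name = " ".join(name.replace("_", " ").split())
    return name[:1].upper() + name[1:]


def validate_new_username(name, existing):
    """Return (ok, detail): detail is the canonical name or a rejection reason.

    `existing` is assumed to be a set of already-canonicalized names.
    """
    name = name.strip()                      # trim surrounding whitespace
    if IP_LIKE.fullmatch(name):              # reject IP-address lookalikes
        return False, "looks like an IP address"
    if "/" in name:                          # reject slashes (new names only)
        return False, "contains a slash"
    canonical = canonicalize(name)
    if canonical in existing:                # exact match on canonical form
        return False, "already taken"
    return True, canonical
```

Existing accounts never pass through this function, which matches the grandfathering described above.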
We may wish to do a case-_in_sensitive check, and/or a same-except-for-accents check. Or not. Anyway, I think it could use some tidying up.
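One possible shape for the case-insensitive and same-except-for-accents check, as a sketch only: compare names by a folded key instead of the exact string. `loose_key` and `conflicts_loosely` are hypothetical names; the approach assumes that "same except for accents" means "equal after removing combining marks", which will not cover every script.

```python
import unicodedata


def loose_key(name):
    """Comparison key that ignores case and combining accents."""
    decomposed = unicodedata.normalize("NFD", name)
    # Drop combining marks (accents) left behind by the decomposition.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def conflicts_loosely(name, existing):
    """True if any existing name matches `name` under loose_key."""
    key = loose_key(name)
    return any(loose_key(other) == key for other in existing)
```

With this key, "José", "JOSE" and "jose" would all count as the same name.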
For ISO 8859 wikis that sounds right, but we may need stronger checks for Unicode wikis to protect against people pretending to be someone else. There are many ways to make two Unicode strings that differ at the binary level look the same. Converting to one of the Unicode normalization forms and then checking against an allowed range of characters should do the trick. We may use different allowed sets per wiki, so that Asians can use Han characters, Russians can use Cyrillic, etc.
But it's not very important right now.
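For whenever it does become important, a minimal sketch of the normalize-then-check idea in Python. Everything here is an assumption for illustration: the choice of NFKC, the wiki codes, and the code point ranges in `ALLOWED` are examples, not a proposed policy.

```python
import unicodedata

# Illustrative ranges only: space, ASCII digits, ASCII letters.
COMMON = [(0x20, 0x20), (0x30, 0x39), (0x41, 0x5A), (0x61, 0x7A)]

# Hypothetical per-wiki allowed sets.
ALLOWED = {
    "en": COMMON,
    "ru": COMMON + [(0x0400, 0x04FF)],   # Cyrillic block
    "zh": COMMON + [(0x4E00, 0x9FFF)],   # CJK Unified Ideographs
}


def normalize_and_check(name, wiki):
    """Return (normalized_name, ok) for the given wiki's allowed set."""
    # NFKC folds compatibility variants (e.g. fullwidth letters) together.
    normalized = unicodedata.normalize("NFKC", name)
    ranges = ALLOWED[wiki]
    ok = all(any(lo <= ord(ch) <= hi for lo, hi in ranges) for ch in normalized)
    return normalized, ok
```

Under these assumptions a fullwidth "Ａdmin" normalizes to plain "Admin", and a name with a Cyrillic "А" is rejected on a wiki that allows only Latin. A wiki that allows both scripts would still accept mixed-script lookalikes, so a range check alone probably is not the whole answer.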