[Foundation-l] Automatic username transliteration for SUL

Stephanie Erin Daugherty stephanie at sosdg.org
Thu Dec 21 03:50:33 UTC 2006

> Even "ko-ho-ba-fay-jo-muh" is better than ????????
> Of course, we could always take the approach of putting them in IPA, 
> thus annoying everyone equally.
> -- Neil
I think the most important part of the objection to "foreign" character 
sets is that many people, on en in particular, are unaware of what 
facilities exist for dealing with them. People in countries that use 
Latin character sets rarely see, and almost never have to work with, 
anything in another character set, so they are usually unaware of 
what tools exist for interoperating with them. This is especially true 
when abusive users have deliberately taken advantage of this fact in 
order to make the lives of administrators and of other Wikipedians as 
difficult as possible, and it's also true of the handful of users 
who mix character sets to "look cool" at the inconvenience of others.

Usernames in foreign character sets pose special technical challenges 
for users unfamiliar with them, and mainstream multilingual support is 
especially lacking in applications with a Latin language family 
ethnocentricity (in particular a large number of Windows applications 
and Windows itself, at least for the en-US locale). Together, these mean 
that functions for working with usernames need to be looked at carefully.

One of the more obvious ways to make this work is to make numeric 
userids more visible and more useful for the various operations 
where a username may be nearly impossible to type, and may even be 
difficult to see. I'd strongly recommend this for usernames using 
characters outside the ranges that a typical user on a given project 
will be able to enter with a "normal" input method. Displaying something 
like User:???????? <#2352562> would help considerably in those 
situations, but it needs to work consistently for things like accessing 
talk pages, accessing contributions, accessing userpages, and accessing 

"Nicknames" or aliases, as proposed by someone else in this thread, would 
also help - they could be constrained to the characters typically usable 
on a given wiki. I'd also suggest that when showing a "non-native" 
username, we indicate clearly what character set or even what 
language it's in. This will become even more useful once SUL is 
implemented (hopefully it's going to be part of SUL anyway; dealing with 
namespace collisions otherwise will be insane).
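
As a rough illustration of how a wiki could work out which character set 
a username is in, here is a minimal Python sketch. It is not how MediaWiki 
does anything; the function name scripts_in is made up for this example. 
Python's standard library does not expose the Unicode Script property, so 
the sketch relies on a heuristic: the first word of each character's 
Unicode name (for example "ARABIC LETTER BEH") is treated as its script.

```python
import unicodedata


def scripts_in(name):
    """Return the set of script labels used by the letters in `name`.

    Heuristic (an assumption of this sketch): the first word of a
    character's Unicode name is taken as its script.
    """
    scripts = set()
    for ch in name:
        if not ch.isalpha():
            continue  # ignore digits, spaces and punctuation
        char_name = unicodedata.name(ch, "")
        if char_name:
            scripts.add(char_name.split()[0].capitalize())
    return scripts


def is_mixed_script(name):
    """True when the letters of `name` come from more than one script."""
    return len(scripts_in(name)) > 1
```

A name for which is_mixed_script returns True (say, a Cyrillic letter 
hidden among Latin ones) is the "look cool" or deliberately confusing case 
mentioned earlier, and could be flagged for administrators.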

Transliterations would be useful in some situations, but I'd suggest we 
make this a display option - being respectful of other cultures means 
that we should respect their writing and their culture wherever 
possible. IPA could be handled in the same way, and this would probably 
be appreciated by at least a few people who would otherwise be confused 
about how to pronounce a given name.

Would a format for usernames with foreign characters like 
"User:???????? (Arabic) <#2352562>" really be so bad? (This would apply 
equally to projects where Latin characters might not be displayable.)
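
The display format suggested above could be produced by something like the 
following Python sketch. The function display_name and its parameters are 
hypothetical, and the test for "typable on this wiki" is assumed here to be 
plain ASCII, which would only suit a project like en.

```python
def display_name(username, user_id, script=None):
    """Build a label such as 'User:<name> (Arabic) <#2352562>'.

    Assumption of this sketch: a name counts as "native" when it is
    pure ASCII; a real wiki would use its own per-project rule.
    """
    label = "User:" + username
    if not username.isascii():
        # Annotate only names the typical local user cannot type.
        if script:
            label += " (" + script + ")"
        label += " <#" + str(user_id) + ">"
    return label
```

On a project where Latin characters are the "foreign" ones, the isascii 
check would be swapped for whatever rule describes the local input method; 
the rest of the format stays the same.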

