On Mon, Dec 8, 2008 at 12:55 PM, Brion Vibber <brion(a)wikimedia.org> wrote:
> Will social gender (often, but not always aligned with biological sex) always track with grammatical gender?
I believe there are some languages with more complicated gender
systems than just male/female. On the other hand, getting just the
male/female/unknown distinction right would be a huge improvement for
a lot of languages, so it's probably worth it to do just that.
If some languages make other significant grammatical distinctions of this kind, we
might consider offering separate preferences for them. For
instance, if the hypothetical language Zulitutsi used different
verbs for people above or below the age of 30, we could add
an extra option when either the wiki's content language *or* the user's
interface language is Zulitutsi: "Are you above the age of 30?" This could then
be handled in the same framework as gender generally (see below for
thoughts on that). Of course, a Zulitutsi speaker on enwiki would
then find that almost all other users fall back to the "unknown"
case, but that's acceptable.
I would wait for actual use cases to come up before bothering with this,
though. We should just make sure that nothing in the basic framework
strictly assumes a male/female/unknown trichotomy.
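One way to keep the framework from hard-coding the trichotomy is to make the preferences data-driven. Here is a minimal Python sketch. All names are hypothetical (`GrammarPreference`, `preferences_to_show`, `user_value`), and "zts" is a made-up code for Zulitutsi. It also assumes that the extra option is keyed on the wiki's content language or the user's interface language. It is an illustration, not MediaWiki code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GrammarPreference:
    """One user attribute that some language's grammar depends on."""
    key: str                 # preference name, e.g. "gender"
    question: str            # prompt shown on the preferences page
    values: tuple[str, ...]  # allowed answers; no fixed count is assumed

# Offered on every wiki.
BASE_PREFERENCES = (
    GrammarPreference("gender", "What is your gender?", ("male", "female", "unknown")),
)

# Extra attributes keyed by language code; "zts" is a made-up code for the
# hypothetical Zulitutsi language.
EXTRA_PREFERENCES = {
    "zts": (
        GrammarPreference("over30", "Are you above the age of 30?", ("yes", "no", "unknown")),
    ),
}

def preferences_to_show(content_lang: str, user_lang: str) -> list[GrammarPreference]:
    """Add a language's extra options if either the wiki's content language
    or the user's interface language needs them."""
    prefs = list(BASE_PREFERENCES)
    seen = {p.key for p in prefs}
    for lang in (content_lang, user_lang):
        for pref in EXTRA_PREFERENCES.get(lang, ()):
            if pref.key not in seen:
                prefs.append(pref)
                seen.add(pref.key)
    return prefs

def user_value(user_prefs: dict[str, str], key: str) -> str:
    """Users who never answered (most users on a non-Zulitutsi wiki) get "unknown"."""
    return user_prefs.get(key, "unknown")
```

Adding a new distinction then means adding a registry entry rather than changing code.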
> What are the cascading implications of user gender on localization (adjective agreement in Romance languages, verb agreement in Semitic languages) and what technical requirements would be imposed?
You'd likely need to write messages like "Automatically block the last
IP address used by this user, and any subsequent IPs {{#gender:$1|he
tries|she tries|they try}} to edit from", where $1 is the *username*
of the affected user. This could be handled the way we handle plurals
right now, so it's no huge deal in that sense: things like verb/adjective
agreement would come for free, since they'd just be spelled out manually in
the messages. If a language has more genders, those could be handled
the same way as plural forms (by specifying them in the Language class
somehow).
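Once $1 has been resolved to a gender, choosing the right form works much like choosing a plural form. Here is a minimal Python sketch. It assumes the male|female|unknown argument order from the example message, and `GENDER_ORDER` and `select_gender_form` are hypothetical names. The padding rule is also an assumption borrowed from how plural forms are commonly handled.

```python
# Forms are given in this fixed order, the way plural forms follow a
# language-defined order.
GENDER_ORDER = ("male", "female", "unknown")

def select_gender_form(gender: str, forms: list[str]) -> str:
    """Return the form of a message fragment matching `gender`.

    Assumption (by analogy with plural handling): if a message supplies
    fewer forms than there are genders, the last form fills the gaps, so
    a single form is used for everyone.
    """
    if not forms:
        return ""
    padded = forms + [forms[-1]] * (len(GENDER_ORDER) - len(forms))
    # Anything unrecognized is treated like "unknown".
    gender = gender if gender in GENDER_ORDER else "unknown"
    return padded[GENDER_ORDER.index(gender)]
```

For example, `select_gender_form("female", ["he tries", "she tries", "they try"])` returns `"she tries"`. A language with more genders would only need a longer order tuple, which is the part the Language class would supply.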
The difficulty here is of course that the genders of all users would
have to be stored in the database, and every call to #gender would
require a query. In particular, for many languages this would require
a query for every link to a user page (of a user who exists). We
would have to batch these somehow; that might be the only tricky part
of writing this. Most likely we'd want to store it in a new field
rather than in user_options, because we'd often be fetching it when we
don't need the rest of the user's options.
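Batching could be as simple as collecting the usernames linked on a page and fetching them with a few `IN (...)` queries. Here is a minimal Python sketch using the standard `sqlite3` module. The dedicated `user_gender` column on a `user` table is a hypothetical schema, and `fetch_genders` is an illustrative name.

```python
import sqlite3
from typing import Iterable

def fetch_genders(conn: sqlite3.Connection, usernames: Iterable[str],
                  chunk_size: int = 500) -> dict[str, str]:
    """Look up many users' genders with a few IN (...) queries instead of
    one query per user-page link.

    Assumes a hypothetical dedicated `user_gender` column on the `user`
    table, so the rest of the user's options never have to be loaded.
    Missing users and users without a stored gender map to "unknown".
    """
    names = sorted(set(usernames))
    result = {name: "unknown" for name in names}
    # Chunk to stay under the database's limit on bound parameters.
    for start in range(0, len(names), chunk_size):
        chunk = names[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            f'SELECT user_name, user_gender FROM "user" WHERE user_name IN ({placeholders})',
            chunk,
        )
        for name, gender in rows:
            result[name] = gender or "unknown"
    return result
```

With Alice (female), Bob (male) and Carol (no gender set) in the table, `fetch_genders(conn, ["Alice", "Bob", "Carol", "Nobody"])` gives `{'Alice': 'female', 'Bob': 'male', 'Carol': 'unknown', 'Nobody': 'unknown'}`.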
Note that I made the first argument to {{#gender:}} a username, not an
actual gender. The latter would be much, much more invasive, because
it would require every single message that could possibly use the
gender of a particular user to pass that gender in as a parameter,
which would be a huge mess. From a functionality and code-cleanliness
point of view, it's better to accept the username as the argument and
treat preloading of genders as an optimization.
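Here is a minimal Python sketch of "username as argument, preloading as optimization". The `{{#gender:}}` syntax is taken from the proposal. `GenderResolver` and `batch_lookup` are hypothetical names, and the backend is any function that maps a list of names to genders. Every lookup is correct on its own. Scanning the message first simply lets all the names be fetched in one call.

```python
import re
from typing import Callable, Iterable

# {{#gender:Username|male form|female form|unknown form}}, after $1 has
# been replaced by a username.
GENDER_CALL = re.compile(r"\{\{#gender:([^|{}]+)\|([^{}]*)\}\}")
FORM_INDEX = {"male": 0, "female": 1, "unknown": 2}

class GenderResolver:
    """Resolves genders from usernames; preloading only saves queries."""

    def __init__(self, batch_lookup: Callable[[list[str]], dict[str, str]]):
        self._batch_lookup = batch_lookup  # e.g. one DB query for many names
        self._cache: dict[str, str] = {}
        self.lookups = 0                   # backend calls made so far

    def preload(self, usernames: Iterable[str]) -> None:
        missing = sorted({u for u in usernames if u not in self._cache})
        if missing:
            self.lookups += 1
            found = self._batch_lookup(missing)
            for name in missing:
                self._cache[name] = found.get(name, "unknown")

    def gender(self, username: str) -> str:
        # Correct even without preloading: falls back to a one-name lookup.
        self.preload([username])
        return self._cache[username]

    def expand(self, text: str) -> str:
        # Optimization: collect every username in the text and fetch them at once.
        self.preload(m.group(1).strip() for m in GENDER_CALL.finditer(text))

        def replace(m):
            forms = m.group(2).split("|")
            index = FORM_INDEX.get(self.gender(m.group(1).strip()), 2)
            return forms[min(index, len(forms) - 1)]  # missing forms: use the last

        return GENDER_CALL.sub(replace, text)
```

Messages never have to carry a gender parameter. Only the caller that wants fewer queries needs to know about preloading.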
On the other hand, this might have serious performance implications if
people start using #gender everywhere (as is likely). It would be great
if we could contrive some way to make retrieving the genders of
hundreds of (random and unpredictable) users per request cheap enough
not to matter. In principle it's little enough data that we could
cache it locally on all the app servers, but setting that up seems like
a lot of effort. We might initially be optimistic and just fetch from
the database, precaching with batch queries where appropriate, and see
whether that causes trouble. We could add it as an option first, off by
default, and enable it gradually on large wikis while watching the
load. Fetching unbatched genders from memcached would likely be a good
idea (but large batches should probably still go to the DB so they can
be fetched all at once).
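The rollout and caching policy described above could be sketched as follows. This is Python under several assumptions. A plain dict stands in for memcached, the threshold of 50 is an arbitrary placeholder, and `GenderStore` and `db_batch` are hypothetical names. The feature starts disabled. Small lookups serve cache hits and fetch all misses in a single query. Large batches bypass the cache and go to the database in one query.

```python
from typing import Callable, Iterable, Optional

class GenderStore:
    """Hypothetical lookup policy following the proposal:

    - off by default, so it can be enabled one wiki at a time;
    - small (unbatched) lookups go through a memcached-like cache;
    - large batches go straight to the database in one query.
    """

    def __init__(self, db_batch: Callable[[list[str]], dict[str, str]],
                 cache: Optional[dict] = None, enabled: bool = False,
                 batch_threshold: int = 50):
        self.db_batch = db_batch
        self.cache = {} if cache is None else cache  # stand-in for memcached
        self.enabled = enabled
        self.batch_threshold = batch_threshold
        self.db_queries = 0

    def _query_db(self, names: list[str]) -> dict[str, str]:
        self.db_queries += 1
        found = self.db_batch(names)
        return {n: found.get(n, "unknown") for n in names}

    def get_many(self, usernames: Iterable[str]) -> dict[str, str]:
        names = sorted(set(usernames))
        if not self.enabled:
            return {n: "unknown" for n in names}  # feature off: everyone is "unknown"
        if len(names) >= self.batch_threshold:
            return self._query_db(names)  # large batch: one DB query, cache bypassed
        misses = [n for n in names if n not in self.cache]
        if misses:
            self.cache.update(self._query_db(misses))  # one query for all misses
        return {n: self.cache[n] for n in names}

    def get(self, username: str) -> str:
        return self.get_many([username])[username]
```

Because the disabled path simply answers "unknown", turning the option on for a wiki changes only which forms are chosen, not whether messages render.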
You'd undoubtedly need to special-case the User namespace, since we
probably don't want to support parser functions in namespace names, but
pretty much everything else should be manageable by "just" adding that
one parser function and the appropriate database
field.
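The special case for the User namespace could live in ordinary code rather than in a parser function. Here is a minimal Python sketch. The per-language table is hypothetical, and German's Benutzer/Benutzerin is used purely as an illustration. It also assumes that every gendered form is accepted as an alias when parsing links, so that any spelling reaches the same page.

```python
# Hypothetical per-language gendered names for the User namespace. German's
# Benutzer/Benutzerin is used purely as an illustration.
USER_NS_FORMS = {
    "de": {"male": "Benutzer", "female": "Benutzerin", "unknown": "Benutzer"},
    "en": {"male": "User", "female": "User", "unknown": "User"},
}
DEFAULT_LANG = "en"

def _forms(lang: str) -> dict[str, str]:
    return USER_NS_FORMS.get(lang, USER_NS_FORMS[DEFAULT_LANG])

def user_page_title(lang: str, username: str, gender: str) -> str:
    """Build a user page title, choosing the namespace name in ordinary code
    rather than through a parser function."""
    forms = _forms(lang)
    return f"{forms.get(gender, forms['unknown'])}:{username}"

def is_user_namespace(lang: str, prefix: str) -> bool:
    """Every gendered form is accepted as an alias, so links written with
    any of them resolve to the same namespace."""
    return prefix in set(_forms(lang).values())
```

For instance, `user_page_title("de", "Alice", "female")` yields `"Benutzerin:Alice"`, while users with an unknown gender get the `"Benutzer"` form.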