On Mon, Dec 8, 2008 at 12:55 PM, Brion Vibber brion@wikimedia.org wrote:
Will social gender (often, but not always aligned with biological sex) always track with grammatical gender?
I believe there are some languages with more complicated gender systems than just male/female. On the other hand, getting just the male/female/unknown distinction right would be a huge improvement for a lot of languages, so it's probably worth it to do just that.
If some languages make other significant distinctions of this kind, we might consider offering separate preferences for them somehow. So for instance, if the hypothetical language Zulitutsi used different verbs for people who were above or below the age of 30, we could add an extra option when either the interface language *or* the user language is Zulitutsi, "Are you above the age of 30?" This could then be handled in the same framework as gender generally (see below for thoughts on that). Of course, a Zulitutsi speaker on enwiki would then find that almost all other users fall back to the "unknown" case, but that's acceptable.
I would wait for actual use-cases to come up before bothering with this, though. We should just make sure that nothing in the basic framework strictly assumes a male/female/unknown trichotomy.
What are the cascading implications of user gender on localization (adjective agreement in Romance languages, verb agreement in Semitic languages) and what technical requirements would be imposed?
You'd likely need to write messages like "Automatically block the last IP address used by this user, and any subsequent IPs {{#gender:$1|he tries|she tries|they try}} to edit from", where $1 is the *username* of the affected user. This could be handled the way we handle plurals right now, no huge deal in that sense: things like verb/adjective agreement would come free, since they'd just be manually specified in the messages. If there are more genders somehow, those could be dealt with the same way as with plurals (specify that in the Language class somehow).
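To make the plural-style handling concrete, here is a minimal sketch in Python of how such a function might pick a form. Everything in it is an assumption for illustration: the `USER_GENDER` table stands in for the stored per-user field, `gender_form` is a hypothetical name, and the rule of falling back to the last supplied form is only one possible choice, not how the real parser function would have to behave.

```python
# Hypothetical in-memory stand-in for the stored per-user gender field.
USER_GENDER = {"Alice": "female", "Bob": "male"}

# Order in which forms are passed; a language with more categories
# could supply its own tuple instead of this default.
DEFAULT_CATEGORIES = ("male", "female", "unknown")


def gender_form(username, forms, categories=DEFAULT_CATEGORIES):
    """Return the form matching the user's stored gender.

    Unknown users, unrecognised genders, and missing forms all fall
    back to the last form that was supplied.
    """
    if not forms:
        return ""
    gender = USER_GENDER.get(username, "unknown")
    if gender in categories:
        index = categories.index(gender)
    else:
        index = len(categories) - 1
    # Clamp so that a message with fewer forms still yields something.
    return forms[min(index, len(forms) - 1)]


forms = ["he tries", "she tries", "they try"]
print(gender_form("Alice", forms))
print(gender_form("NoSuchUser", forms))
```

Because the category list is a parameter rather than a hard-coded trio, a language with a different set of distinctions would only need to pass a different tuple.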
The difficulty here is of course that the genders of all users would have to be stored in the database, and every call to #gender would require a query. In particular, for many languages this would require a query for every link to a user page (of a user who exists). We would have to batch these somehow; that might be the only tricky part of writing this. Most likely we'd want to store this as a new field, not in user_options, because we'd be fetching it pretty often when we wouldn't want the rest of the user's options.
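As a rough illustration of the batching idea, the following Python sketch collects all usernames needed for a page and resolves them with one query instead of one query per link. The table name `users`, the columns `user_name` and `user_gender`, and the function `fetch_genders` are all hypothetical, and SQLite is used only so the example is self-contained.

```python
import sqlite3


def fetch_genders(conn, usernames):
    """Look up the gender of many users with a single query.

    Users that do not exist, or have no gender set, map to "unknown".
    """
    names = sorted(set(usernames))
    if not names:
        return {}
    placeholders = ",".join("?" for _ in names)
    rows = conn.execute(
        "SELECT user_name, user_gender FROM users "
        f"WHERE user_name IN ({placeholders})",
        names,
    ).fetchall()
    found = dict(rows)
    return {name: found.get(name) or "unknown" for name in names}


# Small demonstration with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_name TEXT PRIMARY KEY, user_gender TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("Alice", "female"), ("Bob", "male"), ("Carol", None)],
)
print(fetch_genders(conn, ["Alice", "Bob", "Carol", "Dave", "Alice"]))
```

The caller would gather the usernames of every user link on the page first, then call this once before rendering.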
Note that I made the first argument to {{#gender:}} a username, not an actual gender. The latter would be much, much more invasive, because it would require every single message that could possibly use the gender of a particular user to pass that gender in as a parameter, which would be a huge mess. It's better from a functionality/code cleanliness point of view to accept the username as an argument and treat initialization of the gender as an optimization.
On the other hand, this might have serious performance implications if users begin spamming genders everywhere (as is likely). If some way could be contrived that we wouldn't mind retrieving the gender of hundreds of (random and unpredictable) users per request, that would be great. In principle it's little enough info that we could easily cache it locally on all the app servers, but that seems like a lot of effort. We might want to initially be optimistic and just fetch from the database, precaching with batch queries where appropriate. We could add it as an option first, off by default, and enable it gradually on large wikis to see whether the load becomes a problem. Fetching unbatched genders from memcached would likely be a good idea (but large batches should likely still go to the DB so they can be fetched all at once).
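The split between cached single lookups and database batches could look roughly like the Python sketch below. It is only an illustration under assumptions: a plain dict stands in for memcached, `fetch_batch` is a hypothetical callback that runs one database query for a list of names, and the threshold of 10 is an arbitrary placeholder.

```python
BATCH_THRESHOLD = 10  # assumed cut-off; would need tuning in practice


def get_genders(usernames, cache, fetch_batch, threshold=BATCH_THRESHOLD):
    """Resolve genders, using the cache only for small requests.

    cache:       dict-like object standing in for memcached
    fetch_batch: callable taking a list of names and returning
                 {name: gender} from a single database query
    """
    names = list(dict.fromkeys(usernames))  # de-duplicate, keep order
    if len(names) >= threshold:
        # Large batch: skip the cache and fetch everything at once.
        fetched = fetch_batch(names)
        return {name: fetched.get(name, "unknown") for name in names}
    result = {}
    for name in names:
        if name not in cache:
            # Cache miss: fetch this one user and remember the answer.
            cache[name] = fetch_batch([name]).get(name, "unknown")
        result[name] = cache[name]
    return result
```

A real deployment would also need cache expiry or invalidation when a user changes the preference, which this sketch leaves out.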
You'd need to special-case the User namespace, undoubtedly, because I doubt we want to support parser functions in namespace names, but pretty much everything else should be possible to handle by "just" the addition of that one parser function and the appropriate database field.