On 9/10/09 10:06 AM, Aryeh Gregor wrote:
On Wed, Sep 9, 2009 at 6:50 PM, Tim Starling <tstarling(a)wikimedia.org> wrote:
I don't know why you're writing this
nonsense, you obviously haven't
looked at the code at all.
This paragraph is unnecessary.
Seriously! Please read things aloud before clicking send. You will
hopefully then be able to better detect when it's time to take a break,
eat some fruit and take it down a notch.
The language variant system that we have could easily convert between
US and UK English. In fact, it already converts between a language
pair with a far more complex relationship: Simplified and Traditional
Chinese.
The language conversion system is very simple: it's just a table of
translated pairs, where the longest match takes precedence. The
translation table in one direction (e.g. UK -> US) can be different to
the table in the other direction (US -> UK). You would not list "ize
-> ise", you would list every word in the dictionary with an -ize
ending that can be translated to -ise without controversy. The current
software could handle 50k pairs or so without serious performance
problems, and it could be extended and optimised to allow millions of
pairs if there was a need for that.
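A minimal Python sketch of the idea described above: a table of translated pairs applied so that the longest match wins, with a separate table for each direction. This is not MediaWiki's converter code. The word pairs and the names `UK_TO_US`, `US_TO_UK` and `make_converter` are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical UK -> US pairs (illustrative only, not MediaWiki's tables).
UK_TO_US = {
    "colour": "color",
    "organisation": "organization",
    "programme": "program",
}

# Hypothetical US -> UK pairs. This is a separate table, not an inverse:
# "computer program" maps to itself, so the longer match shields it from
# the shorter "program" -> "programme" rule.
US_TO_UK = {
    "color": "colour",
    "organization": "organisation",
    "program": "programme",
    "computer program": "computer program",
}


def make_converter(table):
    """Build a converter that applies `table`, longest match first."""
    # Python's re tries alternatives left to right, so sorting the keys
    # longest-first makes the longest key win at each position.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)))

    def convert(text):
        # Scan left to right, replacing each non-overlapping match.
        return pattern.sub(lambda m: table[m.group(0)], text)

    return convert


to_us = make_converter(UK_TO_US)
to_uk = make_converter(US_TO_UK)

print(to_us("The organisation changed the programme colour."))
# The organization changed the program color.
print(to_uk("A TV program about a computer program."))
# A TV programme about a computer program.
```

This sketch matches raw substrings, so a rule like "program" -> "programme" would also turn "programmer" into "programmemer". That is why entries have to be chosen word by word rather than as suffix rules like "ize -> ise".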
It's possible to handle any pair of languages which are separated only
by vocabulary, transliteration or spelling. It's only differences
in grammar, such as word order, that would give it trouble.
Is there any reason nobody's tried adding such support for US/UK
English? It would resolve some long-standing tension on enwiki.
Would anons have to be given one variant or the other, or would they
get untransformed text or what? Does the variant transformation apply
to the edit page as well?
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
The variant system seems poorly understood by most people (including me),
and features that are poorly understood tend to end up under-utilized, as
this one seems to be...
Perhaps we need more information on what it is intended to provide to the
user.
All I can find on Google on this topic are blurbs about configuration
variables and lots of people confused as to what language variants even
are...
Is there some awesome documentation somewhere I have yet to find?
- Trevor