Re: [Wikitech-l] Language variants

9 Sep 2009

Roan Kattouw wrote:
...
 That's the alphabet variant thing I mentioned earlier. If the majority
 of the differences between pt and pt-br can be summed up with simple
 rules that a computer can handle, we might be able to work something
 out. However, that's usually not the case; I don't know Portuguese, but
 I do know that handling even simple differences between en-us and
 en-gb is too complex already: a system that would successfully convert
 'realise' to 'realize' may also try to wrongfully convert 'disguise'.

I don't know why you're writing this nonsense; you obviously haven't
looked at the code at all.

The language variant system that we have could easily convert between
US and UK English. In fact, it already converts between a language
pair with a far more complex relationship: Simplified and Traditional
Chinese.

The language conversion system is very simple: it's just a table of
translated pairs, where the longest match takes precedence. The
translation table in one direction (e.g. UK -> US) can be different to
the table in the other direction (US -> UK). You would not list "ize
-> ise"; you would list every word in the dictionary with an -ize
ending that can be translated to -ise without controversy. The current
software could handle 50k pairs or so without serious performance
problems, and it could be extended and optimised to allow millions of
pairs if there was a need for that.
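
As a rough illustration of the table lookup described above, here is a
minimal Python sketch of longest-match conversion. It is not the actual
MediaWiki code: the function name convert, the table names, and the tiny
tables are all made up for the example, and it matches substrings
anywhere in the text rather than whole words.

```python
def convert(text, table):
    """Rewrite text using a table of pairs; the longest match wins.

    Hypothetical helper for illustration only, not the MediaWiki converter.
    """
    if not table:
        return text
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so longer entries take precedence.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            # No table entry starts here; copy one character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Illustrative tables only. The two directions are separate tables and
# need not mirror each other.
UK_TO_US = {"realise": "realize", "colour": "color", "colourise": "colorize"}
US_TO_UK = {"realize": "realise", "color": "colour"}

print(convert("I realise the disguise has no colour", UK_TO_US))
print(convert("colourise", UK_TO_US))
```

Because the table lists whole words rather than an "ise -> ize" rule,
"disguise" is left alone, and "colourise" is converted through its own
entry rather than through the shorter "colour" entry.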

It's possible to handle any pair of languages which are separated only
by vocabulary, transliteration or spelling. It's only differences in
grammar, such as word order, that would give it trouble.

-- Tim Starling
