Re: [Wikitech-l] Language variants

11 Sep 2009

Aryeh Gregor wrote:
...
  On Thu, Sep 10, 2009 at 6:44 PM, Roan Kattouw
  <roan.kattouw(a)gmail.com> wrote:
    Seems I'm not the only one who had a completely wrong idea about how
    variants work. We definitely need more documentation and fame for this
    system, so its potential doesn't go to waste.

  I theoretically knew that it was just a string-replace system, but it
  didn't occur to me that it would be useful for more than
  transliteration.  It makes sense now that Tim pointed that out.  How
  would it handle word breaks, though?  It would just ignore them, so
  color -> colour also changes uncolored -> uncoloured?

Neither of the implementations so far has required any knowledge of
word breaks, and so it has not been implemented. In theory you could
just list every larger word that contains a smaller transformed word, e.g.

humor -> humour
humorous -> humorous

But it might be better to just add a word segmentation feature.
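
To make the two options concrete, here is a minimal Python sketch. It is not the LanguageConverter.php code; the table contents and the function names convert and convert_words are made up for illustration. The first function does plain longest-match substring replacement, where an identity entry shields the longer word. The second only replaces whole words, which is one simple reading of "word segmentation" and only suits languages that delimit words.

```python
import re

# Hypothetical table: the identity entry keeps "humorous" from being touched.
TABLE = {
    "humor": "humour",
    "humorous": "humorous",
}


def _alternation(table):
    # Longest keys first, so "humorous" is tried before "humor".
    keys = sorted(table, key=len, reverse=True)
    return "|".join(re.escape(k) for k in keys)


def convert(text, table):
    """Substring replacement with no knowledge of word breaks."""
    pattern = re.compile(_alternation(table))
    return pattern.sub(lambda m: table[m.group(0)], text)


def convert_words(text, table):
    """Replace only whole words, using regex word boundaries."""
    pattern = re.compile(r"\b(?:" + _alternation(table) + r")\b")
    return pattern.sub(lambda m: table[m.group(0)], text)


print(convert("humor is humorous", TABLE))                   # humour is humorous
print(convert("humor is humorous", {"humor": "humour"}))     # humour is humourous
print(convert("uncolored", {"color": "colour"}))             # uncoloured
print(convert_words("uncolored color", {"color": "colour"})) # uncolored colour
```

With word boundaries the identity entries are no longer needed, but every inflected form that should change (colors, colored, ...) then has to be listed explicitly.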

...
  What about things like HTML ids or even attribute/property names
  (<span style="color:red">)?  I'm sure I could dig through the code to
  find the answers to these, but actually I'm not even sure offhand where
  the code *is*.

languages/LanguageConverter.php. There are some rather inelegant
regexes to deal with cases like these, but they seem to work. The
converter operates at a near-HTML stage of the parser, so it's not too
hard to skip attributes.
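
As a rough illustration of skipping attributes at a near-HTML stage, the Python sketch below splits the text on tags and converts only the text between them. It is not the regexes from LanguageConverter.php; the pattern, the table and the name convert_outside_tags are assumptions for the example, and a ">" inside an attribute value would break this simple pattern.

```python
import re

# Anything that looks like a tag; the capturing group makes re.split
# keep the tags in the result list.
TAG = re.compile(r"(<[^>]*>)")


def convert_outside_tags(html, table):
    """Apply the replacement table to text nodes only, leaving tags alone."""
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    parts = TAG.split(html)
    # After splitting on one capturing group, even indices are text
    # and odd indices are the tags themselves.
    for i in range(0, len(parts), 2):
        parts[i] = pattern.sub(lambda m: table[m.group(0)], parts[i])
    return "".join(parts)


print(convert_outside_tags('<span style="color:red">color</span>',
                           {"color": "colour"}))
# <span style="color:red">colour</span>
```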

Note that the FastStringSearch extension is important for achieving
good performance, especially in Chinese.

-- Tim Starling
