On Apr 12, 2005 2:06 PM, zhengzhu <zhengzhu(a)gmail.com> wrote:
The Chinese Wikipedia has similar problems, although for different
reasons. For example, sometimes people's names shouldn't be converted
at all, no matter which variant is in use; sometimes different variants
translate foreign words differently. So there is a user-customizable
dictionary for each language variant that can be used to define such
special conversion rules. There is also special markup that can be
used in the text to define specific conversion rules just for that
piece of text.
In the case of converting Latin to Cyrillic, I think the same approach
can be used. The conversion table can be augmented with words and
phrases that should not be converted to Cyrillic under any condition.
Words that can be both English and Serbian (or Belarusian) can
be manually marked up in the text.
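
A minimal sketch of the idea above for Serbian Latin to Cyrillic: a conversion table, a list of words that are never converted, and inline markup that protects a piece of text. The names `NO_CONVERT`, `translit_word` and `to_cyrillic` are made up for illustration, and the `-{...}-` marker is only an assumed syntax; none of this is taken from the actual MediaWiki code.

```python
import re

# Serbian Latin -> Cyrillic. Digraphs are checked before single letters.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
SINGLES = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}

# Hypothetical exception list: words that are never converted.
NO_CONVERT = {"wikipedia", "mediawiki"}

# Either a protected span -{...}- (assumed markup) or a plain word.
TOKEN = re.compile(r"-\{(.*?)\}-|(\w+)", re.DOTALL)


def translit_word(word):
    """Transliterate one word, keeping the case of each letter."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = word[i]
        cyr = SINGLES.get(ch.lower())
        if cyr is None:
            out.append(ch)  # digits, q/w/x/y, etc. pass through unchanged
        else:
            out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)


def to_cyrillic(text, no_convert=NO_CONVERT):
    """Convert text, skipping listed words and marked-up spans."""
    def repl(match):
        if match.group(1) is not None:
            return match.group(1)  # protected span: emit it without markers
        word = match.group(2)
        if word.lower() in no_convert:
            return word
        return translit_word(word)
    return TOKEN.sub(repl, text)


print(to_cyrillic("Ljubljana je na Wikipedia"))
print(to_cyrillic("Pogledaj -{Top}- listu"))
```

Here "Wikipedia" survives because it is in the exception list, and "Top" survives because it was marked up by hand. Words where "nj" or "dž" is not really a digraph would need their own entries in the table.
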
1. I can only guess what is written in the Chinese interface, so how did
you handle article names? Do you have both names, in Simplified and
Traditional Chinese?
2. I think MediaWiki should have one general module for
transliteration, with extensions for specific languages. The general
module should be based on the Chinese module. Is it possible to start
working that way?
3. Also, we should try to make the system clever: some formal and some
statistical methods can help in recognizing whether we should
transliterate something or not (e.g., if the system finds some
non-Serbian Cyrillic letters, it should not transliterate that text
into Latin, and vice versa).
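
The formal check in point 3 could look roughly like the sketch below: a word is a candidate for transliteration only if every letter in it belongs to the Serbian alphabet of the source script. The function names are hypothetical, and this is just the simplest alphabet test, not a statistical method.

```python
import unicodedata

# The 30 letters of the Serbian Cyrillic alphabet (lowercase).
SERBIAN_CYRILLIC = set("абвгдђежзијклљмнњопрстћуфхцчџш")

# Serbian Latin letters (lowercase); q, w, x and y are not among them.
SERBIAN_LATIN = set("abcčćdđefghijklmnoprsštuvzž")


def is_cyrillic(ch):
    """True if the Unicode name of the character marks it as Cyrillic."""
    return unicodedata.name(ch, "").startswith("CYRILLIC")


def ok_for_cyrillic_to_latin(word):
    """True if the word has Cyrillic letters and all of them are Serbian."""
    cyr = [ch.lower() for ch in word if is_cyrillic(ch)]
    return bool(cyr) and all(ch in SERBIAN_CYRILLIC for ch in cyr)


def ok_for_latin_to_cyrillic(word):
    """True if every letter of the word is a Serbian Latin letter."""
    letters = [ch.lower() for ch in word if ch.isalpha()]
    return bool(letters) and all(ch in SERBIAN_LATIN for ch in letters)


for w in ["Београд", "язык", "Мінск"]:
    print(w, ok_for_cyrillic_to_latin(w))
for w in ["Beograd", "Wikipedia"]:
    print(w, ok_for_latin_to_cyrillic(w))
```

A Russian or Belarusian word is rejected as soon as it contains a letter such as "я", "ы" or "і", and an English word is rejected when it contains "q", "w", "x" or "y". Words that pass the alphabet test but are still foreign would have to be caught by the exception list or by manual markup.
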