On Apr 12, 2005 2:06 PM, zhengzhu zhengzhu@gmail.com wrote:
The Chinese Wikipedia has similar problems, although for different reasons. For example, sometimes people's names shouldn't be converted at all, no matter which variant is in use, and sometimes different variants translate foreign words differently. So there is a user-customizable dictionary for each language variant that can be used to define such special conversion rules. There is also special markup that can be used in the text to define specific conversion rules just for that piece of text.
In the case of converting Latin to Cyrillic, I think the same approach can be used. The conversion table can be augmented with words and phrases that should not be converted to Cyrillic under any condition. Words that can be both English and Serbian (or Belarusian) can be manually marked up in the text.
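To make the idea above concrete, here is a rough Python sketch of a Latin-to-Cyrillic converter with a "never convert" word list and inline protection markup. Everything in it is an assumption for illustration: the names `NO_CONVERT`, `convert_word` and `to_cyrillic`, the sample exception words, and the use of `-{...}-` as the protection delimiter are not taken from the Chinese converter's code.

```python
import re

# Serbian Latin digraphs; they must be tried before single letters.
# Caveat: a few words contain these letter pairs as two separate sounds,
# which is exactly the kind of case an exception dictionary would handle.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
SINGLE = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}

# Hypothetical user-editable list of words that are never converted.
NO_CONVERT = {"MediaWiki", "Linux"}

# Hypothetical inline markup: text inside -{ ... }- is left untouched.
PROTECTED = re.compile(r"-\{(.*?)\}-")
WORD = re.compile(r"\w+")


def convert_word(word):
    """Transliterate one word unless it is on the exception list."""
    if word in NO_CONVERT:
        return word
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = word[i]
        cyr = SINGLE.get(ch.lower())
        if cyr is None:
            out.append(ch)  # letters outside the table (q, w, x, y, digits) pass through
        else:
            out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)


def convert_plain(segment):
    """Convert every word in a segment that contains no protection markup."""
    return WORD.sub(lambda m: convert_word(m.group()), segment)


def to_cyrillic(text):
    """Convert text, copying -{...}- spans through without the delimiters."""
    parts = []
    pos = 0
    for m in PROTECTED.finditer(text):
        parts.append(convert_plain(text[pos:m.start()]))
        parts.append(m.group(1))
        pos = m.end()
    parts.append(convert_plain(text[pos:]))
    return "".join(parts)
```

With this sketch, `to_cyrillic("Linux je -{kernel}-")` leaves "Linux" alone because of the word list, leaves "kernel" alone because of the markup, and converts only "je".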
1. I can only guess at what is written in the Chinese interface, so how did you handle article names? Do you have both names, in Simplified and in Traditional Chinese?
2. I think MediaWiki should have one general transliteration module with extensions for specific languages. The general module should be based on the Chinese module. Is it possible to start working this way?
3. Also, we should try to make the system clever: some formal and some statistical methods can help in recognizing whether we should transliterate something or not (e.g., if the system finds non-Serbian Cyrillic letters, it should not transliterate that text into Latin, and vice versa).
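As a rough illustration of the formal check in point 3, the Python sketch below refuses to treat a piece of text as Serbian when it contains letters outside the Serbian alphabet of the relevant script. The function names are made up for this example, the Cyrillic test only looks at the basic Unicode block U+0400 to U+04FF, and a real implementation would probably combine this with statistical methods.

```python
# The 30 letters of the Serbian Cyrillic alphabet (lowercase).
SERBIAN_CYRILLIC = set("абвгдђежзијклљмнњопрстћуфхцчџш")

# Basic Latin letters that do not occur in native Serbian Latin spelling.
NON_SERBIAN_LATIN = set("qwxy")


def is_cyrillic(ch):
    """True if the character lies in the basic Cyrillic Unicode block."""
    return "\u0400" <= ch <= "\u04ff"


def looks_serbian_cyrillic(text):
    """False if any Cyrillic letter is not in the Serbian alphabet
    (for example Russian 'э' or 'ы'); such text should not be
    transliterated into Latin."""
    for ch in text:
        if is_cyrillic(ch) and ch.isalpha() and ch.lower() not in SERBIAN_CYRILLIC:
            return False
    return True


def looks_serbian_latin(text):
    """False if the text uses q, w, x or y, which suggests a foreign
    word that should not be transliterated into Cyrillic."""
    return not any(ch.lower() in NON_SERBIAN_LATIN for ch in text)
```

The check could be applied per word or per sentence. Per word is stricter but would still let through foreign words that happen to use only letters shared with Serbian, which is where the manual markup and the exception dictionary would still be needed.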