On Apr 11, 2005 5:27 PM, Milos Rancic millosh@gmail.com wrote:
I am implementing transliteration from Cyrillic to Latin for the Serbian Wikipedia. I am not sure that Serbian or Belarusian text can go through "live transliteration" without problems.
The Chinese situation is almost clean: one set of characters is changed into another set. If they use the Latin (or Cyrillic) alphabet (for references) or Arabic numerals, those are left unchanged during conversion.
Transliteration between the Cyrillic and Latin alphabets is complicated by a number of problems (the Serbian Latin and Cyrillic orthographies have some differences, too). If the Latin alphabet in Belarusian has equal status with Cyrillic (as it almost does in Serbian), you can't forbid writing in Latin. And if you transliterate from Latin to Cyrillic, a lot of English words will end up transliterated into Cyrillic. Also, what about references in Cyrillic? If you add some Russian bibliography, the Russian text will be transliterated into Latin.
The Chinese Wikipedia has similar problems, although for different reasons. For example, sometimes people's names shouldn't be converted at all, no matter which variant is in use, and sometimes different variants translate foreign words differently. So there is a user-customizable dictionary for each language variant that can be used to define such special conversion rules. There is also a special markup that can be used in the text to define specific conversion rules just for that piece of text.
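To make the per-variant dictionary idea concrete, here is a minimal sketch in Python. It is not the actual converter code; the names CHAR_TABLE, VARIANT_PHRASES and convert() are made up for illustration, and the tables hold only a couple of entries. The point is just that phrase-level rules for a variant are tried before the generic character table.

```python
# Generic character table (tiny illustrative subset, Simplified -> Traditional).
CHAR_TABLE = {"软": "軟", "体": "體"}

# Hypothetical per-variant phrase dictionaries. Entries here take priority
# over CHAR_TABLE, so a variant can translate a whole word differently.
VARIANT_PHRASES = {
    "zh-tw": {"软件": "軟體"},
    "zh-hk": {},
}


def convert(text, variant):
    """Convert text for one variant, preferring the longest phrase rule."""
    phrases = VARIANT_PHRASES.get(variant, {})
    max_len = max((len(p) for p in phrases), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try multi-character phrases first, longest candidate first.
        for size in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in phrases:
                out.append(phrases[chunk])
                i += size
                break
        else:
            # No phrase rule matched: fall back to the character table.
            ch = text[i]
            out.append(phrases.get(ch, CHAR_TABLE.get(ch, ch)))
            i += 1
    return "".join(out)
```

With these toy tables, convert("软件", "zh-tw") gives "軟體" through the phrase rule, while convert("软件", "zh-hk") falls back to the character table and gives "軟件".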
In the case of converting Latin to Cyrillic, I think the same mechanism can be used. The conversion table can be augmented with words and phrases that should not be converted to Cyrillic under any condition. Words that can be both English and Serbian (or Belarusian) can be manually marked up in the text.
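A rough sketch of how that could look for Serbian Latin to Cyrillic, in Python. Everything here is an assumption for illustration: the NO_CONVERT word list, the -{...}- delimiters for manually marked text, and the function names are not taken from any existing implementation.

```python
import re

# Serbian Latin digraphs that map to a single Cyrillic letter.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}

# Single-letter mapping (lowercase; case is restored in translit_word).
SINGLES = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}

# Hypothetical list of words that must never be converted.
NO_CONVERT = {"Wikipedia", "Unicode"}

# Either a manually marked span -{...}- (assumed syntax) or a plain word.
TOKEN = re.compile(r"-\{(.*?)\}-|(\w+)")


def translit_word(word):
    """Transliterate one word, handling digraphs before single letters."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = word[i]
        cyr = SINGLES.get(ch.lower(), ch)  # unknown characters pass through
        out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)


def to_cyrillic(text):
    """Convert Latin text, skipping protected words and marked-up spans."""
    def repl(match):
        if match.group(1) is not None:
            return match.group(1)  # marked span: keep verbatim, drop markers
        word = match.group(2)
        if word in NO_CONVERT:
            return word
        return translit_word(word)
    return TOKEN.sub(repl, text)
```

For example, to_cyrillic("Wikipedia je enciklopedija") leaves "Wikipedia" in Latin and converts the rest, and to_cyrillic("Program -{grep}- je alat") keeps "grep" untouched. Note that the digraph rule is naive: Serbian words where n+j or d+ž are two separate sounds would need their own entries in the exception dictionary.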