On Apr 11, 2005 5:27 PM, Milos Rancic <millosh(a)gmail.com> wrote:
I am implementing transliteration from Cyrillic to Latin for the
Serbian Wikipedia. I am not sure that Serbian or Belarusian
transliteration can pass "live transliteration" without problems.
The Chinese situation is almost clean: one set of characters is
changed into another set. If they use the Latin (or Cyrillic) alphabet
(for references) or Arabic numerals, those are not changed during
transliteration.
Transliteration between the Cyrillic and Latin alphabets is complicated
by a number of problems (the Serbian Latin and Cyrillic orthographies
also differ in places). If the Latin alphabet in Belarusian has equal
status with Cyrillic (as it almost does in Serbian), you can't forbid
writing in Latin. And if you transliterate from Latin to Cyrillic,
you'll have a lot of English words transliterated into Cyrillic. Also,
what about references in Cyrillic? If you add some Russian
bibliography, you'll have Russian text transliterated into Latin.
The Chinese Wikipedia has similar problems, although for different
reasons. For example, sometimes people's names shouldn't be converted
at all no matter which variant is in use, and sometimes different
variants translate foreign words differently. So there is a
user-customizable dictionary for each language variant that can be
used to define such special conversion rules. There is also a special
markup that can be used in the text to define specific conversion
rules just for that piece of text.
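To make the per-variant dictionary idea concrete, here is a rough
Python sketch. It is not the actual converter code; the table names
(BASE, VARIANT_DICT) and the tiny data set are made up for
illustration. Phrase entries for a variant are tried first (longest
match wins), and anything else falls back to a character-level table.

```python
# Character-level base table (simplified -> traditional).
# Tiny illustrative subset, not real converter data.
BASE = {"软": "軟", "件": "件"}

# Hypothetical per-variant phrase dictionaries that override BASE.
VARIANT_DICT = {
    "zh-tw": {"软件": "軟體"},  # Taiwan uses a different word for "software"
    "zh-hk": {},                # no override: falls back to BASE
}

def convert(text, variant):
    """Convert text for one variant, preferring phrase rules over BASE."""
    phrases = VARIANT_DICT.get(variant, {})
    # Try longer phrases first so they win over shorter overlapping ones.
    keys = sorted(phrases, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(phrases[key])
                i += len(key)
                break
        else:
            # No phrase rule applies here: convert a single character.
            out.append(BASE.get(text[i], text[i]))
            i += 1
    return "".join(out)

print(convert("软件", "zh-tw"))
print(convert("软件", "zh-hk"))
```

Keeping the overrides in a separate table per variant means users can
edit the phrase rules without touching the base character mapping.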
In the case of converting Latin to Cyrillic, I think the same approach
can be used. The conversion table can be augmented with words and
phrases that should not be converted to Cyrillic under any condition.
Words that can be both English and Serbian (or Belarusian) can be
manually marked up in the text.
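As a rough illustration of that suggestion, a Latin-to-Cyrillic pass
with an exclusion list and inline markup could look like the Python
sketch below. Everything in it is an assumption made for the example:
the NEVER_CONVERT entries, the function names, and the choice of
-{...}- as the "leave this alone" marker. The letter table is the
standard Serbian Latin to Cyrillic mapping, with digraphs handled
before single letters.

```python
import re

# Serbian Latin digraphs map to single Cyrillic letters; check them first.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
SINGLE = {
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "đ": "ђ",
    "e": "е", "ž": "ж", "z": "з", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р",
    "s": "с", "t": "т", "ć": "ћ", "u": "у", "f": "ф", "h": "х",
    "c": "ц", "č": "ч", "š": "ш",
}

# Hypothetical exclusion list: words never converted, whatever the context.
NEVER_CONVERT = {"wikipedia", "unicode"}

# Assumed inline markup: text inside -{...}- is emitted unchanged.
PROTECTED = re.compile(r"-\{(.*?)\}-", re.DOTALL)
WORD = re.compile(r"\w+")

def convert_word(word):
    """Transliterate one word unless it is on the exclusion list."""
    if word.lower() in NEVER_CONVERT:
        return word
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = word[i]
        cyr = SINGLE.get(ch.lower())
        if cyr is None:
            out.append(ch)  # digits, q/w/x/y, etc. pass through
        else:
            out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)

def latin_to_cyrillic(text):
    """Convert text, skipping excluded words and -{...}- spans."""
    parts = PROTECTED.split(text)
    result = []
    for idx, part in enumerate(parts):
        if idx % 2 == 1:
            # Odd indices are the contents of -{...}-: keep as written.
            result.append(part)
        else:
            result.append(WORD.sub(lambda m: convert_word(m.group()), part))
    return "".join(result)

print(latin_to_cyrillic("Wikipedia je enciklopedija"))
print(latin_to_cyrillic("-{Mark}- ima džep"))
```

A table like this still cannot tell a real digraph from two adjacent
letters (the "nj" in "injekcija" is n + j, not a digraph), so such
words would need either a dictionary entry or manual markup as well.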
--
zhengzhu