Re: [Wikitech-l] Different alphabets for the same language

12 Apr 2005

      ...
In most cases titles are converted using the same conversion system,
i.e. by applying strtr() with the conversion table. For weird
situations, there is also support for manually specifying the title
conversion inside the article body, using this syntax:
 -{T|zh-cn:foo; zh-tw: bar}-
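To make the two paths above concrete, here is a minimal Python sketch of a strtr()-style table conversion with a manual title override on top. It is only an illustration, not the actual MediaWiki code: the table contents and the names strtr, manual_title and display_title are assumptions made for this example.

```python
import re

# Hypothetical two-entry table; the real zh-cn/zh-tw tables are far larger.
CN_TO_TW = {"foo": "bar", "fo": "xx"}

def strtr(text, table):
    """Replace keys of `table` in one pass, longest key first,
    roughly like PHP's strtr() called with an array."""
    if not table:
        return text
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)

# Matches the manual title rule, e.g. -{T|zh-cn:foo; zh-tw: bar}-
TITLE_RULE = re.compile(r"-\{T\|(.*?)\}-", re.DOTALL)

def manual_title(body, variant):
    """Return the title given for `variant` in the article body, or None."""
    m = TITLE_RULE.search(body)
    if m is None:
        return None
    for part in m.group(1).split(";"):
        code, sep, value = part.partition(":")
        if sep and code.strip() == variant:
            return value.strip()
    return None

def display_title(title, body, variant, table):
    """Prefer the manual rule; otherwise fall back to the table conversion."""
    override = manual_title(body, variant)
    return override if override is not None else strtr(title, table)
```

For example, display_title("foo", "... -{T|zh-cn:foo; zh-tw: baz}- ...", "zh-tw", CN_TO_TW) picks the manual title, while an article body without such a rule falls back to the table.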
I am thinking about a more general solution: a database table of
exceptions. And, more generally, some kind of interaction with
Wiktionary.
...
Here is an example: Let's say that in the conversion table, "foo" in
zh-cn is converted to "bar" in zh-tw and vice-versa. Now someone
writing in zh-cn wrote an article titled "foo". When someone with
zh-tw preferred sees the article, "bar" will be shown as the article
title. Further, say someone using zh-tw edited some article which has
a link [[bar]]. The system will identify that the article "foo" should
be used for linking, if "bar" is not already created as a redirect.
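The linking rule in this example can be sketched in Python roughly as follows. This is a simplified illustration under assumptions: existing titles are modelled as a plain set, and convert and resolve_link are hypothetical names, not functions of the wiki software.

```python
import re

def convert(text, table):
    """Table-driven replacement, longest key first
    (a stand-in for the wiki's own converter)."""
    if not table:
        return text
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)

def resolve_link(target, existing_titles, tables):
    """Return the stored title a link should point to, or None.

    An existing page or redirect with the exact title wins; only then
    are the converted forms of the title tried.
    """
    if target in existing_titles:
        return target
    for table in tables:
        candidate = convert(target, table)
        if candidate in existing_titles:
            return candidate
    return None

# Hypothetical one-entry tables matching the example in the text.
CN_TO_TW = {"foo": "bar"}
TW_TO_CN = {"bar": "foo"}
```

With only "foo" stored, resolve_link("bar", {"foo"}, [TW_TO_CN, CN_TO_TW]) gives "foo"; once "bar" exists as a page or redirect of its own, the link goes to "bar" directly.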
What do you keep in the database? Simplified, traditional or both?
...
btw, you should be able to change the interface at zh after you
register an account;)
I remember spending a few minutes looking at the top left corner of MS
Excel when I tried to find the position of "File" in Hebrew MS Office :)
(it is in the top right corner). The situation with the Chinese
interface is similar :)
I saw a couple of days ago that if I click on Traditional Chinese,
I get an "ugly" link with the parameter "variant=zh-tw". Is it possible
for Simplified Chinese to have its URLs in Simplified and Traditional
Chinese in Traditional Chinese? Or a mod_rewrite redirection:
http://zh.wikipedia.org/wiki/<something in Traditional Chinese> is
shown, but http://zh.wikipedia.org/wiki/<something in Simplified
Chinese>...?variant=zh-tw is read?
...
Indeed. I have always anticipated that the Chinese system can be
generalized to other languages. Most of the code for the Chinese
system is not specifically tied to the Chinese language, and some code
refactoring can be done to provide better support for different
languages. Please watch CVS HEAD over the next couple of weeks for this
to happen.
I knew that Chinese has two alphabets, but it didn't occur to me
that the problems are similar to the Serbian ones :) Of course, I found
that out a couple of months ago...
...
...

Also, we should try to make the system clever: some formal and some
statistical methods can help in recognizing whether we should
transliterate something or not (e.g. if the system finds some
non-Serbian Cyrillic letters, it should not transliterate them into
Latin, and vice versa).
That's certainly doable within the current system framework, but will
require more specialized algorithms.
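As a rough illustration of the "formal" check suggested above, here is a Python sketch for the Cyrillic-to-Latin direction. It assumes a simple per-word, per-letter rule (any Cyrillic letter outside the 30-letter Serbian alphabet blocks transliteration of that word); the names SR_CYR_TO_LAT, is_cyrillic and to_latin are made up for this example.

```python
# The 30 letters of the Serbian Cyrillic alphabet and their Latin forms.
SR_CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

def is_cyrillic(ch):
    """True for characters in the basic Unicode Cyrillic block."""
    return "\u0400" <= ch <= "\u04ff"

def to_latin(word):
    """Transliterate a word, unless it contains Cyrillic letters
    that are not part of the Serbian alphabet."""
    if any(is_cyrillic(c) and c.lower() not in SR_CYR_TO_LAT for c in word):
        return word  # probably Russian, Bulgarian, etc.: leave untouched
    out = []
    for c in word:
        lat = SR_CYR_TO_LAT.get(c.lower())
        if lat is None:
            out.append(c)                  # not a Cyrillic letter: keep as-is
        elif c.isupper():
            out.append(lat.capitalize())   # "Њ" becomes "Nj"
        else:
            out.append(lat)
    return "".join(out)
```

A word such as "Београд" is transliterated, while "язык" is left alone because "я" and "ы" are not Serbian letters. Words from other languages that happen to use only Serbian letters would still slip through, which is where the statistical methods come in.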
Inside my extension to the pywikipedia bot
(http://millosh.org/software/ltafos/pos/), I have a statistical guesser:
the algorithm guesses the distance between two texts (something like the
so-called edit distance, but stochastic, so it can compare whole texts
in real time, not only words and phrases). I am using it to guess
whether a page is in Serbian or not. However, it can be used (in future
forms) for other kinds of stochastic guessing.
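To give an idea of what comparing two texts statistically can look like, here is a small Python sketch. It is not the algorithm from the extension above: it swaps in a common, simple technique (cosine similarity of character trigram counts), and the function names and the two reference samples are assumptions for illustration only.

```python
import math
from collections import Counter

def trigram_profile(text):
    """Count character trigrams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def similarity(a, b):
    """Cosine similarity of two trigram profiles, from 0.0 to 1.0."""
    pa, pb = trigram_profile(a), trigram_profile(b)
    dot = sum(count * pb[gram] for gram, count in pa.items())
    norm = (math.sqrt(sum(c * c for c in pa.values()))
            * math.sqrt(sum(c * c for c in pb.values())))
    return dot / norm if norm else 0.0

def looks_serbian(page_text, serbian_sample, other_sample):
    """Guess by checking which reference sample the page is closer to."""
    return similarity(page_text, serbian_sample) > similarity(page_text, other_sample)

# Tiny hypothetical reference samples; real ones would be much longer.
SERBIAN_SAMPLE = "ovo je tekst na srpskom jeziku"
ENGLISH_SAMPLE = "this is a text in the english language"
```

With these samples, looks_serbian("ovo je jedan tekst", SERBIAN_SAMPLE, ENGLISH_SAMPLE) compares the page against both profiles and picks the closer one. The cost is linear in the text length, so whole pages can be compared, not only words and phrases.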
