zhengzhu wrote:
I have started to write some documentation about the Chinese conversion system at meta:
http://meta.wikimedia.org/wiki/Chinese_conversion
People interested in implementing conversion systems for other languages should take a look at it. Note that most of the implemented features are not Chinese-specific.
Fantastic work!
Regarding the word segmentation problem: googling for "Chinese word segmentation" turns up a large number of references to interesting research in this area. There seems to be a lively research community working on it; perhaps some of its members would like to earn kudos by collaborating with us to improve the performance of this module and tackle some of the common segmentation issues.
Dynamic programming and hidden Markov model (HMM) methods seem to be popular, and by now we must have quite a large corpus of our own in the zh: Wikipedia.
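To give a flavour of the dynamic-programming approach, here is a minimal sketch of one common variant: maximum-probability segmentation under a unigram word model. It is not how the existing conversion module works. The function name `segment`, the unknown-character penalty and the toy word counts in `COUNTS` are all hypothetical; in practice the counts would have to be derived from a real corpus (the zh: Wikipedia text, for example).

```python
import math
from typing import Dict, List

# Hypothetical toy word counts, invented for illustration only.
COUNTS: Dict[str, int] = {
    "研究": 50, "生命": 40, "研究生": 20, "命": 5,
    "起源": 30, "的": 100, "生": 10,
}


def segment(text: str, counts: Dict[str, int], max_word_len: int = 4) -> List[str]:
    """Split text into words, maximising the product of unigram probabilities.

    best[i] holds the best log-probability of any segmentation of text[:i];
    back[i] records where the last word of that segmentation starts.
    """
    total = sum(counts.values())
    # Assumed smoothing: an unseen single character counts as a word with
    # probability 1 / (total * 10), so every input can be segmented.
    unknown_logp = math.log(1.0 / (total * 10))
    n = len(text)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        # Try every candidate last word text[start:end] up to max_word_len chars.
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in counts:
                logp = math.log(counts[word] / total)
            elif end - start == 1:
                logp = unknown_logp
            else:
                continue  # unknown multi-character strings are not words
            score = best[start] + logp
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the string to recover the words.
    words: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


if __name__ == "__main__":
    print(segment("研究生命的起源", COUNTS))
```

With these toy counts, the classic ambiguous phrase 研究生命的起源 comes out as 研究 / 生命 / 的 / 起源 rather than 研究生 / 命 / 的 / 起源, because the product of the word probabilities is higher for the first reading. An HMM-based tagger would instead label each character (for example, as the beginning, middle or end of a word) and decode with the Viterbi algorithm, which is also a dynamic program.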
-- Neil