zhengzhu wrote:
I have started to write some documentation about the Chinese
conversion system at meta:
http://meta.wikimedia.org/wiki/Chinese_conversion
People interested in implementing conversion systems for other
languages should take a look at it. Note that most of the features
implemented are not Chinese-specific.
Fantastic work!
Regarding the word segmentation problem: Googling for "Chinese word
segmentation" turns up a large number of references to interesting
research in this area. The research community seems to be a lively
one: perhaps some of its members might want to earn kudos points by
collaborating on this module and tackling some of the common
segmentation issues.
Dynamic programming and hidden Markov models seem to be popular
approaches, and we must, by now, have quite a large corpus of our own in
the zh: wikipedia that such statistical methods could draw on.
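
To make the dynamic programming idea concrete, here is a minimal sketch
in Python. It is not how the conversion module works, just an
illustration of one common textbook approach: pick the segmentation that
maximises the product of unigram word probabilities. The word counts,
the function name segment() and the penalty for unknown characters are
all made up for the example; real counts would have to be estimated from
a corpus.

```python
import math

# Hypothetical toy word counts (assumption); a real table would be
# estimated from a large corpus.
TOY_COUNTS = {
    "研究": 50, "研究生": 20, "生命": 40, "命": 5,
    "起源": 30, "生": 5, "起": 5, "源": 2,
}


def segment(text, counts, max_word_len=4, unknown_count=0.5):
    """Return the most probable segmentation under a unigram word model."""
    total = sum(counts.values())

    def log_prob(word):
        count = counts.get(word)
        if count is None:
            # Unknown single characters get a small pseudo-count so that
            # every position stays reachable; unknown longer strings are
            # not considered as words at all.
            if len(word) != 1:
                return None
            count = unknown_count
        return math.log(count / total)

    n = len(text)
    best = [float("-inf")] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)              # back[i]: start of the last word
    best[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            lp = log_prob(text[start:end])
            if lp is None:
                continue
            score = best[start] + lp
            if score > best[end]:
                best[end] = score
                back[end] = start

    # Follow the back-pointers from the end of the text to recover words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


print(segment("研究生命起源", TOY_COUNTS))
```

With these toy counts, the ambiguous string "研究生命起源" comes out as
研究 / 生命 / 起源 rather than 研究生 / 命 / 起源, because the first
reading has the higher combined probability. A hidden Markov model
approach would replace the word table with per-character tag
probabilities, but the table-filling step is much the same.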
-- Neil