yuanml wrote:
To Mark Williamson:
By the way, since when am I trying to compare en/jp and tc/sc? I was merely responding to something somebody else said about SC and TC users "living in the same universe" or something.
I don't think I've lost my point.
tc/sc users share the same conceptual structure of the universe, but en/jp, en/tc, and en/sc do not. For example, the planet Venus in English is a term related to a goddess, but in both sc and tc the planet Venus is related to the same things - gold and stars. In one word, tc/sc is the same language. This is my point.
The tc/sc users share not only the same grammar, but also most of their knowledge systems. Let us set aside Chinese native knowledge, such as Chinese history and folklore, and talk about modern science. Terminologies of modern science have been introduced to China since the Ming Dynasty hundreds of years ago, and increased vastly after 1900. The Chinese knowledge system evolved into its modern form just after the New Culture Movement around 1920. But the tc/sc split happened around 1956, so tc and sc share the same background in their knowledge systems.
From 1949 to the 1980s, tc and sc evolved independently for lack of communication, so some new terminologies are different, such as in computer science. But after the 1980s, communication between tc and sc increased considerably.
Disclaimer: I can't read Chinese, so I don't know whether this is similar to any of the current or proposed solutions, but I have read some of the literature on the subject. My apologies if I'm going over old territory.
The best analogy is (I think) the difference between en-us and en-gb: the differences are mostly "spelling" and idioms. Automatic conversion is entirely possible, but occasionally imperfect. However, it should be possible to paraphrase around these problems where they occur and produce a single text that can be displayed (and edited) in either language and converted to-and-fro.
Perhaps one way to do it would be as in this fictitious example: suppose I have a simplified word that means "fish", which can be transformed to either (say) "FISH" or "STONE" in the traditional script. We could auto-convert this '''into the Wiki source''' at edit time to markup like
[fish=FISH|STONE]
which would display as "fish" highlighted in some way when the page is rendered in simplified script to show there is a potential transliteration problem, and as [FISH|STONE] when rendered in traditional script.
Then it can be cleaned up in markup by writing:
[fish=FISH]
or similar markup, which will force the traditional rendering to the correct word and remove the warning flag for simplified rendering, since there is now a one-to-one mapping. The same would apply in reverse for ambiguous conversions in the opposite direction. With any luck, this could be entirely lexicon-driven and would need no AI research, because we could find all pages containing ambiguities automatically, and then harness the copyediting skills of Wikipedians to find and disambiguate all the problematic text. We could even harness this when idioms or short phrases differ, to go:
[idiom in simplified=IDIOM IN TRADITIONAL]
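The whole scheme above can be sketched in a few lines of code. This is only an illustration of the idea, not any existing MediaWiki implementation: the lexicon, the word segmentation (input is pre-split into words), and the Latin placeholder words standing in for Chinese characters are all made up, following the fish/FISH/STONE example.

```python
import re

# Hypothetical lexicon mapping simplified words to their possible
# traditional renderings. "fish" is one-to-many (ambiguous); "water"
# is one-to-one (safe to convert automatically).
LEXICON = {
    "fish": ["FISH", "STONE"],
    "water": ["WATER"],
}

# Matches the proposed [simplified=TRAD1|TRAD2] markup.
MARKUP = re.compile(r"\[([^=\]]+)=([^\]]+)\]")


def to_wiki_source(simplified_words):
    """Auto-convert simplified text into wiki source at edit time,
    embedding [simp=TRAD1|TRAD2] markup wherever the lexicon offers
    more than one traditional candidate."""
    out = []
    for word in simplified_words:
        trads = LEXICON.get(word, [word])
        if len(trads) == 1:
            out.append(word)  # unambiguous: leave bare in the source
        else:
            out.append("[%s=%s]" % (word, "|".join(trads)))
    return " ".join(out)


def render(wiki_source, script):
    """Render wiki source in 'simplified' or 'traditional' script.
    Unresolved markup shows as the simplified word in the simplified
    view (a real renderer would highlight it as a warning), and as
    [TRAD1|TRAD2] in the traditional view. Once an editor pins the
    markup down to e.g. [fish=FISH], both views render cleanly."""
    out = []
    for token in wiki_source.split():
        m = MARKUP.fullmatch(token)
        if m:
            simp, trads = m.group(1), m.group(2).split("|")
        else:
            simp, trads = token, LEXICON.get(token, [token])
        if script == "simplified":
            out.append(simp)
        elif len(trads) == 1:
            out.append(trads[0])
        else:
            out.append("[%s]" % "|".join(trads))
    return " ".join(out)
```

For instance, `to_wiki_source(["water", "fish"])` yields `water [fish=FISH|STONE]`, which renders as `WATER [FISH|STONE]` in the traditional view until an editor disambiguates it; after cleanup to `[fish=FISH]`, the traditional view shows `FISH` and the simplified view shows plain `fish` with no warning.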
-- Neil