On Sep 10, 2004, at 5:15 PM, Mark Williamson wrote:
But the difference between the two isn't merely a "difference of character sets". Rather than converting on the level of the individual character, which will inevitably produce poor results, it is necessary to convert documents on the level of lexemes, for which one needs some sort of artificial intelligence capable of separating Chinese texts into individual lexemes before conversion.
Having done document conversion, the number of cases is manageable here, and they belong in the realm of "things that can be searched for and tagged". A cumbersome, but not hard, problem.
It is also necessary to convert names of countries, special terminology (including Wikipedia terminology:
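To make the "searched for and tagged" point concrete, here is a minimal sketch of lexeme-level conversion by longest match: a phrase table is consulted before falling back to single characters. The tables, the function name to_traditional and the sample entries are my own illustration, not any existing converter, and a real table would be far larger.

```python
# Illustrative tables only (assumptions for this sketch).
PHRASES = {
    "头发": "頭髮",  # "hair": here 发 must become 髮
    "发展": "發展",  # "develop": here 发 must become 發
    "软件": "軟體",  # "software": a vocabulary difference, not just a glyph change
}
# Per-character fallback; 发 defaults to 發, which is wrong for "hair".
CHARS = {"头": "頭", "发": "發", "软": "軟", "长": "長"}

MAX_PHRASE_LEN = max(len(p) for p in PHRASES)


def to_traditional(text):
    """Convert text, preferring the longest known phrase at each position."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate phrase first, down to length 2.
        for length in range(min(MAX_PHRASE_LEN, len(text) - i), 1, -1):
            chunk = text[i:i + length]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += length
                break
        else:
            # No phrase matched here: fall back to the character table.
            ch = text[i]
            out.append(CHARS.get(ch, ch))
            i += 1
    return "".join(out)
```

With only the character table, "头发" would come out as "頭發"; the phrase entry is what gets it right, and each such case is one more row in the table.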
True, but again an enumerable change, solvable in software, or by the same process that we have now for proofing documents: namely, people go through and make intelligent changes. The process would be to flip the toggle, scan the document for problems and then edit the underlying wiki-material, inserting the metacodes needed. It is much the same as the way people now scan documents to find broken, redirected or ambiguous links, spelling errors and so on.
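The scan-and-tag step might look something like the sketch below. It lists simplified characters that have more than one traditional counterpart, and skips any text an editor has already wrapped in an override marker. The -{ ... }- marker syntax, the function name find_problems and the three-entry table are assumptions made for illustration only.

```python
import re

# Simplified characters with several traditional counterparts
# (a tiny illustrative sample, not a complete list).
AMBIGUOUS = {
    "发": ["發", "髮"],
    "后": ["後", "后"],
    "干": ["乾", "幹", "干"],
}

# Hypothetical metacode: text inside -{ ... }- has been fixed by hand
# and should be left alone by the converter.
OVERRIDE = re.compile(r"-\{(.*?)\}-")


def find_problems(wikitext):
    """Return (position, character, candidates) for each ambiguous
    character that is not inside an override marker."""
    protected = set()
    for match in OVERRIDE.finditer(wikitext):
        protected.update(range(match.start(), match.end()))
    return [
        (i, ch, AMBIGUOUS[ch])
        for i, ch in enumerate(wikitext)
        if ch in AMBIGUOUS and i not in protected
    ]
```

An editor would work through the returned list, much like a list of broken links, and add a marker wherever the automatic choice is wrong.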
the first two characters in the Simplified Chinese name for "wikipedia" would be translated alone into English as the name "Vicky", which would be converted into Traditional in a specific way, but the current way to write "wikipedia" in Traditional Chinese is not like that), etc.; also, Simplified Chinese is more tolerant of the usage of English words in the Roman alphabet than is Traditional (except perhaps in Hong Kong, where anglicisms are often even more frequent), as is exemplified by various article texts.
That's a dialectal, not linguistic, issue.
Some people here are saying that "if I read this text in simplified aloud, a Taiwanese person can understand it". That is not the issue at hand. If zh: were in Pinyin, perhaps, that would be the issue, or if it were a spoken encyclopedia, maybe. But this is a written encyclopedia. zh-cn: and zh-tw: may be largely the same spoken language, but they are hardly the same written language.
--Jin Junshu/Mark
The general consensus of linguists is that you are overstating the differences - that traditional and simplified represent the same "written" language because the grammar is the same and most of the syntax is the same. The visual difference is rather like the difference between using the Latinate Greek characters, the ones most people associate with Greek, and the older characters used in the classical age. A person who can read one can't read the other, but translation between the two is mainly a mechanical process that needs intervention occasionally. While the traditional/simplified problem is a couple of orders of magnitude more complicated, it isn't more complex in lexical theory.
Which is not to minimize the differences - if the community consensus is just "squash this!" then that is a mistake as large as simply brute-force creating two versions. There are technical and methodological hurdles that should be addressed, otherwise someone will reach the same conclusion that Jin Junshu has - namely that a traditional Wiki is needed, because there is a user community not well served by the simplified version. Part of this is based on political forces that are in operation out there: there is no desire among the Chinese reading and writing community to break Chinese into separate written languages - that is, to continue increasing differences until mutual intelligibility is a difficult hurdle to pass. At the same time, there is a desire among traditional users to continue to use traditional characters, and there is a large corpus of texts, many of them fundamental texts, which exist as originals in traditional characters, and which argue for the wiki handling traditional characters in an appropriate way.