On Tue, 14 Sep 2004 20:31:11 -0700, Mark Williamson node.ue@gmail.com wrote: <snip>
I propose to store all text in Traditional but convert it to Simplified (perhaps with some sort of caching so articles do not have to be re-generated each time) because TC>SC conversion is less ambiguous than SC>TC conversion. If somebody adds text to an article but they are typing in SC, it will be converted to TC when it is added to the database. In the edit window, though, text will appear in the script of whichever domain you are at. Titles of articles should be converted too. If a mistake is made in conversion when a Simplified text is added to the database, eventually somebody browsing at http://zh-tw.wikipedia.org/ will notice this error and hopefully fix it. In the meantime this error won't cause any problems on zh-cn because it will convert back the same way.
This is more or less the concept I was mulling over as a very general solution, but I realised that it does have a big disadvantage: naive users 'correcting' the translation may simply shift the error into the opposite version. Or, more specifically, there is no way of distinguishing a translational correction from a factual one. For example:
Say you have a database in English, but with automated conversion to a dialect we'll call Blinglish. The English database contains the text "...while eating an apple...", and this is viewed by a Blinglish user. They replace the word 'apple' (in the Blinglish version) with 'orange'. The software now has no way of knowing whether the user is saying that 'orange' is the Blinglish word for 'apple', or whether the Blinglish user is correcting a fact, and the English version should be updated to say 'orange'.
Obviously, the translation corrections *should* be labelled using special markup, but the majority of users find special markup very hard to learn, and huge numbers of users pass through who have no idea how to use such things. In order to encourage them to return and contribute more, we need to not only make the system work *despite* them, but to actively fit them into it.
If, to continue my example, we translate 'orange' back to English, when it is in fact supposed to be an idiomatic translation, another user may come along on the English site and correct it back to 'apple'. The Blinglish version will then be in its original state, and the cycle will continue until a more experienced user spots the ambiguity and marks it up appropriately. A waste of everyone's time, and a definite turn-off for the casual users whose changes keep disappearing.
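To make the cycle concrete, here is a rough sketch in Python. Everything in it is invented for illustration: the word table, the 'colour'/'kolor' pair, and the function names are assumptions, and the converter is a naive word-for-word substitution rather than anything a real system would use.

```python
# Hypothetical English -> Blinglish word table. It has no entry for
# 'apple', so the converter shows 'apple' unchanged to Blinglish readers.
EN_TO_BL = {"colour": "kolor"}
BL_TO_EN = {bl: en for en, bl in EN_TO_BL.items()}


def convert(text, table):
    """Replace each whitespace-separated word using the given table."""
    return " ".join(table.get(word, word) for word in text.split())


def blinglish_edit(stored, old, new):
    """A Blinglish user edits their view; the result is converted back
    to English and becomes the new stored text."""
    view = convert(stored, EN_TO_BL)
    edited = view.replace(old, new)
    return convert(edited, BL_TO_EN)


def english_edit(stored, old, new):
    """An English user edits the stored text directly."""
    return stored.replace(old, new)


def simulate(rounds):
    """Alternate the two 'corrections' and record every stored revision."""
    stored = "the colour changed while eating an apple"
    history = [stored]
    for _ in range(rounds):
        # Blinglish user: 'apple' looks wrong in my dialect, make it 'orange'.
        stored = blinglish_edit(stored, "apple", "orange")
        history.append(stored)
        # English user: 'orange' is factually wrong, put 'apple' back.
        stored = english_edit(stored, "orange", "apple")
        history.append(stored)
    return history
```

Calling simulate(2) gives five revisions that simply alternate between the 'apple' and 'orange' texts. Nothing in the stored data records why either edit was made, which is exactly the information the software would need to break the loop.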
If we can rely on a majority of the users understanding more than one of the languages involved, we could more-or-less avoid this by providing some obvious mechanism for saying "this change is because of a translation issue", that even technophobes can use. But anyone who only understands one version will not know themselves whether it is a translation issue - only that it is, within the version they are looking at, a mistake...
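As a minimal sketch of what such a mechanism might look like behind the scenes, assume the edit form has a single tick-box that arrives as a boolean. The Article class, the per-article overrides table and the translation_issue flag are all hypothetical names, and the sketch only handles single-word edits.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    base: str  # canonical (English) text
    # Hypothetical per-article table: English word -> Blinglish word.
    overrides: dict = field(default_factory=dict)


def render_blinglish(article):
    """Show the base text with any recorded translation overrides applied."""
    return " ".join(article.overrides.get(w, w) for w in article.base.split())


def save_blinglish_edit(article, old, new, translation_issue):
    """Save a single-word edit made in the Blinglish view.

    translation_issue is the (assumed) tick-box on the edit form.
    """
    if translation_issue:
        # Leave the English text alone; record how the word should be shown.
        for word in article.base.split():
            if article.overrides.get(word, word) == old:
                article.overrides[word] = new
    else:
        # Factual correction: change the stored English text itself.
        reverse = {bl: en for en, bl in article.overrides.items()}
        base_old = reverse.get(old, old)
        article.base = " ".join(
            new if word == base_old else word for word in article.base.split()
        )
```

With the box ticked, replacing 'apple' with 'orange' leaves the English text untouched and only changes what Blinglish readers see; without it, the English text itself changes. This does nothing for the monolingual user described above, who cannot tell which box to tick.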