[Wikipedia-l] Re: One Chinese Wikipedia

16 Sep 2004


That's the thing with the ambiguity. That's the reason I'm suggesting
articles be stored in Traditional and converted into Simplified
on the fly.
Traditional-to-Simplified conversion has far fewer ambiguous cases
than Simplified-to-Traditional. If a Simplified user writes additional
content in Simplified, but parts of it are converted incorrectly
before being added to the database, then the following happens:
Since the wrong character and the right character in Traditional are
the same character in Simplified, the error has no effect on
Simplified users, who will continue to see the text as it was typed.
However, the error *will* show up to Traditional users, who will then
correct it. This correction will have no effect on the appearance of
the text to a Simplified user, but it will make the stored text use
the correct character for a Traditional user.
This eliminates the need for special semantic markup.
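
To make the argument above concrete, here is a minimal sketch in Python.
The three-entry tables and the function names are assumptions made up for
illustration; a real converter would need a much larger table and
word-level context. It only shows that a wrong guess on the way into the
database and its later correction look identical from the Simplified side.

```python
# Hypothetical character tables, for illustration only.
# Two Traditional characters collapse into one Simplified character.
T2S = {"發": "发", "髮": "发", "頭": "头"}
# Naive reverse table: one Traditional guess per Simplified character.
S2T = {"发": "發", "头": "頭"}


def to_simplified(text):
    """Convert character by character; unknown characters pass through."""
    return "".join(T2S.get(ch, ch) for ch in text)


def to_traditional(text):
    """Convert character by character using the single-guess table."""
    return "".join(S2T.get(ch, ch) for ch in text)


typed = "头发"                   # "hair", as typed by a Simplified user
stored = to_traditional(typed)   # the naive guess picks the wrong character
corrected = "頭髮"               # later fix by a Traditional reader

print(stored, corrected)
# Both the wrong and the corrected stored text render the same in Simplified.
print(to_simplified(stored) == typed)
print(to_simplified(corrected) == typed)
```

The wrong stored form and the corrected one differ only in the Traditional
view, which is why the fix can be left to Traditional readers.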
--Jin Junshu/Mark
On Wed, 15 Sep 2004 13:51:54 +0100, Rowan Collins
rowan.collins@gmail.com wrote:
...
On Tue, 14 Sep 2004 20:31:11 -0700, Mark Williamson node.ue@gmail.com wrote:
<snip>
> I propose to store all text in Traditional but convert it to
> Simplified (perhaps with some sort of caching so articles do not have
> to be re-generated each time) because TC>SC conversion is less
> ambiguous than SC>TC conversion. If somebody adds text to an article
> but they are typing in SC, it will be converted to TC when it adds it
> to the database. In the edit window even though, text will appear as
> whichever domain you are at. Titles of articles should be converted
> too. If a mistake is made in conversion when a Simplified text is
> added to the database, eventually somebody browsing at
> http://zh-tw.wikipedia.org/ will notice this error and hopefully fix
> it. In the mean time this error won't cause any problems on zh-cn
> because it will convert back the same way.
This is more or less the concept I was mulling over as a very general
solution, but I realised that it does have a big disadvantage: naive
users 'correcting' the translation may simply shift the error into the
opposite version. Or, more specifically, there is no way of
distinguishing a translational correction from a factual one. For
example:
Say you have a database in English, but with automated conversion to a
dialect, we'll call it Blinglish. The English database contains the
text "...while eating an apple...", and this is viewed by a Blinglish
user. They replace the word 'apple' (in the Blinglish version) with
'orange'. The software now has no way of knowing whether the user is
saying that 'orange' is the Blinglish word for 'apple', or whether the
Blinglish user is correcting a fact, and the English version should be
updated to say 'orange'.
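
The ambiguity can be sketched in a few lines of Python. Everything here
(the word table, the function name, the list-of-words representation) is
a hypothetical stand-in for the imagined English/Blinglish software, not
a description of any real system. The sketch shows that two different
readings of the same edit reproduce exactly the text the Blinglish user
saved, so the saved text alone cannot tell them apart.

```python
# Hypothetical word-level table from English to the made-up dialect.
EN_TO_BL = {"apple": "apple"}


def render_blinglish(english_words, table):
    """Translate word by word; words missing from the table pass through."""
    return [table.get(word, word) for word in english_words]


stored = ["while", "eating", "an", "apple"]        # English database text
edited_view = ["while", "eating", "an", "orange"]  # what the Blinglish user saved

# Reading 1: translation fix. The English text stays, the table changes.
table_1 = dict(EN_TO_BL, apple="orange")
stored_1 = stored

# Reading 2: factual fix. The table stays, the English text changes.
table_2 = EN_TO_BL
stored_2 = ["while", "eating", "an", "orange"]

# Both readings reproduce the saved Blinglish text...
same_view = (render_blinglish(stored_1, table_1) == edited_view
             and render_blinglish(stored_2, table_2) == edited_view)
# ...but they disagree about what the English version should say.
same_english = stored_1 == stored_2

print(same_view, same_english)
```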
Obviously, the translation corrections *should* be labelled using
special markup, but the majority of users find special markup very
hard to learn, and huge numbers of users pass through who have no idea
how to use such things. In order to encourage them to return and
contribute more, we need to not only make the system work *despite*
them, but to actively fit them into it.
If, to continue my example, we translate 'orange' back to English,
when it is in fact supposed to be an idiomatic translation, another
user may come along on the English site and correct it back to
'apple'. The Blinglish version will then be in its original state, and
the cycle will continue until a more experienced user spots the
ambiguity and marks it up appropriately. A waste of everyone's time,
and a definite turn-off for the casual users whose changes keep
disappearing.
If we can rely on a majority of the users understanding more than one
of the languages involved, we could more-or-less avoid this by
providing some obvious mechanism for saying "this change is because of
a translation issue", that even technophobes can use. But anyone that
only understands one version will not know themselves whether it is a
translation issue - only that it is, within the version they are
looking at, a mistake...
--
Rowan Collins BSc
[IMSoP]
