I think we also need to distinguish between two different kinds of "two versions", depending on what is split.
There can be two versions of the UI, two versions of the article content, or two versions of both, which is what the original split was.
Two versions of the UI/skin is what someone is working on the software for (sorry, I already deleted the email, so I don't remember who it was). This is more for the ease of the readers than for the editors. Editors are familiar with the layout and can generally make out what a page is, even in a completely unknown language. The UI is also the most important part for someone completely new to the wiki, since it showcases what a wiki is about.
I haven't heard of anyone who actually objects to this.
There can also be two versions of the content, as on the current Chinese wiki. You really won't know which charset an article is in until you read it.
For people who use SC or TC exclusively, the mixed version can be anything from mildly annoying to completely revolting; each person has a different tolerance level. I can read TC, but I can't edit in TC, since I don't know how to get my computer to take Pinyin input and output TC. A person well versed enough in both to edit articles in both is extremely rare.
This is a constant debate on the Chinese wiki, and the most recent decision is to reduce duplicate articles where the traditional and simplified versions differ in charset only. Pages that do not already have two versions do not get a second one created; instead, both charsets go on one page. For the Chinese wiki this is a possible, though imperfect, solution, since the two versions come from the same language source.
This is for the convenience of the editors, who do not have to type in the same thing twice, and also for the readers, who get the most complete information currently available on the wiki. It was voted in by the editors, since only editors vote, as on any other voting page on the wiki.
Hence the zh-tw split, which is a split of both the UI and the articles. What a lot of people oppose is the complete split of the articles. It's a headache for editors and also a disservice to a lot of readers: most Chinese readers would rather get an article in their non-preferred TC or SC than get nothing at all. Interwiki links between the versions only work if there is a corresponding article. Redirects do not take you to the other wiki, although they can be used to redirect between TC and SC titles on the same wiki.
-Vina
-----Original Message-----
From: Stirling Newberry <stirling.newberry(a)xigenics.net>
To: wikipedia-l(a)Wikimedia.org
Date: Sat, 11 Sep 2004 12:35:11 -0400
Subject: Re: [Wikipedia-l] Re: One Chinese Wikipedia
On Sep 11, 2004, at 12:25 PM, yuanml wrote:
> Let us sum up the basic facts, pros and cons we have discussed, in an NPOV
> manner.
>
> ==basic facts==
> *The same language: Chinese.
> *Two writing systems: TC & SC.
> *Chinese Wikipedia currently mixes TC and SC together.
> *The Chinese Wikipedia community currently has a consensus on unity.
> *The ratio of SC articles to TC articles on Chinese Wikipedia is about 10:1.
> *A unified solution is workable, although it is also acknowledged that
> there are significant challenges.
> *Someone has tried to develop a solution for the differences in vocabulary
> among Taiwan, Hong Kong, Mainland, Singapore and Macau,
> and the basic idea of the solution works.
> *There are huge and permanent implications of splitting up the two.
>
> ==pros of a single version==
> *It makes the most sense from a language point of view.
> *It makes the most sense from the point of view of having a top
> quality NPOV resource.
> *zh would work towards a traditional-simplified solution.
> *The advantage of a technical fix to the character problem
> to have one version is that the problems don't grow with time.
> *The problem set of making two character sets work isn't
> a fast-moving target.
>
Someone mentioned another pro:
*Technical work is of interest to other wikiprojects.
> ==cons of a single version==
> *The SC UI is uncomfortable for TC users and has scared some of them away.
> *It is difficult for TC users to look up some material, such as
> names of places and people written in traditional characters.
Also mentioned:
*Commits resources to a significant and non-trivial technical project.
*Has political implications.
> ==pros of two separate versions==
> *Attracts more TC users.
> *There is a larger corpus of texts, many of them
> fundamental texts, which exist as originals
> in traditional characters.
>
> ==cons of two separate versions==
> *Low quality on NPOV.
> *Unlike a technical fix to the character problem, the problems of
> diverged versions grow with time.
> *Keeping up with two sets of Wikipedians is
> a fast-moving target.
>
>
> _______________________________________________
> Wikipedia-l mailing list
> Wikipedia-l(a)Wikimedia.org
> http://mail.wikipedia.org/mailman/listinfo/wikipedia-l
>
>
>So, you have two choices here: we can run Wikipedia like it is the PRC
>and hold a sham vote where one group of people gets to decide the fate
>of another group of people, or we can run it *fairly* where we have
>private e-mail discussions between Traditional users and the relevant
>Wikipedia people, ie Tim Starling, Jimbo, etc etc.
Deciding this fairly over email is going to be quite impossible, as most of the people who should be discussing this do not read or write English, and a lot of the relevant fine points are lost in translation.
Also, I'm assuming that Tim and Jimbo are joining the discussion mainly from the software-capability perspective, so why would fairness come into consideration at all? Whichever way the Chinese wiki community decides to go, they'll be looking at the amount of work needed to create and maintain the software to support it (sorry to put words in your mouths :)
The other perspective you have to look at is the number of hours the sysops of the wikis will need to spend maintaining each wiki's quality. 100 hours is the same regardless of language or nationality.
Sysops are the ones who have to keep up with the software development and deal with the day-to-day issues of running a large public website where anybody can come and edit pages. This has been alluded to in various emails about having a separate wiki, but let me spell it out here.
As a sysop for a new wiki, not only do you need to worry about how many people will contribute content, you also need to think about whether you have the time, and can find other people with the time and inclination, to maintain the site, deal with vandals, correct typos, format pages, and guide new users through the wiki system. This is a lot of regular, time-consuming and tedious work.
We've just started translating the dozens of pages related to categories. It has already taken me personally over 20 hours to read through the English wiki, discuss it a bit on the Chinese wiki to see if there is interest, and then start translating the content that fits, so that Chinese wiki participants who don't read English can see what is being proposed and decide what to do. Several others are actively participating as well, and I have no idea how much time they have sunk into this piece alone.
-Vina
>So, you have two choices here: we can run Wikipedia like it is the PRC
>and hold a sham vote where one group of people gets to decide the fate
>of another group of people, or we can run it *fairly* where we have
>private e-mail discussions between Traditional users and the relevant
>Wikipedia people, ie Tim Starling, Jimbo, etc etc.
Deciding this fairly over email is going to be quite impossible, as most of the people who should be discussing this do not read or write English, and a lot of the relevant fine points are lost in translation.
Also, I'm assuming that Tim and Jimbo are joining the discussion mainly from the software-capability perspective, so why would fairness come into consideration at all? Whichever way the Chinese wiki community decides to go, they'll be looking at the amount of work needed to create and maintain the software to support it (sorry to put words in your mouths :)
The other perspective you have to look at is the number of hours the sysops of the wikis will need to spend maintaining each wiki's quality. 100 hours is the same regardless of language or nationality.
Sysops are the ones who have to keep up with the software development, keep versions in sync, run the interlanguage bot, etc. Before starting a new wiki, not only do you need to worry about how many people will contribute content (fewer than 100 are currently active on the Chinese wiki), you also need to think about whether you have the time, and can find other people with the time and inclination, to maintain the site, deal with vandals, correct typos, format pages, and guide new users through the wiki system. This is a lot of regular, time-consuming and tedious work.
We've just started translating the dozens of pages related to categories. It has already taken me personally over 20 hours to read through the English wiki, discuss it a bit on the Chinese wiki to see if there is interest, and then start translating the content that fits, so that Chinese wiki participants who don't read English can see what is being proposed and decide what to do. Several others are actively participating as well, and I have no idea how much time they have sunk into this piece alone.
-Vina
Sort of a tangential issue: Wikipedians should keep in mind here that
whatever they do is a positive political and social/cultural act; there
is no correct answer to this question.
Whether Chinese should be one language or diverge into multiple
languages is somewhat similar to the question 180 years ago over whether
the various Greeks spoken throughout the former Ottoman Empire should
remain distinct, or be merged into a "common Greek", and perhaps also
similar to the eternal disputes in France over the differences between
the various regional dialects. If we, for example, say we support
having a separate Wikipedia for a particular French dialect (which we
do), but we oppose having one for a particular Chinese dialect, then
we're making a rather odd political statement. Not that we can't do
that, but it should be done on purpose.
-Mark
Henry H. Tan-Tenn wrote:
> No technical solution will, of course, address the differences in
> vocabulary among Taiwan, PRC-Hong Kong, PRC-Mainland (and God forbid,
> Singapore and PRC-Macau). The differences are sometimes considerable in
> certain technical and pop cultural fields, though they should not be
> exaggerated.
In fact, some members of the Chinese Wikipedia community are working hard to solve
this problem. We have discussed it since the beginning of the community.
I don't think it is a difficult problem; we can introduce new markup to solve it easily.
A programmer has set up a wiki to test this idea, and it worked;
please visit http://fengzz.net/wiki/ to try it.
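The markup idea for regional vocabulary can be illustrated with a minimal Python sketch. The `-{zh-cn:TERM;zh-tw:TERM}-` syntax, the `render_variant` function and the fall-back-to-zh-cn rule are assumptions made up for this example, not the syntax used on the test wiki. The point is only that an author writes each regional term once, and the reader's variant selects which one is shown.

```python
import re

# Hypothetical inline markup: -{zh-cn:TERM;zh-tw:TERM;...}-
# Each variant code is followed by the term that region actually uses.
VARIANT_BLOCK = re.compile(r"-\{(.*?)\}-")


def render_variant(text, variant, fallback="zh-cn"):
    """Replace every -{...}- block with the term chosen for `variant`."""

    def pick(match):
        choices = {}
        for part in match.group(1).split(";"):
            if ":" in part:
                code, term = part.split(":", 1)
                choices[code.strip()] = term.strip()
        # Prefer the requested variant, then the fallback, then any term at all.
        if variant in choices:
            return choices[variant]
        if fallback in choices:
            return choices[fallback]
        return next(iter(choices.values()), "")

    return VARIANT_BLOCK.sub(pick, text)


source = "-{zh-cn:软件;zh-tw:軟體;zh-hk:軟件}-"
print(render_variant(source, "zh-tw"))  # 軟體
print(render_variant(source, "zh-hk"))  # 軟件
print(render_variant(source, "zh-sg"))  # 软件 (falls back to zh-cn)
```

Text outside the markup is left untouched here; converting the surrounding characters between SC and TC is a separate step.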
I think it is really important not to split the Chinese Wikipedia community:
we share the same language, and we shouldn't need to write the same thing twice.
Automatic conversion between SC and TC is very important for Chinese Wikipedia.
We have been discussing this problem for a long time. From our discussions, you can see that:
* The CJK Unified Ideographs and CJK Compatibility Ideographs blocks in the Unicode chart contain
about thirty thousand Chinese characters, both simplified and traditional.
* There are only 2235 commonly used simplified characters.
* Most of these 2235 simplified characters can be mapped one-to-one
to their traditional forms.
* Only about 100 simplified-traditional pairs are not one-to-one.
Admittedly, completely automatic conversion between SC and TC is really difficult,
but we can handle the difficult parts manually by introducing some markup.
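The points above (a large one-to-one table, a small set of one-to-many pairs, and manual markup for the hard cases) can be sketched in Python as follows. The tables hold only a few illustrative entries, and the `-{...}-` "copy this unchanged" markup and the function names are hypothetical; this is a sketch under those assumptions, not the converter being discussed.

```python
import re

# Illustrative fragment only; a real table would cover every simplified character.
ONE_TO_ONE = {"汉": "漢", "语": "語", "书": "書", "国": "國", "头": "頭"}
# Examples of the one-to-many pairs; the first candidate is only a default guess.
ONE_TO_MANY = {"发": ("發", "髮"), "后": ("後", "后")}

# Hypothetical markup: text inside -{...}- was written by hand and is not converted.
MANUAL = re.compile(r"-\{(.*?)\}-")


def _convert_span(span, out, ambiguous):
    for ch in span:
        if ch in ONE_TO_ONE:
            out.append(ONE_TO_ONE[ch])
        elif ch in ONE_TO_MANY:
            ambiguous.append(ch)  # a human should check this one
            out.append(ONE_TO_MANY[ch][0])
        else:
            out.append(ch)  # identical in both systems, or missing from the table


def sc_to_tc(text):
    """Return (converted text, list of ambiguous characters that were guessed)."""
    out, ambiguous = [], []
    pos = 0
    for m in MANUAL.finditer(text):
        _convert_span(text[pos:m.start()], out, ambiguous)
        out.append(m.group(1))  # manual override, copied unchanged
        pos = m.end()
    _convert_span(text[pos:], out, ambiguous)
    return "".join(out), ambiguous


print(sc_to_tc("汉语书"))    # ('漢語書', [])
print(sc_to_tc("头发"))      # ('頭發', ['发'])  wrong default, but flagged
print(sc_to_tc("头-{髮}-"))  # ('頭髮', [])     fixed by hand with markup
```

Because only the one-to-many characters are flagged, editors would need to look at a small fraction of the text rather than every article.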
I really want to solve this problem, and I think I have the ability,
but right now I have no time. The other difficulty for me is that
I don't have a thorough understanding of the Squid cache.
I think that to solve this problem, we must design a solution that conforms to the
URL, database and Squid cache requirements. The Chinese Wikipedia community has discussed these problems
for a long time; the real problem so far is that no programmer has taken the real step.
If no one has taken it by then, I plan to solve it next year, when I have finished my graduate thesis.
Anyone who is interested in the problem and willing to work hard to solve it is very welcome.
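One possible reading of "conforms to the URL, database and Squid cache requirements" is: store a single text per article in the database, render it per variant, and make the variant part of both the URL and the cache key, so that the cache never serves a TC page to an SC reader. The Python sketch below assumes exactly that. The `/<variant>/<title>` URL scheme, the `PageCache` class and the in-memory dict standing in for Squid are hypothetical illustrations, not MediaWiki or Squid APIs.

```python
from urllib.parse import urlsplit

VARIANTS = ("zh-cn", "zh-tw")  # hypothetical set of supported variants
DEFAULT_VARIANT = "zh-cn"


def parse_request(url):
    """Split a hypothetical /<variant>/<title> URL into (variant, title)."""
    path = urlsplit(url).path.strip("/")
    first, _, rest = path.partition("/")
    if first in VARIANTS and rest:
        return first, rest
    return DEFAULT_VARIANT, path


class PageCache:
    """One stored text per title; one cached rendering per (title, variant)."""

    def __init__(self, database, convert):
        self.database = database  # title -> the single shared wikitext
        self.convert = convert    # (text, variant) -> rendered page
        self.rendered = {}        # stands in for the front-end cache
        self.renders = 0          # number of cache misses so far

    def get(self, url):
        variant, title = parse_request(url)
        key = (title, variant)    # the variant must be part of the cache key
        if key not in self.rendered:
            self.renders += 1
            self.rendered[key] = self.convert(self.database[title], variant)
        return self.rendered[key]

    def save_edit(self, title, new_text):
        """An edit changes the one shared text, so every variant is purged."""
        self.database[title] = new_text
        for variant in VARIANTS:
            self.rendered.pop((title, variant), None)


cache = PageCache({"Foo": "text"}, lambda text, variant: f"[{variant}] {text}")
print(cache.get("http://example.org/zh-tw/Foo"))  # [zh-tw] text
print(cache.get("http://example.org/Foo"))        # [zh-cn] text
print(cache.renders)                              # 2
```

The design choice worth noting is that editors only ever touch one text per article, while the cache holds one copy per variant; an edit therefore has to purge all variants at once.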
To Jin Junshu/Mark
>English people and Japanese people also have the same universe, sun,
>planet, species, maths, logic, and universal history, yet we have
>separate Wikipedias for English and Japanese...
>By your logic, there shouldn't even be a zh: and we should only have
>one Wikipedia (which would probably be en: although I would much
>prefer is: or lb: or something of that sort)
You missed my point; that's my fault for not expressing it clearly.
I mean that in most cases we have the same vocabulary:
we call things by the same names and use the same concepts.
This is important, because it means we share the same language.
The writing systems are different, but most characters can be mapped to each other.
>So what? One could make the same arguments for not having separate
>Wikipedias for different languages.
You can't synchronize en: and jp: easily,
because they are different languages.
But someday we can find a way to synchronize zh-cn: and zh-tw: easily.
That is the difference.
>But the difference between the two isn't merely a "difference of
>character sets". Rather than converting on the level of the individual
>character which will inevitably produce poor results, it is necessary
>to convert documents on the level of lexemes, for which one needs some
>sort of artificial intelligence capable of separating Chinese texts
>into individual lexemes before conversion. It is also necessary to
>convert names of countries, special terminology
Did you try the link I mentioned above?
Please visit http://fengzz.net/wiki/ to try it.
We are using some markup to solve this lexeme problem,
and it works.
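The lexeme problem raised in the quoted message can be illustrated with a small Python sketch that compares character-by-character conversion with conversion over a phrase table. Greedy longest match is swapped in here as a simple stand-in for real word segmentation; it is not the markup approach used on the test wiki, and the tables and function names are illustrative assumptions.

```python
# Illustrative SC -> TC (Taiwan) tables; the entries are examples only.
PHRASES = {"头发": "頭髮", "发展": "發展", "软件": "軟體", "信息": "資訊"}
CHARS = {"头": "頭", "发": "發", "软": "軟"}
MAX_LEN = max(len(p) for p in PHRASES)


def convert_chars(text):
    """Naive character-by-character conversion, for comparison."""
    return "".join(CHARS.get(ch, ch) for ch in text)


def convert_lexemes(text):
    """Greedy longest match: try the longest known phrase at each position first."""
    out, i = [], 0
    while i < len(text):
        for length in range(min(MAX_LEN, len(text) - i), 1, -1):
            chunk = text[i:i + length]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += length
                break
        else:  # no phrase matched: fall back to a single character
            out.append(CHARS.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(convert_chars("头发"))        # 頭發  (wrong character for "hair")
print(convert_lexemes("头发"))      # 頭髮
print(convert_lexemes("发展软件"))  # 發展軟體
```

Greedy longest match can still choose the wrong split when two dictionary words overlap, which is one reason a manual markup override remains useful alongside any dictionary.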
I agree with you that it is inconvenient for traditional Chinese users now,
and I think the request to create a zh-tw version is reasonable.
But we'd better evaluate the different solutions carefully before we decide.
In my opinion, keeping a single version will benefit us in the future,
and this is also the current consensus of the Chinese Wikipedia community.
Maybe we should discuss this issue more.
>> Yes, if you set up zh-tw, people will go there and write articles.
>> But splitting the community will only weaken the growth of this small
>> project and make things more difficult in the future. Suppose we now have
>> two projects, zh-cn and zh-tw, and someday we want to synchronize them:
>> you will find it very difficult.
>>
>> Yes, if you don't want to synchronize the two, there is no problem.
>
>This is not the case - the two writing systems in the same documents do
>cause problems - merely different problems. The advantage of a
>technical fix to the character problem to have one version is that the
>problems don't grow with time, whereas diverged versions do. The
>problem set of making two character sets work isn't a fast-moving
>target, whereas keeping up with two sets of wikipedians is.
Stirling, what you said is what I wanted to say, but my English is poor.