[Foundation-l] 1.3 billion of humans don't have Wikipedia in their native...

M. Williamson node.ue at gmail.com
Mon May 23 13:04:31 UTC 2011

When words are from the same root, the same character is generally
used regardless of modern pronunciation. In Traditional Chinese,
phonetic elements are mostly based on older pronunciations which might
not make sense in all modern Sinitic languages; sometimes in
Simplified Chinese these are replaced by phonetic elements based on
Mandarin pronunciation.

However, Milos, I believe you have misinterpreted "logophonetic" here.
Although the script has phonetic elements, this does not mean that the
phonetic elements are based on modern pronunciations. So for example,
西瓜 is the word for watermelon in every Sinitic language (as far as I'm
aware). In Mandarin it is pronounced "xi gua"; in Cantonese, "sai
gwaa"; in Min Nan, "sai koe"; in Shanghainese Wu, "si kwo" (I have
not noted tones here because the tone systems of these languages
differ). In spite of the differing pronunciations, since they are all from the
same etymological root, they are all written exactly the same way with
the same characters. This is probably not the best example since
neither of these characters has a phonetic element, but that is
irrelevant because even if they did the case would be the same.

What DOES make Sinitic (Chinese) languages different when written is
the following (*this is important*): words that are not etymologically
related to their equivalents in other Sinitic languages are
often/usually written differently, and grammar and syntax can differ,
including grammatical particles which have no direct equivalent. As an
example of differing word order, in Shanghai Wu you can say "We drink
coffee" as "Ala kafi che", which is literally "We coffee drink"; in
Mandarin it would be said as "Women he kafei", literally "We drink
coffee".

Imagine for a moment that English and Spanish used a similar writing
system. "I want you to give me a piece of bread" and "Quiero que me
des un pedacito de pan" would be written differently due to differing

"I want you to give me a piece of bread" would be written as "[I]
"Quiero que me des un pedacito de pan" would be written as

Also, "Cuando va a llegar Maria?" (accents missing) and "When is Maria
going to arrive?"

"Cuando va a llegar Maria?" would be written as "[WHEN] [GO]-[THIRD
"When is Maria going to arrive?" would be written as "[WHEN] [IS]
[MARIA] [GOING TO] [ARRIVE]" or something like that. Note here that
the "arrive" comes after "Maria" in English, but before in Spanish.

These are relatively simple examples. English and Spanish (and many
other Western European languages) are related and in many ways have
relatively similar syntax (as compared to, say, Asian, African or
American languages), yet these grammatical differences would still
make it impossible to unify them in writing.

It is essentially the same case with Sinitic languages.

However, there is another issue at play here: the classification of
Sinitic languages and dialects is a bit controversial, and it is
possible that some of these "languages" identified by the Ethnologue
would not want or need a separate version. Jin Chinese, for example,
is often identified as a divergent dialect of Mandarin, and I'm
doubtful that a Wikipedia written in Jin in Chinese characters would
differ substantially from zh.wp, and almost certain (though I am
willing to be proven wrong) that they would not differ enough in
writing to merit separate Wikipedias.

Also, I am somewhat doubtful that varieties such as Puxian, with 2.5
million speakers who are almost all highly literate in Standard
Chinese (=written Mandarin), would ever have enough editors or readers
to amount to much. The Sinitic Wikipedias we currently have, such as
Cantonese, Wu and two of the Min languages, are fortunate to have much
larger numbers of speakers, an existing tradition of literature written
in them, and a very high degree of regional linguistic pride
(especially noted for Cantonese). So varieties such as Puxian, Min
Zhong, Pinghua and Huizhou seem unlikely to attract enough attention
to be viable projects.

I also wonder, with regard to Arabic varieties, if it is really in our
best interests to follow Ethnologue classifications, which often
follow national borders rather than linguistic boundaries. For
example, I have been told that Moroccan, Tunisian, Libyan and Algerian
Arabic are all easily mutually intelligible, often considered a single
language called "Derija".

I am certainly in favor of having Wikipedias in colloquial varieties of
Arabic, but I don't know that it is wise to encourage maximal
linguistic balkanization and division of resources when it is possible
to allow people to coalesce around a common language. Rather than
blindly following the Ethnologue, I would advocate a greater reliance
on expert opinion and advice, and advocacy with the ISO committee when
necessary to get codes changed, created, merged or deleted.

2011/5/23 Milos Rancic <millosh at gmail.com>
> On 05/23/2011 10:55 AM, Nikola Smolenski wrote:
> > On 05/23/2011 10:33 AM, Milos Rancic wrote:
> >>> In Chinese writing a character shows a word, irrespective of how the
> >>> word is pronounced. So if we would use a Chinese style writing system,
> >>> you could write [your] [dog] [is] [dead], and a Frenchman would write
> >>> exactly the same, even though he would pronounce [your] [dog] [is]
> >>> [dead] as "Votre chien est mort". Thus, different languages might
> >>> write the same sentence the same in Chinese script. This does not mean
> >>> that there are no differences - someone who spoke Latin would probably
> >>> spell this line as [dog] [your] [dead] [is], and perhaps in yet
> >>> another language this would be immensely crude, and the right thing to
> >>> say would be "[prepare for bad news] [honorific person] [your] [dog]
> >>> [is] [not] [alive]", but the mere difference of being in a different
> >>> language with totally different sounds is not enough to conclude that
> >>> in Chinese writing the actual written text will be different.
> >>
> >> Andre, that's not accurate explanation. Chinese script is not purely
> >> logographic, but logo-syllabic (or logo-phonetic). There are *phonetic*
> >> parts inside of the writing system.
> >
> > But different Chinese languages will still use the same character for
> > different but related phonetic component.
> That's living process in Chinese languages. While for phonetic
> transcription of an old word Classical Chinese knowledge is required (or
> learning pronunciation as-is), it is possible to create a dialectal
> compound. However, I can just guess is it true or not. And our fellow
> Chinese Wikimedians could give to us some information regarding that.
