The fourth proposal on this list (Wikipedia Pinyin) has some serious policy-related questions. If you respond to nothing else, please provide an opinion on that one.
Wikipedia Simple Chinese https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Simple_Chinese (zh-simple): I have a query out (in two places) to the original requestor, as well as to the Meta community, as to whether a true "simple" version really exists. However, no one has cited one on the request page over the last six years, and in any case per current policy this would incubate within Chinese Wikipedia. I'm going to give two more weeks to hear a response to that query. If the answer is "no", I will reject. If the answer is "yes", I think I would put it on hold, pending some evidence that "simple" content is starting to be added to zhwiki. (And if zhwiki has a community discussion and decides it doesn't want to do this, then I will reject.) If this ends in rejection, I will encourage anyone truly interested to start at Incubator Plus.
Wikipedia Southern Ndebele https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_South_Ndebele (nr): An official language of the Republic of South Africa, spoken by over 2 million speakers (L1 + L2). Test has about 20 pages, but was periodically active as late as 2016. Marking as eligible.
Wikipedia Kari Seediq https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Kari_Seediq (trv): Aboriginal language of Taiwan with about 20,000 speakers. Test has well over 1,200 pages, and had a stretch of four months in late 2016–early 2017 when it had sufficient activity to be approvable. (There is no localization yet, however.) Marking as eligible.
Wikipedia Pinyin Chinese https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Chinese_(Pinyin)_2 (coded for now as cmn, Mandarin Chinese): Here's a proposal that I think needs some serious discussion. Please read the discussion on the linked Meta page.
Arguments against:
* No separate ISO 639-3 language code
* It is proposed that this can be handled with a script converter, per (for example) T193366 https://phabricator.wikimedia.org/T193366.
* One respondent objected to such a project taking manpower away from other Chinese-language projects.
Arguments supporting:
* Extremely widely used, and much on-line work in Chinese happens in Pinyin, not in ideographic characters.
* Proponents state (I cannot confirm) that there are many people who are "illiterate" in Chinese, not having mastered 3000 characters, who can potentially contribute to such a project. If so, that is closer to the ideal of creating projects that "anyone can edit".
I would also note that several other Chinese projects use Romanized Chinese. (All the min-nan projects are exclusively in Romanized language: see Wikipedia here https://zh-min-nan.wikipedia.org/wiki/Th%C3%A2u-ia%CC%8Dh; Min Dong Wikipedia https://cdo.wikipedia.org/wiki/T%C3%A0u_Hi%C4%95k has pages in both scripts.)
If this project is deemed eligible, I'm pretty sure that it should not be coded with "cmn". (I can let it stay that way in Incubator for now, or give it a q-code.) Test has 250 mainspace pages, and has been active periodically. Last period of substantial activity was during summer 2017.
I am going to hold back my opinion on this just yet.
Wikipedia Mesopotamian Arabic https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Mesopotamian_Arabic (acm): In principle, as eligible as any other variety of Arabic. Test has only two pages, both created in 2016. Placing on hold pending additional contributions.
Steven
I am going to address some points about the Pinyin Chinese proposal in this mail that I didn't address in the previous one:
* Nowhere does the proposal claim that much online work in Chinese is recorded in pinyin. What it says is simply that, when people want to type Chinese text into a computer, most of them use the pinyin system to input the Chinese characters.
* It is true that a considerable number of people in China are still illiterate. However, I don't think many of them can use the pinyin system either. Most of them are illiterate because they didn't attend school, which is where the pinyin system is taught. My understanding is that most of them probably can't tell what alphabet letters like "abcd" are, let alone read long text in pinyin. Reading and writing pinyin correctly would also require fluency in the standard variety of Mandarin Chinese, which I don't think most of that population has either.
* "How to let illiterate people get involved in Wikipedia" is a hard problem. I'm pretty sure that letting them edit Wikipedia directly would be troublesome, as they won't know how to find and read sources and write Wikipedia articles that are up to encyclopedic standard. A better way to involve them is probably to let them be a source via oral history projects, as the AfroCrowd initiative has attempted, and let other Wikipedians expand articles based on what they said.
* The other, non-Mandarin Chinese projects that adopted Latin characters instead of ideographs did so for inherently different reasons. Those language varieties are rarely written, so when they are written in daily life there are two main ways of doing it. The first is to pick Chinese characters that fit, depending on situation and context; the second is romanization. In many of these cases the ideographic writing systems are disorganized, and many who use them are influenced by the Mandarin pronunciation of those characters, so it is natural for users to adopt the more systematic Latin-character representation. Even if it means a steeper learning curve for most users, at least there is actually a system that determines how to write things, and books such as Bibles are actually printed in that script. The situation for Mandarin is totally different.
On 21 Aug 2018, at 17:46, Phake Nick c933103@gmail.com wrote:
Wikipedia Pinyin Chinese (coded for now as cmn, Mandarin Chinese): Here's a proposal that I think needs some serious discussion. Please read the discussion on the linked Meta page.
Arguments against: • No separate ISO 639-3 language code
Not necessary as the IANA script code extensions can be used, so zh-Latn-pinyin
• It is proposed that this can be handled with a script converter, per (for example) T193366.
This will never add in capital letters correctly.
Michael
On Wed, Aug 29, 2018, 11:39 AM Michael Everson everson@evertype.com wrote:
On 21 Aug 2018, at 17:46, Phake Nick c933103@gmail.com wrote:
Wikipedia Pinyin Chinese (coded for now as cmn, Mandarin Chinese):
Here's a proposal that I think needs some serious discussion. Please read the discussion on the linked Meta page.
Arguments against: • No separate ISO 639-3 language code
Not necessary as the IANA script code extensions can be used, so zh-Latn-pinyin
Agreed: BCP47.
• It is proposed that this can be handled with a script converter,
per (for example) T193366.
This will never add in capital letters correctly.
Could you describe this issue in more detail? --scott
On Wed, 29 Aug 2018 at 17:56, C. Scott Ananian cananian@wikimedia.org wrote:
Could you describe this issue in more detail?
Hi,
My Chinese is not very good (there might be some mistakes in this mail) but I can see how difficult it could be to build a pinyin converter.
For instance, 北 is usually běi in pinyin (but probably not always; sinogram pronunciation can change depending on the context), the word for "North" in English. But in 北京 it's Běijīng, the capital of China (lit. "North capital", same for other cities of the same name), with an uppercase initial (and the same goes for derived words). These cases are probably not unsolvable, but a table of Chinese words in sinograms with the corresponding pinyin is probably needed.
And it could be worse: the same sequence of sinograms can have different meanings. For instance, 王 is both a common noun for "king" and the most common surname in China. So 王是大 could be either "The king is big" or "Mr/Ms Wang is big". Here 王 is at the beginning of the sentence, so it's always uppercased, but it could also be in the middle, and then in the first case it would be "wáng shì dà" and in the second case "Wáng shì dà".
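To make the two problems above concrete, here is a minimal sketch in Python of the table-based approach. It is only an illustration under stated assumptions: the LEXICON table and the to_pinyin function are hypothetical names with a handful of made-up entries, not part of LanguageConverter or of any existing tool, and sentence-initial capitalization is deliberately left out.

```python
# Hypothetical toy table of sinogram strings and their pinyin.
# A real converter would need a far larger word list.
LEXICON = {
    "北京": "Běijīng",  # proper noun: capitalised in the table
    "北": "běi",
    "京": "jīng",
    "王": "wáng",       # common noun "king"; the surname would be "Wáng"
    "是": "shì",
    "大": "dà",
}


def to_pinyin(text: str) -> str:
    """Greedy longest-match lookup; unknown characters pass through unchanged."""
    max_len = max(len(key) for key in LEXICON)
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so that 北京 wins over 北 + 京.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in LEXICON:
                pieces.append(LEXICON[chunk])
                i += size
                break
        else:
            pieces.append(text[i])
            i += 1
    return " ".join(pieces)


print(to_pinyin("北京"))    # the word-level entry supplies the capital letter
print(to_pinyin("王是大"))  # the table cannot tell "king" from the surname
```

The word-level entry handles 北京, but for 王 a table holds only one reading per key, so choosing between "wáng" and "Wáng" would need context that a plain lookup does not have.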
Cheers, ~nicolas
On Thu, 30 Aug 2018 at 00:50, Nicolas VIGNERON vigneron.nicolas@gmail.com wrote:
Ah yes, space delimitation seems to require more capability than the current LanguageConverter can provide. There are a number of Chinese word segmentation programs available on the internet (including several open-source projects); one of them claims 97.5% accuracy (and apparently that's not even the best one). But that still means additional work would be needed for LanguageConverter to integrate one of those programs before it can transliterate Chinese text into the recommended orthography for pinyin, and manual annotation like the NoteTA template currently used on Chinese Wikipedia would probably still be needed to get perfect word segmentation and thus perfect space delimitation. Without word segmentation support, the paragraphs produced would be less readable, comparable to what Vietnamese text looks like.
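As a rough illustration of why segmentation matters for spacing, here is a minimal Python sketch using forward maximum matching, one of the simplest segmentation techniques. It is not the method used by the segmentation programs mentioned above; VOCAB, SYLLABLES, segment and romanize are hypothetical names with toy data, and nothing here reflects how LanguageConverter works.

```python
# Hypothetical toy data: a tiny word list and a per-character syllable table.
VOCAB = {"学习", "中文"}
SYLLABLES = {"我": "wǒ", "学": "xué", "习": "xí", "中": "zhōng", "文": "wén"}


def segment(text: str, vocab: set, max_len: int = 4) -> list:
    """Forward maximum matching: take the longest known word at each position."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or chunk in vocab:
                words.append(chunk)
                i += size
                break
    return words


def romanize(words: list) -> str:
    """Join syllables inside a word; separate words with spaces."""
    return " ".join(
        "".join(SYLLABLES.get(ch, ch) for ch in word) for word in words
    )


sentence = "我学习中文"
print(romanize(list(sentence)))            # one space per syllable
print(romanize(segment(sentence, VOCAB)))  # one space per word
```

The first output spaces every syllable, which is the Vietnamese-like look described above; the second groups syllables by word, but only as well as the word list and the greedy matching allow.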
Also, so far only one user has responded to my post on the Chinese Wikipedia Community Portal. Let me post a translation of that message here: "I have also contributed a few pages there (what I did was copy text from Wikipedia, throw it into Google, and then copy the result back). However, I support the deletion (of the incubator); I haven't yet seen any country that formally uses phonetic transliteration for Chinese languages other than the Dungan language. As for importation, that's not necessary. As you can see, other than the wiki's main page, which of them aren't copied from Chinese Wikipedia?"
On Mon, Jul 30, 2018 at 10:47 AM, Steven White Koala19890@hotmail.com wrote:
I would support an effort to explore the use of LanguageConverter here (and for zh-min-nan). I agree that the language code should not be bare 'cmn' but should include a script suffix. BCP47 suggests 'zh-Latn-pinyin' as a valid code: http://schneegans.de/lv/?tags=zh-Latn%0D%0Azh-Latn-pinyin%0D%0A&format=t... --scott
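For readers unfamiliar with the tag structure, here is a small Python sketch that splits a tag such as 'zh-Latn-pinyin' into its language, script and variant subtags. It is a simplified assumption-laden illustration, not a full BCP 47 parser: TAG_RE and split_tag are hypothetical names, the pattern ignores extlang, extension, private-use and grandfathered forms (so 'zh-min-nan' is rejected), and it checks only the shape of the tag, not whether the subtags are registered.

```python
import re

# Simplified shape: language[-script][-region][-variant...]
# This is only a subset of the BCP 47 grammar.
TAG_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)$",
    re.IGNORECASE,
)


def split_tag(tag: str) -> dict:
    """Split a language tag into subtags, using conventional casing."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not handled by this simplified pattern: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    variants = [v.lower() for v in match.group("variants").split("-") if v]
    return {
        "language": match.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
        "variants": variants,
    }


print(split_tag("zh-Latn-pinyin"))
print(split_tag("cmn"))
```

Read this way, the bare 'cmn' says nothing about script, while 'zh-Latn-pinyin' carries both the Latin script and the pinyin variant.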