On Wed, 29 Aug 2018 at 17:56, C. Scott Ananian <cananian@wikimedia.org> wrote:
On Wed, Aug 29, 2018, 11:39 AM Michael Everson <everson@evertype.com> wrote:
On 21 Aug 2018, at 17:46, Phake Nick <c933103@gmail.com> wrote:
> Wikipedia Pinyin Chinese (coded for now as cmn, Mandarin Chinese): Here's a proposal that I think needs some serious discussion. Please read the discussion on the linked Meta page.
> Arguments against:
>       • No separate ISO 639-3 language code

Not necessary as the IANA script code extensions can be used, so zh-Latn-pinyin

Agreed: BCP47.
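To illustrate what a tag like zh-Latn-pinyin encodes under BCP 47, here is a minimal Python sketch that splits a tag into its language, script, region and variant subtags. It is a simplified parser that assumes only the plain tag shapes mentioned in this thread. It ignores extlang, extension, private-use and grandfathered tags, and it does not check subtags against the IANA registry. The `LanguageTag` and `parse_tag` names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Simplified subtag patterns after RFC 5646 (BCP 47). This sketch skips
# extlang, extensions (e.g. -u-), private use and grandfathered tags.
LANGUAGE = re.compile(r"[A-Za-z]{2,3}")
SCRIPT = re.compile(r"[A-Za-z]{4}")
REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")


@dataclass
class LanguageTag:
    """Hypothetical container for the subtags of a simple BCP 47 tag."""
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variants: List[str] = field(default_factory=list)


def parse_tag(tag: str) -> LanguageTag:
    """Split a simple BCP 47 tag into its subtags, normalizing case."""
    subtags = tag.split("-")
    if not LANGUAGE.fullmatch(subtags[0]):
        raise ValueError(f"invalid language subtag in {tag!r}")
    parsed = LanguageTag(language=subtags[0].lower())
    rest = subtags[1:]
    # Optional script subtag: four letters, title case by convention.
    if rest and SCRIPT.fullmatch(rest[0]):
        parsed.script = rest[0].title()
        rest = rest[1:]
    # Optional region subtag: two letters or three digits.
    if rest and REGION.fullmatch(rest[0]):
        parsed.region = rest[0].upper()
        rest = rest[1:]
    # Any remaining subtags must be variants such as "pinyin".
    for subtag in rest:
        if not VARIANT.fullmatch(subtag):
            raise ValueError(f"unsupported subtag {subtag!r} in {tag!r}")
        parsed.variants.append(subtag.lower())
    return parsed
```

For example, parse_tag("zh-Latn-pinyin") yields LanguageTag(language='zh', script='Latn', region=None, variants=['pinyin']): Chinese, written in Latin script, in the pinyin romanization. No separate language code is involved.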

>       • It is proposed that this can be handled with a script converter, per (for example) T193366.

This will never add in capital letters correctly.

Could you describe this issue in more detail?


My Chinese is not very good (there may be some mistakes in this mail), but I can see how difficult it would be to build a pinyin converter.

For instance, 北, the word for "north", is usually běi in pinyin (though probably not always, since the pronunciation of a sinogram can change depending on context).
But in 北京 it's Běijīng, the capital of China (lit. "north capital"; the same goes for other cities with such names), with an uppercase initial (and the same applies to derived words).
These cases are probably not unsolvable, but a table mapping Chinese words in sinograms to their pinyin is probably needed.
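As a rough illustration of that word-table idea, here is a minimal Python sketch using greedy longest-match lookup. Proper nouns are stored already capitalized, which is exactly the information a per-character converter lacks. The table contents and the `to_pinyin` name are hypothetical; a real converter would need a large dictionary and a better segmentation method than greedy matching.

```python
from typing import List

# Hypothetical word table: sinogram word -> pinyin. Proper nouns are stored
# already capitalized. Entries are illustrative only, not a real dictionary.
WORD_TABLE = {
    "北京": "Běijīng",  # proper noun: the capital of China
    "北": "běi",        # common word: "north"
}

MAX_WORD_LEN = max(len(word) for word in WORD_TABLE)


def to_pinyin(text: str) -> List[str]:
    """Convert sinograms to pinyin words by greedy longest match.

    Characters missing from the table are passed through unchanged.
    """
    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate word first, then shorter ones.
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in WORD_TABLE:
                result.append(WORD_TABLE[chunk])
                i += length
                break
        else:
            # No table entry matched: keep the character as-is.
            result.append(text[i])
            i += 1
    return result
```

With this table, to_pinyin("北京") gives ["Běijīng"] while to_pinyin("北") gives ["běi"]: the capital letter comes from the word entry, not from the character.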

And it could be worse: the same sequence of sinograms can have different meanings. For instance, 王 is both a common noun meaning "king" and the most common surname in China. So 王是大 could mean either "The king is big" or "Mr/Ms Wang is big". Here 王 is at the beginning of the sentence, so it is always capitalized, but it could also appear in the middle of a sentence, and then it would be "wáng shì dà" in the first case and "Wáng shì dà" in the second.
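To make the ambiguity concrete, here is a minimal Python sketch that lists every distinct pinyin rendering a converter could produce when a character has several readings. The `READINGS` table and `candidate_readings` function are hypothetical, and the `sentence_initial` flag is a stand-in for the character's position in a longer sentence. The point is only that a lookup table alone cannot pick between the readings; that choice needs semantic context.

```python
from itertools import product
from typing import Dict, List

# Hypothetical readings table: one sinogram may have several pinyin
# spellings depending on meaning. Illustrative entries only.
READINGS: Dict[str, List[str]] = {
    "王": ["wáng", "Wáng"],  # "king" (common noun) vs. Wang (surname)
    "是": ["shì"],
    "大": ["dà"],
}


def candidate_readings(text: str, sentence_initial: bool = True) -> List[str]:
    """List every distinct pinyin rendering of text, in table order.

    If sentence_initial is True, the first syllable is capitalized, which
    can merge otherwise different readings into a single spelling.
    """
    options = [READINGS.get(ch, [ch]) for ch in text]
    renderings: List[str] = []
    for combo in product(*options):
        syllables = list(combo)
        if sentence_initial and syllables:
            first = syllables[0]
            syllables[0] = first[:1].upper() + first[1:]
        rendering = " ".join(syllables)
        if rendering not in renderings:
            renderings.append(rendering)
    return renderings
```

At the start of a sentence both readings collapse into one, so candidate_readings("王是大") returns ["Wáng shì dà"]. With sentence_initial=False it returns ["wáng shì dà", "Wáng shì dà"], and the converter has no way to choose.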

Cheers, ~nicolas