On Wed, 29 Aug 2018 at 17:56, C. Scott Ananian <cananian(a)wikimedia.org> wrote:
On Wed, Aug 29, 2018, 11:39 AM Michael Everson
On 21 Aug 2018, at 17:46, Phake Nick
Wikipedia Pinyin Chinese (coded for now as cmn, Mandarin Chinese):
proposal that I think needs some serious discussion. Please read
the discussion on the linked Meta page.
• No separate ISO 639-3 language code
Not necessary as the IANA script code extensions can be used, so
• It is proposed that this can be handled with a script converter, per
(for example) T193366.
This will never add in capital letters correctly.
Could you describe this issue in more detail?
My Chinese is not very good (there might be some mistakes in this mail) but
I can see how very difficult it could be to build a pinyin converter.
For instance, 北, the word for "North", is usually běi in pinyin (but
probably not always; the pronunciation of a sinogram can change depending on
the context). But in 北京 it's Běijīng, the capital of China (literally
"North capital"; the same goes for other cities of the same name), with an
uppercase initial (and the same goes for derived words).
These cases are probably not unsolvable, but a table of Chinese words in
sinograms with their corresponding pinyin is probably needed.
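
To make the idea of such a table concrete, here is a minimal sketch, assuming
a greedy longest-match lookup. The table PINYIN_TABLE and the function
to_pinyin are hypothetical names invented for this illustration, and the
entries are only the examples from this mail, not a real dictionary. It is
not how any existing converter works, just one way the table could be used:

```python
# Hypothetical lookup table: sinogram sequence -> pinyin, capitalization included.
# The entries are illustrative only; a real table would be far larger.
PINYIN_TABLE = {
    "北京": "Běijīng",  # proper noun, uppercase initial
    "北": "běi",        # common word, lowercase
    "王": "wáng",
    "是": "shì",
    "大": "dà",
}


def to_pinyin(text, table=PINYIN_TABLE):
    """Convert text by always taking the longest table entry that matches.

    Characters that are not in the table are passed through unchanged.
    """
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No entry matched at this position: keep the character as-is.
            out.append(text[i])
            i += 1
    return " ".join(out)
```

With this table, to_pinyin("北京") yields "Běijīng" because the two-character
entry wins over the single 北, while to_pinyin("北") yields "běi". Longest
match is only an assumption here; it handles the capital-letter case for
listed proper nouns, but nothing more.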
And it could be worse: the same sequence of sinograms can have different
meanings. For instance, 王 is both a common noun for "king" and the most
common surname in China. So 王是大 could be either "The king is big" or
"Mr/Ms Wang is big". Here 王 is at the beginning of the sentence so it's
always uppercased, but it could also be in the middle, and then in the first
case it would be "wáng shì dà" and in the second case "Wáng shì dà".
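
The ambiguity can be shown with a small sketch, assuming a hypothetical table
READINGS that lists every reading a sinogram can take. The function
candidate_readings is also invented for this example, and it deliberately
ignores sentence-initial capitalization so that the mid-sentence case is
visible:

```python
from itertools import product

# Hypothetical table: each sinogram maps to every reading it can take.
READINGS = {
    "王": ["wáng", "Wáng"],  # common noun "king" vs. the surname
    "是": ["shì"],
    "大": ["dà"],
}


def candidate_readings(text, readings=READINGS):
    """Return every pinyin string the table allows for the given text.

    Characters missing from the table are passed through unchanged.
    """
    per_char = [readings.get(ch, [ch]) for ch in text]
    return [" ".join(combo) for combo in product(*per_char)]
```

For 王是大 this returns both "wáng shì dà" and "Wáng shì dà". The table alone
cannot say which one is right; picking one needs knowledge of the meaning,
which is exactly what a pure script converter does not have.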