Andrew Dunbar wrote:
--- Gerard Meijssen gerardm@myrealbox.com wrote:
There is a big discussion on wikipedia-l about writing Chinese. One thing I gleaned from this discussion is that zh-tw and zh-cn are used to indicate traditional and simplified Chinese respectively. As it is relevant for Wiktionary to have both correct spellings, I propose to use these codes as well as the zh code to indicate Chinese words.
I don't have references, but I'm sure I've read that this is not a good idea. All these countries actually do use both scripts sometimes, and sometimes the countries use entirely different words for the same thing, so it doesn't always even come down to a choice of character. I think we'd be using a country code as a script code when what we really need is a script code, which would not be ambiguous. Something like zh-trad and zh-simp, or zh-trd and zh-smp.
I think I've seen a page somewhere on Microsoft's site that does it this way, but I doubt they really use it. They do have language numbers that take the different scripts into account, though, including for Serbian, discussed below.
In a Wikipedia context it is not a good idea, as you want to bring people together. In a Wiktionary context things are, imho, different, as we do provide all (correct) words in all languages. In English, many words are spelled differently depending on whether it is en-us, en-uk, en-aus, etc. The meaning of a word may be subtly different as well, so it is not always synonyms that we are talking about. With different scripts in one language, you have words in different scripts that are synonymous.
The pronunciation also often differs depending on where you come from: potato, router, etc.
In Wiktionary you want to define the words and make it plain where each word comes from. Speakers of all English variants can understand each other; it is up to Wiktionary to allow for these differences. Consequently, I realise it is not only about script. As Wikipedia already has zh-tw and zh-cn as codes, using them within Wiktionary as well is reasonable. For Serbian, sr-cyr and sr-lat make sense to me.
What I am not sure about is how we indicate the word as such; {{-xx-}} indicates a word in a language, and I really want to keep it that way. Is following it with {{xx-xx}} to indicate the relevant subset (character set or regionality) reasonable?
Thanks, GerardM
I hope someone has a good suggestion for Serbian, Cyrillic and alphabetic.
Umm, Cyrillic is still alphabetic. I think you meant Cyrillic and Latin. How about sr-cyr and sr-lat?
There are more languages that are written in different character sets. I am looking forward to suggestions.
The least obscure I can think of is Punjabi, which is written in Gurmukhi, its own Indic script; in Shahmukhi, derived from the Urdu script, which is itself derived from the Arabic script; and finally in Devanagari, the most common script in India. But this is already quite obscure. Many former Soviet republics have two or three scripts as well.
Andrew Dunbar (hippietrail)
Thanks, GerardM
_______________________________________________
Wiktionary-l mailing list
Wiktionary-l@Wikipedia.org