Tim Starling wrote:
The reason for choosing zh-cfr can be summarised in one word: Aromanian. A wiki in the Aromanian dialect was requested, and it was agreed that we should have one, but there was a problem with choosing the subdomain. Aromanian does not have a distinct ISO 639 code; however, it is listed in the Ethnologue under the code RUP. We had used only ISO 639 codes up to that time.
I understand the line of thought re Aromanian. My impression, though, was that "tokipona" preceded the roa-rup construction in not following ISO 639. Perhaps there was/is a separate policy for conlangs, I don't know -- or maybe there was a transitional (ongoing?) state of confusion.
The solution we came up with was to use the group ISO 639 code followed by the SIL code. This allows us to specify most languages in the Ethnologue without conflicting with the ISO standard, since ISO 639 codes do not contain hyphens.
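To make the quoted scheme concrete, here is a minimal sketch of how such a subdomain label could be composed. The function name and the validation rules are my own assumptions for illustration, not anything the Wikipedia software is known to do:

```python
def make_subdomain(group_code: str, sil_code: str) -> str:
    """Join an ISO 639 (group) code and an SIL code with a hyphen.

    Hypothetical helper: the length checks below are assumptions
    (ISO 639 codes of 2 or 3 letters, SIL codes of 3 letters).
    """
    group = group_code.strip().lower()
    sil = sil_code.strip().lower()
    if not (group.isascii() and group.isalpha() and len(group) in (2, 3)):
        raise ValueError(f"not a plausible ISO 639 code: {group_code!r}")
    if not (sil.isascii() and sil.isalpha() and len(sil) == 3):
        raise ValueError(f"not a plausible SIL code: {sil_code!r}")
    # ISO 639 codes contain no hyphens, so the hyphen is a safe separator.
    return f"{group}-{sil}"


print(make_subdomain("roa", "RUP"))  # roa-rup
print(make_subdomain("zh", "cfr"))   # zh-cfr
```

The two calls reproduce the labels mentioned in this thread, roa-rup and zh-cfr.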
My understanding is that RFC 3066 is backward compatible with ISO-639. It much prefers ISO-639 whenever a tag exists. I quote as an example:
"4. When a language has both an IANA-registered tag (i-something) and a tag derived from an ISO registered code, you MUST use the ISO tag. NOTE: When such a situation is discovered, the IANA- registered tag SHOULD be deprecated as soon as possible."
Some related discussion on meta: http://meta.wikipedia.org/wiki/Language_codes#RFC_3066
Now of course Wikipedia *could* stick to ISO-639 but it's so...um, limiting, as we've discovered.
Using the hyphenated language tags assigned by IANA, such as zh-min-nan, would conflict with this scheme. For example, if we used zh-yue, it would be difficult to know what "yue" refers to. Is it an SIL code or an assigned code?
Well, zh-yue has been defined (http://www.iana.org/assignments/lang-tags/zh-yue) so I'm not sure why it'd be confusing to find out. Even without a lookup, on the face of it it's telling: the zh (which is ISO-639) says it's some kind of Sinitic variant. The "yue" part is a well-known word for Cantonese; it's found in words such as "粵語" (Yue Speech/Language), "粵劇" (Yue Opera). Although "yue" alone is not part of any standard I know of, the meaning is clear.
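A rough sketch of the "look it up first" idea: check a tag against the registered IANA tags before reading it as group code plus SIL code. The table below is a hand-copied subset holding only the two IANA tags named in this thread, and the function name and category labels are hypothetical:

```python
# Assumption: a tiny, hand-maintained subset of IANA-registered tags.
# A real check would consult the full IANA registry.
IANA_REGISTERED = {"zh-min-nan", "zh-yue"}


def classify(tag: str) -> str:
    """Return a rough category for a subdomain-style language tag."""
    tag = tag.lower()
    if tag in IANA_REGISTERED:
        return "iana"          # registered tag takes precedence
    parts = tag.split("-")
    if len(parts) == 1:
        return "iso639"        # bare code with no hyphen, e.g. "zh"
    if len(parts) == 2 and len(parts[1]) == 3:
        return "group+sil"     # e.g. "roa-rup" under the scheme quoted above
    return "unknown"


for t in ("zh-yue", "zh-min-nan", "zh-cfr", "roa-rup", "zh"):
    print(t, classify(t))
```

With the registry consulted first, "zh-yue" is never mistaken for zh plus an SIL code; only tags absent from the registry fall through to the SIL reading.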
To further argue from examples: "zh-min-nan" is more intuitive than "zh-cfr". "min-nan" is again the transliteration of a common word in Chinese. Although one *could* look up "cfr", the former can be deduced based on knowledge common to Internet-using speakers of the said language.
To sum up: I think the IANA-assigned codes are (1) ISO-639 compatible and (2) apparently easier to decipher than SIL tags (though there may be counterexamples). Unlike "tokipona" or "minnan" alone, IANA tags appear to make use of ISO-639 to suggest genetic affiliation. That's useful, in my opinion.
As for the use of common names, that's something to consider. I'd prefer the Minnan Wikipedia not be used as a guinea pig, though :)
If I understand correctly, Shizhao's problem is that holopedia.net, and by extension minnan.wikipedia.org, is written in a script peculiar to Taiwan. The writing there is thus not representative of min-nan generally. So wouldn't it be better to use the RFC 3066 code specific to Taiwanese, namely zh-min-nan-TW? Or indeed, in keeping with my earlier point about such language codes being cryptic and unnecessarily lengthy, why not use ho-lo-oe.wikipedia.org?
I have not seen Shizhao describe the issue, certainly not to Minnan Wikipedians. Now, I'm personally not opposed to "zh-min-nan-TW" (the matter has not been discussed in Minnan), though as you know, Wikipedia generally prefers to put same-language content together. The most prominent example of this is the zh Wikipedia (where Simplified and Traditional scripts coexist). Various other languages also have Latin/Cyrillic/Arabic di-/trigraphia. (And I think the issue has not been more prominent because those Wikipedias are fairly inactive or merely reserved.)
To keep the discussion short, I'll just say for now that most variants of Minnan have generally and historically *not* been written, and when they were written, it was generally without reference to a standard (many words are written inconsistently). The genres have also been limited to songs and other poetic forms. So it's difficult to talk about representativeness in that context. The current Minnan Wikipedia *is* the "state of the art" to the extent that Minnan has been written and read historically.
~~~~