[Wikitech-l] Re: zh-min-nan for Holopedia at Wikipedia, please! [no forking]
Henry H. Tan-Tenn
share2002nov at lomaji.com
Wed Jul 7 17:45:39 UTC 2004
Tim Starling wrote:
> The reason for choosing zh-cfr can be summarised in one word: Aromanian.
> A wiki in the Aromanian dialect was requested, and it was agreed that we
> should have one, but there was a problem with choosing the subdomain.
> Aromanian does not have a distinct ISO 639 code, however it is listed in
> the Ethnologue under the code RUP. We had used only ISO 639 codes up to
> that time.
I understand the line of thought re Aromanian. My impression, though,
was that "tokipona" preceded the roa-rup construction in not following
ISO 639. Perhaps there was/is a separate policy for conlangs, I don't
know -- or maybe there was a transitional (ongoing?) state of confusion.
> The solution we came up with was to use the group ISO 639 code followed
> by the SIL code. This allows us to specify most languages in the
> Ethnologue without conflicting with the ISO standard, since ISO 639
> codes do not contain hyphens.
My understanding is that RFC 3066 is backward compatible with ISO-639.
It strongly prefers the ISO-639 code whenever one exists. I quote as an
example:
"4. When a language has both an IANA-registered tag (i-something) and
a tag derived from an ISO registered code, you MUST use the ISO
tag. NOTE: When such a situation is discovered, the IANA-
registered tag SHOULD be deprecated as soon as possible."
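To illustrate the compatibility point, here is a minimal sketch in Python of the RFC 3066 tag syntax as I understand it: a primary subtag of 1-8 letters, optionally followed by hyphen-separated subtags of 1-8 letters or digits. The function names are my own invention for illustration. Note that the check is purely syntactic; it says nothing about whether a tag is actually registered with IANA.

```python
import re

# RFC 3066 syntax: a primary subtag of 1-8 letters, then zero or more
# hyphen-separated subtags of 1-8 letters or digits.
_RFC3066 = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_rfc3066_syntax(tag: str) -> bool:
    """Return True if tag is syntactically a well-formed RFC 3066 tag."""
    return _RFC3066.fullmatch(tag) is not None


def primary_subtag(tag: str) -> str:
    """Return the part before the first hyphen, lowercased."""
    return tag.split("-", 1)[0].lower()


if __name__ == "__main__":
    for tag in ["zh", "zh-min-nan", "zh-min-nan-TW", "roa-rup", "zh--yue"]:
        print(tag, is_rfc3066_syntax(tag), primary_subtag(tag))
```

A bare ISO-639 code such as "zh" passes unchanged, which is the sense in which the two are compatible. So does "roa-rup", which shows that syntax alone cannot tell a registered tag from a home-grown one.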
Some related discussion on meta:
http://meta.wikipedia.org/wiki/Language_codes#RFC_3066
Now of course Wikipedia *could* stick to ISO-639 but it's so...um,
limiting, as we've discovered.
> Using the hyphenated language tags assigned by IANA, such as zh-min-nan,
> would conflict with this scheme. For example if we used zh-yue, it would
> be difficult to know what "yue" refers to. Is it an SIL code or an
> assigned code?
Well, zh-yue has been defined
(http://www.iana.org/assignments/lang-tags/zh-yue) so I'm not sure why
it'd be confusing to find out. Even without a lookup, on the face of it
it's telling: the zh (which is ISO-639) says it's some kind of Sinitic
variant. The "yue" part is a well-known word for Cantonese; it's found
in words such as "粵語" (Yue Speech/Language), "粵劇" (Yue Opera).
Although "yue" alone is not part of any standard I know of, the meaning
is clear.
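The "SIL code or assigned code?" question could also be settled mechanically by checking the registered tags first. The following is only a sketch under assumptions: the set IANA_REGISTERED is a hand-copied sample containing just the two tags mentioned in this thread, not the real registry, and the "group code plus three-letter SIL code" rule is my reading of the scheme Tim describes.

```python
# Hypothetical sample only: the two IANA-registered tags named in this thread.
IANA_REGISTERED = {"zh-min-nan", "zh-yue"}


def classify(tag: str) -> str:
    """Guess which naming scheme a subdomain tag follows."""
    tag = tag.lower()
    # Registered tags win, so "zh-yue" is never mistaken for an SIL code.
    if tag in IANA_REGISTERED:
        return "iana"
    parts = tag.split("-")
    # No hyphen: treat it as a plain ISO-639 code.
    if len(parts) == 1:
        return "iso639"
    # Assumed wiki scheme: ISO-639 group code followed by a 3-letter SIL code.
    if len(parts) == 2 and len(parts[1]) == 3 and parts[1].isalpha():
        return "group-sil"
    return "unknown"


if __name__ == "__main__":
    for tag in ["zh", "zh-yue", "zh-min-nan", "zh-cfr", "roa-rup"]:
        print(tag, classify(tag))
```

With a lookup ordered this way, the two schemes can coexist: "zh-yue" and "zh-min-nan" resolve as registered tags, while "zh-cfr" and "roa-rup" fall through to the group-plus-SIL rule.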
To further argue from examples: "zh-min-nan" is more intuitive than
"zh-cfr". "min-nan" is again the transliteration of a common word in
Chinese. Although one *could* look up "cfr", the former can be deduced
based on knowledge common to Internet-using speakers of the said language.
To sum up: I think the IANA-assigned codes are (1) ISO-639 compatible
and (2) apparently easier to decipher than SIL tags (though there may be
counterexamples). Unlike "tokipona" or a bare "minnan", IANA tags appear
to make use of ISO-639 to suggest genetic affiliation. That's useful,
in my opinion.
As for the use of common names, that's something to consider. I'd
prefer the Minnan Wikipedia not be used as a guinea pig, though :)
> If I understand correctly, Shizhao's problem is that holopedia.net, and
> by extension minnan.wikipedia.org, is written in a script peculiar to
> Taiwan. The writing there is thus not representative of min-nan
> generally. So wouldn't it be better to use the RFC 3066 code specific to
> Taiwanese, namely zh-min-nan-TW? Or indeed, in keeping with my earlier
> point about such language codes being cryptic and unnecessarily lengthy,
> why not use ho-lo-oe.wikipedia.org?
I have not seen Shizhao describe the issue, certainly not to Minnan
Wikipedians. Now, I'm personally not opposed to "zh-min-nan-TW" (the
matter has not been discussed in Minnan), though as you know, Wikipedia
generally prefers to put same-language content together. The most
prominent example of this is the zh Wikipedia (where Simplified and
Traditional scripts coexist). Various other languages also have
Latin/Cyrillic/Arabic di-/trigraphia. (And I think the issue has not
been more prominent because those Wikipedias are fairly inactive or
merely reserved.)
To keep discussion short I'll just say, for now, that most variants of
Minnan are generally and historically *not* written, and when written,
generally without reference to a standard (lots of words written
inconsistently). The genres have also been limited to songs and other
poetic forms. So it's difficult to talk about representativeness in
that context. The current Minnan Wikipedia *is* the "state of the art"
to the extent that Minnan has been written and read historically.
~~~~