I am an admin on en:, but I hardly ever read this mailing list, and only subscribed today. I subscribed because I am really quite upset and fed up about the ridiculous treatment Holopedia@Wikipedia received -- the free encyclopedia in the Taiwanese language "http://en.wikipedia.org/wiki/Taiwanese_%28linguistics%29".
I came up with the idea of Holopedia "http://weblog.holopedia.org/archives/holopedia.jpg".
The Most Serene Fellows Pektiong and Henry "http://www.digitas.harvard.edu/cgi-bin/wiki/ken/ThoanKhe" submitted an application to create Holopedia@Wikipedia. The intention was to prevent forking -- we could easily have run MediaWiki somewhere else (and indeed we did, as a means of testing -- but to prevent forking, I will not let you know where we ran it).
We requested the hostname 'zh-min-nan', per RFC 3066 "http://www.iana.org/assignments/lang-tags/zh-min-nan". I registered the tag, so I should know. Some silly person did not honour our request, but created a syncretic monster 'zh-cfr' without consulting the community (to prevent forking (and to prevent giving it any authority), I will not let you know where the letters CFR came from). Then, some (other) silly person dictated 'minnan'. Then, yet some other person suggested the meaningless 'poj'.
Please honour our original, simple request and give us "http://zh-min-nan.wikipedia.org/". We are holding back from editing at the moment because we still want to prevent forking. All the silly manoeuvres described above only work against non-forking -- a fundamental value of Wikipedia. Please, CHANGE IT NOW to zh-min-nan! CHANGE IT NOW! CHANGE IT NOW! Let's waste no more time. Thank you very much.
CHANGE IT NOW! CHANGE IT NOW! CHANGE IT NOW to zh-min-nan! Have you changed it?
(Sorry for crossposting, but I found soon after sending to mediawiki-l@Wikimedia.org that I should have sent it here.)
Kaihsu Tai wrote:
We requested the hostname 'zh-min-nan', per RFC 3066 "http://www.iana.org/assignments/lang-tags/zh-min-nan". I registered the tag, so I should know. Some silly person did not honour our request, but created a syncretic monster 'zh-cfr' without consulting the community (to prevent forking (and to prevent giving it any authority), I will not let you know where the letters CFR came from). Then, some (other) silly person dictated 'minnan'.
Actually that was me both times.
Then, yet some other person suggested the meaningless 'poj'.
We don't pick language codes on the basis of what the community in question requests. Instead we are attempting to create some kind of site-wide standard. Wikimedia is not a webhosting company; it does not assign subdomains to whoever requests them on a first-come first-served basis.
The reason for choosing zh-cfr can be summarised in one word: Aromanian. A wiki in the Aromanian dialect was requested, and it was agreed that we should have one, but there was a problem with choosing the subdomain. Aromanian does not have a distinct ISO 639 code; however, it is listed in the Ethnologue under the code RUP. We had used only ISO 639 codes up to that time.
The solution we came up with was to use the group ISO 639 code followed by the SIL code. This allows us to specify most languages in the Ethnologue without conflicting with the ISO standard, since ISO 639 codes do not contain hyphens.
Using the hyphenated language tags assigned by IANA, such as zh-min-nan, would conflict with this scheme. For example if we used zh-yue, it would be difficult to know what "yue" refers to. Is it an SIL code or an assigned code?
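A minimal sketch of the hybrid scheme and of the ambiguity it creates when mixed with IANA tags. The function names and the sample sets are hypothetical; in particular, treating YUE as an SIL code is an assumption made only to show the clash, not a checked fact about the Ethnologue.

```python
def hybrid_label(group_code: str, sil_code: str) -> str:
    """Join an ISO 639 group code and an SIL code with a hyphen."""
    return f"{group_code.lower()}-{sil_code.lower()}"


def possible_readings(label: str, iana_tags: set[str], sil_codes: set[str]) -> list[str]:
    """List every way a hyphenated label could be read if both schemes coexist."""
    readings = []
    if label in iana_tags:
        readings.append("IANA-registered tag")
    # Everything after the first hyphen is the candidate SIL code.
    _, _, suffix = label.partition("-")
    if suffix.upper() in sil_codes:
        readings.append("ISO 639 group + SIL code")
    return readings


# Sample data: zh-min-nan, zh-yue, RUP and CFR come from this thread;
# YUE as an SIL code is a made-up assumption for illustration.
IANA_TAGS = {"zh-min-nan", "zh-yue"}
SIL_CODES = {"RUP", "CFR", "YUE"}

print(hybrid_label("roa", "RUP"))
print(possible_readings("zh-yue", IANA_TAGS, SIL_CODES))
```

A label that yields two readings is exactly the case where a reader cannot tell which table to consult.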
We could use the RFC 3066 codes instead. This is still an option. However it wouldn't give us access to a large number of languages without resorting to awkward constructions such as x-sil-RUP.
Later I came to doubt the usefulness of language codes which are longer than the names of the languages themselves. Maybe en.wikipedia.org is a useful shortcut for english.wikipedia.org, but surely aromanian.wikipedia.org would be easier to remember than roa-rup.wikipedia.org.
If I'm wrong about that, feel free to explain it to me.
If I understand correctly, Shizhao's problem is that holopedia.net, and by extension minnan.wikipedia.org, is written in a script peculiar to Taiwan. The writing there is thus not representative of min-nan generally. So wouldn't it be better to use the RFC 3066 code specific to Taiwanese, namely zh-min-nan-TW? Or indeed, in keeping with my earlier point about such language codes being cryptic and unnecessarily lengthy, why not use ho-lo-oe.wikipedia.org?
-- Tim Starling
Let me preemptively apologize for the tone of this message.
Quick summary: if 'zh-min-nan' is not desirable, change to 'cfr.wikipedia.org'. See reasoning below.
Tim Starling, 2004-07-07 21:56:46+1000:
The reason for choosing zh-cfr can be summarised in one word: Aromanian. A wiki in the Aromanian dialect was requested, and it was agreed that we should have one, but there was a problem with choosing the subdomain. Aromanian does not have a distinct ISO 639 code; however, it is listed in the Ethnologue under the code RUP. We had used only ISO 639 codes up to that time.
The solution we came up with was to use the group ISO 639 code followed by the SIL code. This allows us to specify most languages in the Ethnologue without conflicting with the ISO standard, since ISO 639 codes do not contain hyphens.
All right. Now, explain why it is 'roa-rup' rather than just 'rup'? Why 'zh-cfr' rather than just 'cfr'? That is mixing ISO 639-1 unnecessarily with SIL code, which creates a monster: see paragraph below.
Maybe it is too late and the damage has been done, but one of the several reasonable candidate solutions would be:
(option a)
Wikipedia host 2-letter sequence -> look up in ISO 639-1
Wikipedia host 3-letter sequence -> look up in SIL Ethnologue
But the current solution is:
(option b)
Wikipedia host containing hyphen -> look up in both ISO 639-1 or 639-2 and SIL
And there is no understanding of which language is referred to when one encounters something internally contradictory such as en-frn.wikipedia.org. Is that English (ISO 639-1: en) or French (SIL: FRN)?? (One may argue that it will return 'This Wiki does not exist', but I still do not see why (b) is in any way necessary or better than (a) other than the farcical possibility of including ad hoc pidgins.)
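As a rough sketch, option (a) amounts to choosing the lookup table from the length of the host label alone. The function name and the returned strings are hypothetical; no actual code tables are consulted here.

```python
def lookup_table_for(label: str) -> str:
    """Pick the code table for a host label, following option (a)."""
    if "-" in label:
        # Option (b) territory: two tables are involved at once.
        return "hyphenated: ISO 639 group + SIL (option b)"
    if len(label) == 2:
        return "ISO 639-1"
    if len(label) == 3:
        return "SIL Ethnologue"
    return "unknown"


for host in ("en", "cfr", "roa-rup", "minnan"):
    print(host, "->", lookup_table_for(host))
```

Under option (a) every label maps to exactly one table, which is the property the hyphenated scheme gives up.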
Using the hyphenated language tags assigned by IANA, such as zh-min-nan, would conflict with this scheme. For example if we used zh-yue, it would be difficult to know what "yue" refers to. Is it an SIL code or an assigned code?
(option c) No, it would not. RFC 3066 adopts all ISO 639-1 language codes. Just look up in RFC 3066 (and transitively ISO 639-1).
We could use the RFC 3066 codes instead. This is still an option. However it wouldn't give us access to a large number of languages without resorting to awkward constructions such as x-sil-RUP.
(option c, corollary) And that is the thing one should do: using constructions such as x-sil-RUP when nothing more appropriate comes up elsewhere in RFC 3066. That is why such standards are useful -- they are usually very precise and expressive, however awkward one may consider.
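Option (c) and its corollary could be sketched as follows: prefer a registered tag, and fall back to a private-use x-sil- tag otherwise. The function name and the two small tables are hypothetical stand-ins for the real registries; only the tags and codes already mentioned in this thread are used.

```python
def rfc3066_tag(language: str, registered: dict[str, str], sil_codes: dict[str, str]) -> str:
    """Return the registered tag if one exists, else a private-use x-sil- tag."""
    if language in registered:
        return registered[language]
    # No registered tag: build the private-use form discussed above.
    return f"x-sil-{sil_codes[language]}"


# Illustrative tables, not a copy of the IANA registry or the Ethnologue.
REGISTERED = {"Min Nan": "zh-min-nan", "Cantonese": "zh-yue"}
SIL = {"Min Nan": "CFR", "Aromanian": "RUP"}

print(rfc3066_tag("Min Nan", REGISTERED, SIL))
print(rfc3066_tag("Aromanian", REGISTERED, SIL))
```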
If I'm wrong about that, feel free to explain it to me.
I have been arguing for the hardcore option (c), but I think a compromise is to consider option (a), which will involve (option a, implementation)
1) changing 'minnan' and 'zh-cfr' to simply 'cfr'
2) changing 'roa-rup' to 'rup'
[3) any other adjustments]
[By the way, much of the SIL-inclusion problems have been argued about when RFC 3066 was being drafted ... and the result was x-SIL-RUP etc. Sorry to be speaking a bit ex cathedra, but I resent fighting battles already-won....]
If I understand correctly, Shizhao's problem is that holopedia.net, and by extension minnan.wikipedia.org, is written in a script peculiar to Taiwan. The writing there is thus not representative of min-nan generally. So wouldn't it be better to use the RFC 3066 code specific to Taiwanese, namely zh-min-nan-TW? Or indeed, in keeping with my earlier point about such language codes being cryptic and unnecessarily lengthy, why not use ho-lo-oe.wikipedia.org?
To give a glib answer: I do not think Shizhao speaks Minnan; otherwise he would rather contribute to Holopedia in whatever way he writes Minnan, or apply for zh-min-nan-CN.
And again let me apologize for presuming that you knew about RFC 3066 before adopting the (ISO 639-1/2)-(SIL) scheme (which I still regard as monstrous).
Kaihsu Tai, 2004-07-07 13:50:58+0100:
Let me preemptively apologize for the tone of this message.
Kaihsu Tai wrote in part:
Using the hyphenated language tags assigned by IANA, such as zh-min-nan, would conflict with this scheme. For example if we used zh-yue, it would be difficult to know what "yue" refers to. Is it an SIL code or an assigned code?
(option c) No, it would not. RFC 3066 adopts all ISO 639-1 language codes. Just look up in RFC 3066 (and transitively ISO 639-1).
I mean it's ambiguous if you use both the ISO-SIL hybrid and RFC 3066 on the same site. I'm quite aware of the fact that RFC 3066 adopts all ISO 639-1 codes.
We could use the RFC 3066 codes instead. This is still an option. However it wouldn't give us access to a large number of languages without resorting to awkward constructions such as x-sil-RUP.
(option c, corollary) And that is the thing one should do: using constructions such as x-sil-RUP when nothing more appropriate comes up elsewhere in RFC 3066. That is why such standards are useful -- they are usually very precise and expressive, however awkward one may consider.
If I'm wrong about that, feel free to explain it to me.
I have been arguing for the hardcore option (c), but I think a compromise is to consider option (a), which will involve (option a, implementation)
- changing 'minnan' and 'zh-cfr' to simply 'cfr'
- changing 'roa-rup' to 'rup'
[3) any other adjustments]
I suggested a few similar schemes to Brion, and he turned them down on the basis of lack of standards compliance. SIL codes may conflict with future ISO 639 assignments.
[By the way, much of the SIL-inclusion problems have been argued about when RFC 3066 was being drafted ... and the result was x-SIL-RUP etc. Sorry to be speaking a bit ex cathedra, but I resent fighting battles already-won....]
This is a different battle; we have different needs. I'm disappointed you haven't commented on my idea of using descriptive names rather than codes. Cryptic hyphenated codes may be useful for information interchange between computers, but humans often find them hard to remember. We can always include the RFC 3066 code in the document headers, and we can always set up redirects from various language identifiers to the primary subdomain.
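The redirect idea might look like this in outline: several identifiers map onto one primary, descriptive subdomain. The alias table and the function name are hypothetical and do not describe any actual Wikimedia configuration.

```python
# Hypothetical alias table: language identifiers -> primary subdomain label.
ALIASES = {
    "zh-min-nan": "minnan",
    "zh-cfr": "minnan",
    "roa-rup": "aromanian",
}


def primary_host(label: str, domain: str = "wikipedia.org") -> str:
    """Resolve a label to the primary host; unknown labels pass through."""
    key = label.lower()
    return f"{ALIASES.get(key, key)}.{domain}"


print(primary_host("zh-min-nan"))
print(primary_host("en"))
```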
-- Tim Starling
Tim Starling wrote:
The reason for choosing zh-cfr can be summarised in one word: Aromanian. A wiki in the Aromanian dialect was requested, and it was agreed that we should have one, but there was a problem with choosing the subdomain. Aromanian does not have a distinct ISO 639 code; however, it is listed in the Ethnologue under the code RUP. We had used only ISO 639 codes up to that time.
I understand the line of thought re Aromanian. My impression, though, was that "tokipona" preceded the roa-rup construction in not following ISO 639. Perhaps there was/is a separate policy for conlangs, I don't know -- or maybe there was a transitional (ongoing?) state of confusion.
The solution we came up with was to use the group ISO 639 code followed by the SIL code. This allows us to specify most languages in the Ethnologue without conflicting with the ISO standard, since ISO 639 codes do not contain hyphens.
My understanding is that RFC 3066 is backward compatible with ISO-639. It much prefers ISO-639 whenever a tag exists. I quote as an example:
"4. When a language has both an IANA-registered tag (i-something) and a tag derived from an ISO registered code, you MUST use the ISO tag. NOTE: When such a situation is discovered, the IANA-registered tag SHOULD be deprecated as soon as possible."
Some related discussion on meta: http://meta.wikipedia.org/wiki/Language_codes#RFC_3066
Now of course Wikipedia *could* stick to ISO-639 but it's so...um, limiting, as we've discovered.
Using the hyphenated language tags assigned by IANA, such as zh-min-nan, would conflict with this scheme. For example if we used zh-yue, it would be difficult to know what "yue" refers to. Is it an SIL code or an assigned code?
Well, zh-yue has been defined (http://www.iana.org/assignments/lang-tags/zh-yue) so I'm not sure why it'd be confusing to find out. Even without a lookup, on the face of it it's telling: the zh (which is ISO-639) says it's some kind of Sinitic variant. The "yue" part is a well-known word for Cantonese; it's found in words such as "粵語" (Yue Speech/Language), "粵劇" (Yue Opera). Although "yue" alone is not part of any standard I know of, the meaning is clear.
To further argue from examples: "zh-min-nan" is more intuitive than "zh-cfr". "min-nan" is again the transliteration of a common word in Chinese. Although one *could* look up "cfr", the former can be deduced based on knowledge common to Internet-using speakers of the said language.
To sum up: I think the IANA-assigned codes are (1) ISO-639 compatible and (2) apparently easier to decipher than SIL tags (though there may be counterexamples). Unlike "tokipona" or "minnan" alone, IANA tags appear to make use of ISO-639 to suggest genetic affiliation. That's useful, in my opinion.
As for the use of common names, that's something to consider. I'd prefer the Minnan Wikipedia not be used as a guinea pig, though :)
If I understand correctly, Shizhao's problem is that holopedia.net, and by extension minnan.wikipedia.org, is written in a script peculiar to Taiwan. The writing there is thus not representative of min-nan generally. So wouldn't it be better to use the RFC 3066 code specific to Taiwanese, namely zh-min-nan-TW? Or indeed, in keeping with my earlier point about such language codes being cryptic and unnecessarily lengthy, why not use ho-lo-oe.wikipedia.org?
I have not seen Shizhao describe the issue, certainly not to Minnan Wikipedians. Now, I'm personally not opposed to "zh-min-nan-TW" (the matter has not been discussed in Minnan), though as you know, Wikipedia generally prefers to put same-language content together. The most prominent example of this is the zh Wikipedia (where Simplified and Traditional scripts coexist). Various other languages also have Latin/Cyrillic/Arabic di-/trigraphia. (And I think the issue has not been more prominent because those Wikipedias are fairly inactive or merely reserved.)
To keep discussion short I'll just say, for now, that most variants of Minnan are generally and historically *not* written, and when written, generally without reference to a standard (lots of words written inconsistently). The genres have also been limited to songs and other poetic forms. So it's difficult to talk about representativeness in that context. The current Minnan Wikipedia *is* the "state of the art" to the extent that Minnan has been written and read historically.
~~~~
Kaihsu Tai wrote:
because I am really quite upset and fed up about the ridiculous treatment Holopedia@Wikipedia received -- the [...] intention was to prevent forking -- we could easily have run MediaWiki somewhere else (and indeed we did, as a means of
Your angry tone and constant threats about forking suggest that this is exactly what you should do. Go fork.
Suppose people at Wikipedia agreed to your demands today. Then you would come with new demands tomorrow, threatening again to fork. With friends like you, the Wikipedia community needs no enemies. You are angry now. Go away and calm down. In the meantime, take your idea with you and prove that you can indeed build upon it. Next year you can merge with Wikipedia again, if you want to.
wikitech-l@lists.wikimedia.org