[ Mark writes ]
You think that you can claim that just because I am not a Serbian, I know nothing about this issue. Well, I have seen images of Zlatibor on television, and I read all about the Serbian occupation. I have a bumper sticker that says "Free Zlatibor Now! Boycott Serbia!", and I own some books on Zlatiborian language. My deepest hope is that after the Serbian tyranny ends, I may travel to Zlatibor and witness the beauty firsthand.
Boy, I would give an eyetooth for one of those bumper stickers. Trade you for a pair of "Zapatistas of the world, untie!" shoe decals...
As long as rants about language acceptance are being exchanged, here's something that has been bothering me lately:
It has been suggested more than once that we should get professional linguists / a small group of Wiki[m]edians to determine what new languages are 'proper' or acceptable, before asking the community to discuss and reach consensus. But should Wikimedia be making decisions about the officialness of languages at all? It seems to me we can neutrally defer this to a higher authority*.
The ISO handles ISO 639-1 and -2 reasonably well; these are standards designed to identify written documents, and to include those languages 'most frequently represented in the total body of the world's literature' -- which seems appropriate. I put some specific information on meta: http://meta.wikimedia.org/wiki/ISO_criteria_for_defining_new_languages
If you think we can do better than the ISO, please comment on that talk page**...
SJ
* "Wikipedia includes 74 major and 7 minor[1] languages with over 1000 articles. ... [1] 'Major' languages are defined as those with two-letter ISO 639-1 codes, a set of languages considered to be most frequently represented in world literature."
** Perhaps we can have a standard procedure that assumes an ISO-2 code language, and provide for exceptions. Some current non-ISO wikipedia languages, illustrating various reasons different users might have not to stick blindly to such a standard:
- zh_min_nan (1,200 articles; listed in places as "taiwanese"),
- tpi (tok pisin, recent conlang, 160 articles),
- fiu_vro (Võro, 105 articles & activity),
- roa_rup (Aromanian; 29 articles & little activity, but just got ISO 639-2 approval for "rup" in September).
+ Current well-received /proposals/ for non-ISO languages include pdc (Pennsylvania Dutch), which already has a 500-article site independent of Wikimedia.
2005/11/8, SJ 2.718281828@gmail.com:
** Perhaps we can have a standard procedure that assumes an ISO-2 code language, and provide for exceptions. Some current non-ISO wikipedia languages, illustrating various reasons different users might have not to stick blindly to such a standard :
- zh_min_nan (1,200 articles; listed in places as "taiwanese") ,
- tpi (tok pisin, recent conlang, 160 articles),
- fiu_vro (Võro, 105 articles & activity),
- roa_rup (Aromanian; 29 articles & little activity, but just got ISO
639-2 approval for "rup" in September).
tpi *is* an official ISO 639-2 code. It is also not a conlang, but a creole - I think you're confusing it with Tokipona, for which the Wikipedia has been closed.
Apart from that, I am of the opinion that it is good to go by ISO 639-2, but as a default rather than an absolute. That is, we don't always follow ISO 639-2, but languages on the list will be included unless a strong argument is made against them, whereas languages not on the list will be included only if a strong argument is made in support. Personally I would like to be slightly more strict: ISO 639-2 for living languages, but dead languages will have to show that there is recent material written in the language as well, rather than just having an ISO 639-2 code because of old work.
-- Andre Engels, andreengels@gmail.com ICQ: 6260644 -- Skype: a_engels
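The default-with-exceptions rule Andre describes can be sketched in a few lines of Python. This is only an illustration of the reasoning in his message, not an agreed Wikimedia procedure: the `LanguageProposal` type, its field names, and the `should_include` function are all hypothetical, and the treatment of dead languages without recent material (falling back to "needs a strong argument in support") is an assumption about how his stricter variant would work.

```python
from dataclasses import dataclass


@dataclass
class LanguageProposal:
    """Hypothetical record for a proposed Wikipedia language edition."""
    code: str
    has_iso_639_2_code: bool
    is_living: bool = True
    has_recent_written_material: bool = False
    strong_argument_for: bool = False
    strong_argument_against: bool = False


def should_include(p: LanguageProposal) -> bool:
    """ISO 639-2 as a default rather than an absolute."""
    if p.has_iso_639_2_code:
        # Stricter variant: a dead language needs recent written material,
        # not just a code assigned because of old work. Assumption: without
        # it, the proposal is treated like a language that is off the list.
        if not p.is_living and not p.has_recent_written_material:
            return p.strong_argument_for
        # On the list: included unless a strong argument is made against.
        return not p.strong_argument_against
    # Off the list: included only if a strong argument is made in support.
    return p.strong_argument_for
```

For example, `should_include(LanguageProposal("tpi", has_iso_639_2_code=True))` is accepted by default, while a code without ISO 639-2 status such as `fiu_vro` is accepted only when `strong_argument_for` is set.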
On 11/8/05, Andre Engels andreengels@gmail.com wrote:
2005/11/8, SJ 2.718281828@gmail.com:
- tpi (tok pisin, recent conlang, 160 articles),
tpi *is* an official ISO 639-2 code. It is also not a conlang, but a creole - I think you're confusing it with Tokipona, for which the Wikipedia has been closed.
Whoops, it certainly is. Boy do I have papaya on my face... Tokipona, while it lasted as a WM project, did represent another reason some people wish to go outside the ISO.
Apart from that, I am of the opinion that it is good to go by ISO 639-2, but as a default rather than an absolute. That is, we don't always follow ISO 639-2, but languages on the list will be included unless a strong argument is made against them, whereas languages not on the list will be included only if a strong argument is made in support. Personally I would like to be slightly more strict: ISO 639-2 for living languages, but dead languages will have to show that there is recent material written in the language as well, rather than just having an ISO 639-2 code because of old work.
It should always be possible to make strong arguments for exceptions... though perhaps they should be limited to a well-defined space (perhaps even off the mailing lists :-).
As for ISO 639-3, this is not at all a patch or substitute for 639-2. Each layer of language-codes has a lower bar for what it means to be a [meaningful] language. -1 and -2 are specifically focused on language designations that are useful to written work...
We should make use of the good work ISO is doing to distinguish between these language categories, where we can.
SJ
The main problem with ISO 639-1 & -2 is that nobody looked to see if they were covering all of the languages meeting the criteria.
Now, any language which doesn't have an ISO 639-1 or -2 code but _does_ meet the criteria, must have an application submitted by you for its inclusion by them. (yes, that sentence purposefully was an awkward one).
ISO isn't doing any work whatsoever to distinguish between solidly established written languages and everything else.
The main requirement, IIRC, is that there be at least 50 documents in the language at any library.
That wrongfully excludes many languages which are written but rarely published, or which are written but only have very minor publications.
Mark
_______________________________________________ Wikipedia-l mailing list Wikipedia-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikipedia-l
-- "Take away their language, destroy their souls." -- Joseph Stalin
2005/11/11, Mark Williamson node.ue@gmail.com:
The main requirement, IIRC, is that there be at least 50 documents in the language at any library.
50 different documents spread over 5 libraries is also considered sufficient.
That wrongfully excludes many languages which are written but rarely published, or which are written but only have very minor publications.
They are excluded, but I'm not sure whether that would be "wrongfully". Not that I want to exclude such languages from Wikipedia, but I would not mind if for those languages a more stringent test were required than for others.
-- Andre Engels, andreengels@gmail.com ICQ: 6260644 -- Skype: a_engels
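The document-count criterion as recalled in this exchange (50 documents in one library, or 50 different documents spread over 5 libraries) can be written as a small check. The numbers come from the messages above, where Mark flags them as "IIRC", so this is a sketch of the rule as remembered here and not a reading of the ISO text. The function name `meets_document_threshold` and the shape of `holdings` (library name mapped to a set of document titles) are assumptions made for illustration.

```python
from itertools import combinations


def meets_document_threshold(holdings, min_docs=50, max_libraries=5):
    """Return True if some group of at most `max_libraries` libraries
    holds at least `min_docs` different documents between them.

    `holdings` maps a library name to the set of document titles it
    holds in the language. A single library with `min_docs` documents
    is the group-of-one case.
    """
    libraries = list(holdings.values())
    for k in range(1, min(max_libraries, len(libraries)) + 1):
        for group in combinations(libraries, k):
            # Count distinct titles only: the same document held by
            # several libraries is counted once.
            if len(set().union(*group)) >= min_docs:
                return True
    return False
```

Trying every group of libraries is fine for a handful of catalogues; it is not meant for large inputs.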
I'd agree with you there, except that ISO is not always necessarily correct.
For starters, ISO 639-2 excludes lots and lots of languages (cf Ethnologue).
The (provisional) ISO 639-3 solves this, but as of yet it is not final or officially official.
Now, the problem with that is that it follows Ethnologue's divisions of languages vs dialects for the most part. Thus, while Moroccan and Tunisian Arabic are considered separate languages (even people who advocate for separate languages within "Arabic" would consider them both part of Maghrebi Arabic), Yavapai and Havasupai or Lithuanian and Samogitian are not. Zlatiborian, of course, is not even mentioned. Give it time, perhaps.
Mark
Mark Williamson wrote:
I'd agree with you there, except that ISO is not always necessarily correct.
For starters, ISO 639-2 excludes lots and lots of languages (cf Ethnologue).
The (provisional) ISO 639-3 solves this, but as of yet it is not final or officially official.
Now, the problem with that is that it follows Ethnologue's divisions of languages vs dialects for the most part. Thus, while Moroccan and Tunisian Arabic are considered separate languages (even people who advocate for separate languages within "Arabic" would consider them both part of Maghrebi Arabic), Yavapai and Havasupai or Lithuanian and Samogitian are not. Zlatiborian, of course, is not even mentioned. Give it time, perhaps.
Show me where "Upper New York on the west side" exists in the ethnologue, and I will show you Zlatiborian.
-- Alphax - http://en.wikipedia.org/wiki/User:Alphax Contributor to Wikipedia, the Free Encyclopedia "We make the internet not suck" - Jimbo Wales
"Alphax" alphasigmax@gmail.com wrote in message news:437150F9.8060102@gmail.com...
Mark Williamson wrote:
I'd agree with you there, except that ISO is not always necessarily correct. For starters, ISO 639-2 excludes lots and lots of languages (cf Ethnologue). The (provisional) ISO 639-3 solves this, but as of yet it is not final or officially official.
[snip]
[...] Zlatiborian, of course, is not even mentioned. Give it time, perhaps.
Show me where "Upper New York on the west side" exists in the ethnologue, and I will show you Zlatiborian.
Does it have an entry for "Estuary English"?
No, but some people do consider it a sep'rate language.
Mark