In my last email, I detailed some of the broad principles that we follow (or should follow) in the creation of new language editions of Wikipedia, and in this email I will sketch a basic proposal for a solution to our problems in this area.
To recap: I think there is broad community consensus for the following principles:
1. To encourage the creation of new Wikipedia editions in all legitimate living natural languages, in an orderly way which involves fostering a real community to care for wikis. We want to make it easy for newcomers to get excited and build, while at the same time trying to keep from having too many dead wikis around.
For real languages, then, we only want some small indication of interest from some native (or at least fluent) speakers. We want to make this part relatively easy.
2. To discourage the creation of new Wikipedia editions in dialects which do not significantly differ from existing Wikipedias. We want to keep from being hoaxed, and from falling into political traps.
For dialects, then, we want to require a much higher threshold before allowing the Wikipedia; we need a good reason to start it. A "Bavarian" Wikipedia proposal would need a much, much stronger rationale than a "German" one. Obviously.
3. To discourage the creation of new Wikipedia editions in constructed languages. I do not say "forbid" here, merely "discourage". Similar to dialects, constructed languages pose many risks for us of politics, hoax, etc.
-----
Therefore, I propose that we have a two-tier system and a formal committee formed of experts (or some outside language standards body if we can find one which is suitable) which will declare whether a proposed language is an actual language or merely a dialect. This committee will issue for us an advisory opinion. We are not required, as a community, to follow the advisory opinion, but the advisory opinion will set a differential threshold for creation.
For those which are declared to be languages, we could continue with something like the existing process.
For those which are declared to be dialects, we would have a presumption against creation, but that presumption could be overcome by a broad community vote.
--Jimbo
2005/11/22, Jimmy Wales jwales@wikia.com:
In my last email, I detailed some of the broad principles that we follow (or should follow) in the creation of new language editions of Wikipedia, and in this email I will sketch a basic proposal for a solution to our problems in this area.
[...] Therefore, I propose that we have a two-tier system and a formal committee formed of experts (or some outside language standards body if we can find one which is suitable) which will declare if a proposed language is an actual language or merely a dialect. This committee will issue for us an advisory opinion. We are not required, as a community, to follow the advisory opinion, but the advisory opinion will set a differential threshold for creation.
Hi Jimbo, I appreciate your willingness to help solve the problems with new language wiki creation, and I also applaud your commitment to supporting minority languages, such as the European ones like Cornish that you mentioned. I too feel that culturally these minority languages are a worthwhile addition to Wikipedia.
I'd like to suggest that taking into consideration ISO codes or SIL codes may be one solution. This would mean that an outside group which is well established and has looked into the matter has deemed a certain language important enough to be assigned a separate code.
In the case of Andalusian, which I mentioned in a previous post, this would mean it is not open for consideration, since it does not yet have its own ISO or SIL code. I would see this as fair: although I would like the Andalusian wiki to be created, I would accept that the guidelines for requesting such a wiki had not been met (yet, anyway).
With regards, Jay B. [[User:ILVI]]
-- ilooy.gaon@gmail.com
ilooy wrote:
I'd like to suggest that taking into consideration ISO codes or SIL codes may be one solution. This would mean that an outside group which is well established and has looked into the matter has deemed a certain language important enough to be assigned a separate code.
I guess some think this is a little too lax a policy in some regards. There are ISO codes for completely fictional languages like Klingon or Tolkien's Elvish, and as a result there is some opposition to the current Klingon Wikipedia, even though it was created on the very argument you are making.
In general, even within the group that assigns these ISO codes, there is opposition to constructed languages like these, which does lend some support to your argument. New ISO codes are being added as well, and if you speak a dialect like Cornish or some other little-known language, you would do both your culture and the international community a good service by trying to get a new ISO code established for your language, going beyond just the Wikimedia Foundation. It is far easier to add an ISO code for a German dialect from Bavaria than for a constructed language like Pig Latin.
The only real issue at that point is whether you can get a reasonably large group of people to support something like a Wikipedia. Many very ambitious people have started a Wikipedia for their language, only to have it become the target of spammers and vandals after it went several months without new content. This is really the dilemma.
ilooy wrote:
I'd like to suggest that taking into consideration ISO codes or SIL codes may be one solution. This would mean that an outside group which is well established and has looked into the matter has deemed a certain language important enough to be assigned a separate code.
This is exactly the policy we adopted several years ago, which has proved insufficient.
Relying on existence of ISO codes brings us:
- split Serbian/Croatian/Bosnian replacing Serbocroatian [controversial]
- Klingon
etc
and denies various languages/dialects/whatever which don't have their own codes but which are oft asked for.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
ilooy wrote:
I'd like to suggest that taking into consideration ISO codes or SIL codes may be one solution. This would mean that an outside group which is well established and has looked into the matter has deemed a certain language important enough to be assigned a separate code.
This is exactly the policy we adopted several years ago, which has proved insufficient.
Relying on existence of ISO codes brings us:
- split Serbian/Croatian/Bosnian replacing Serbocroatian [controversial]
- Klingon
etc
and denies various languages/dialects/whatever which don't have their own codes but which are oft asked for.
Clearly we should not rely exclusively on the codes for our decisions, but they do give a first slice in the decision making process, particularly if used as an exclusionary tool. There should still be an opportunity for those who are excluded to make a case.
IIRC the Ethnologue criterion for including a constructed language is the existence of literary works in that language. They recognize 24 such languages. Yes, there are people with the kind of mentality that will create a literature in Klingon or Elvish. A better criterion for us would be the existence of native speakers. Another might be to ask who the audience for this new language project is. What interests the readers rather than the writers? What practical information would serve them best? What cultural support will best help their culture?
For the Serbocroatian and other politically charged situations it is very difficult to avoid the political traps unless you are willing to say "This is the way it's going to be," right from the beginning.
The other thing that is not addressed is the need for a critical mass. For many individuals, the challenge is in getting the project in their language approved. When they win the fight they lose interest, so for me the two-tier system involves a fairly easy hurdle to get a project started, followed by a probationary period in which they can prove that there really is enough interest to keep it going.
Ec
IIRC the Ethnologue criterion for including a constructed language is the existence of literary works in that language.
I don't think so.
http://en.wikipedia.org/wiki/Ethnologue http://en.wikipedia.org/wiki/SIL_International
Ilario
valdelli@bluemail.ch wrote:
IIRC the Ethnologue criterion for including a constructed language is the existence of literary works in that language.
I don't think so.
http://en.wikipedia.org/wiki/Ethnologue http://en.wikipedia.org/wiki/SIL_International
Sorry, I should have been clearer in distinguishing between Ethnologue/SIL and ISO 639-3, for which SIL, the publisher of Ethnologue, is the "Registration Authority". Ethnologue essentially does not concern itself with constructed languages.
The following is from http://www.sil.org/iso639-3/types.asp
Constructed languages
This part of ISO 639 also includes identifiers that denote constructed (or artificial) languages. In order to qualify for inclusion the language must have a literature and it must be designed for the purpose of human communication. Specifically excluded are reconstructed languages and computer programming languages.
Ec
2005/11/23, Brion Vibber brion@pobox.com:
ilooy wrote:
I'd like to suggest that taking into consideration ISO codes or SIL codes may [...]
This is exactly the policy we adopted several years ago, which has proved insufficient. Relying on existence of ISO codes brings us:
- split Serbian/Croatian/Bosnian replacing Serbocroatian [controversial]
- Klingon
etc and denies various languages/dialects/whatever which don't have their own codes but which are oft asked for.
Hi Brion, and hello (kaj saluton),
Actually I was suggesting it as a means to pre-sift through new language requests. Then those situations which require further attention could be handled more delicately.
As it is, the situation gets pretty charged. I offered my suggestion in the hope that some requests, like those for Murcian or Andalusian, could be handled right away without the need for a lot of discussion.
I hope I was able to explain what I had in mind with this suggestion. I know some things have been tried already. But precedents have also been set, and I too agree that a forum dealing with the more delicate issues would be welcome; my suggestion of ISO and SIL codes was meant to help in this regard, not to take its place, of course.
With regards, Jay B. [[meta:User:ILVI]]
-- ilooy.gaon@gmail.com
ilooy wrote:
2005/11/23, Brion Vibber brion@pobox.com:
ilooy wrote:
I'd like to suggest that taking into consideration ISO codes or SIL codes may [...]
This is exactly the policy we adopted several years ago, which has proved insufficient. Relying on existence of ISO codes brings us:
- split Serbian/Croatian/Bosnian replacing Serbocroatian [controversial]
- Klingon
etc and denies various languages/dialects/whatever which don't have their own codes but which are oft asked for.
Hi Brion, and hello (kaj saluton),
Actually I was suggesting it as a means to pre-sift through new language requests. Then those situations which require further attention could be handled more delicately.
Well, that *is exactly* what we've done the last few years.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
ilooy wrote:
I'd like to suggest that taking into consideration ISO codes or SIL codes may be one solution. This would mean that an outside group which is well established and has looked into the matter has deemed a certain language important enough to be assigned a separate code.
This is exactly the policy we adopted several years ago, which has proved insufficient.
Relying on existence of ISO codes brings us:
- split Serbian/Croatian/Bosnian replacing Serbocroatian [controversial]
- Klingon
etc
and denies various languages/dialects/whatever which don't have their own codes but which are oft asked for.
Yes. It's really important that everyone gets this. The idea of referencing these external codes is and was a great one in many ways... it gets the argument out of our hands, it could presumably be a professionally-decided list, etc.
The only problem is that the list of ISO codes is highly politicized and broken in many many ways. It was fine for getting a list of things like "English" and "German" and "French" and so on, but it breaks down when you start looking at it more closely.
--Jimbo
Hi,
When we consider whether a language is appropriate for having a Wikipedia, you may find old, respectable languages like Neapolitan that have fallen into official disuse and whose spelling has consequently diverged because the language was no longer taught. Neapolitan is a language that some see as a dialect of Italian. I understand that the Andalusian statute does not say that Andalusian is a language, although it is as different and as distinct as many of the other languages spoken in Spain. I know that Andalusian is known for its many songs, which are one form of literature. At the I&I conference I spoke with someone from Guatemala and discussed Mayan languages. I learned that for many people a Mayan language is their first and only language. It does not have a literature or a standardised orthography because it is primarily a spoken language.
My point is that by creating artificial barriers, you prevent us from achieving our aim: all information for all people. When a small dialect or language is good for codifying a culture, it serves our purpose. I doubt that we would get these same people to write about their culture as much in a language that they are forced to use. When we insist on a literature and a single orthography, we may exclude people who predominantly use a spoken language, and so fail to get our information to them.
As I mentioned on the wikitech list, the whole idea of voting is broken anyway: when you vote for a language, you are supposed to actively support the project, while a naysayer is excused from any effort. I do not vote for Dutch nds or for Andalusian because I will not work on these projects.
Artificial languages differ from dialects and natural languages in that they do not represent a culture or a people, so comparing natural and artificial languages is problematic. It also helps to be a little more relaxed about artificial languages: even Klingon works as a project for as long as there is a community willing to put a lot of effort into it. As it is a rather silly thing, I am sure that these people will reach this conclusion at some point. In the meantime, it does not detract from the value of our other projects. It only becomes a problem when we adopt, or care about, the value system of the officious.
Thanks, GerardM
On 11/25/05, Jimmy Wales jwales@wikia.com wrote:
Brion Vibber wrote:
ilooy wrote:
I'd like to suggest that taking into consideration ISO codes or SIL codes may be one solution. This would mean that an outside group which is well established and has looked into the matter has deemed a certain language important enough to be assigned a separate code.
This is exactly the policy we adopted several years ago, which has proved insufficient.
Relying on existence of ISO codes brings us:
- split Serbian/Croatian/Bosnian replacing Serbocroatian [controversial]
- Klingon
etc
and denies various languages/dialects/whatever which don't have their own codes but which are oft asked for.
Yes. It's really important that everyone gets this. The idea of referencing these external codes is and was a great one in many ways... it gets the argument out of our hands, it could presumably be a professionally-decided list, etc.
The only problem is that the list of ISO codes is highly politicized and broken in many many ways. It was fine for getting a list of things like "English" and "German" and "French" and so on, but it breaks down when you start looking at it more closely.
--Jimbo
_______________________________________________
foundation-l mailing list
foundation-l@wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/foundation-l
On 11/25/05, Jimmy Wales jwales@wikia.com wrote:
Brion Vibber wrote:
This is exactly the policy we adopted several years ago, which has proved insufficient.
Relying on existence of ISO codes brings us:
- split Serbian/Croatian/Bosnian replacing Serbocroatian [controversial]
- Klingon
Another issue is that we've already created Wikipedias in most ISO 639-1 languages, so those that remain are by definition somewhat difficult.
and denies various languages/dialects/whatever which don't have their own codes but which are oft asked for.
Should it be possible for language enthusiasts to pitch to Wikipedia directly, rather than to some third-party language-centered org that WP trusts and works with? If it were not possible to pitch directly to WM (say, acceptance by such a third-party group is a pre-req to applying for a lang-project), this would help avoid subsets of the world's language zealots engaging in subsets of global debates on WM mailing lists.
The only problem is that the list of ISO codes is highly politicized and broken in many many ways. It was fine for getting a list of things like "English" and "German" and "French" and so on, but it breaks down when
Who are the target audiences for a new and improved list of codes? language-enthusiast editors? readers? third-party content developers/aggregators? linguists? translators? educators? If some of these audiences will have to do more work and others less, which should get priority?
-- ++SJ
On 11/26/05, SJ 2.718281828@gmail.com wrote:
On 11/25/05, Jimmy Wales jwales@wikia.com wrote:
Brion Vibber wrote:
This is exactly the policy we adopted several years ago, which has proved insufficient.
Relying on existence of ISO codes brings us:
- split Serbian/Croatian/Bosnian replacing Serbocroatian [controversial]
- Klingon
Another issue is that we've already created Wikipedias in most ISO 639-1 languages, so those that remain are by definition somewhat difficult.
There are handy tables for converting ISO 639-1 codes to ISO 639-2. The upcoming ISO 639-3 standard uses three-letter codes, as ISO 639-2 does, so for the most part no conversion will be needed. However, some codes will be removed and replaced by others. For your information, Ultimate Wiktionary will have a table of ISO 639-3 codes, but for convenience's sake we have room for the ISO 639-1 codes as well. To keep things simple, we will also have a field that stores the name the WMF uses.
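The language table Gerard describes can be illustrated with a small data structure. What follows is a minimal Python sketch under our own assumptions, not Ultimate Wiktionary's actual schema: the names `LanguageRow`, `wmf_name`, `BY_639_1`, `BY_639_3`, `by_iso639_1` and `wmf_name_for` are hypothetical, and `wmf_name` is assumed to hold the wiki subdomain. Each row is keyed by its three-letter ISO 639-3 code, with an optional two-letter ISO 639-1 code kept for convenience. The sample codes are real ISO assignments, but the table is illustrative, not complete.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical row layout (an assumption, not Ultimate Wiktionary's schema):
# ISO 639-3 is the primary key, ISO 639-1 is optional because many
# languages have no two-letter code, and wmf_name is what WMF calls it.
@dataclass(frozen=True)
class LanguageRow:
    iso639_3: str            # three-letter code, always present
    iso639_1: Optional[str]  # two-letter code, if one exists
    wmf_name: str            # assumed: the wiki subdomain WMF uses


# Illustrative sample rows only.
TABLE = [
    LanguageRow("eng", "en", "en"),
    LanguageRow("nld", "nl", "nl"),
    LanguageRow("cor", "kw", "kw"),
    LanguageRow("nap", None, "nap"),  # Neapolitan: no ISO 639-1 code
    LanguageRow("tlh", None, "tlh"),  # Klingon: no ISO 639-1 code
]

# Index the table both ways; rows without a two-letter code are skipped
# in the ISO 639-1 index.
BY_639_1 = {row.iso639_1: row for row in TABLE if row.iso639_1}
BY_639_3 = {row.iso639_3: row for row in TABLE}


def by_iso639_1(code: str) -> Optional[LanguageRow]:
    """Look up a row by its two-letter code; None if unknown."""
    return BY_639_1.get(code)


def wmf_name_for(iso639_3: str) -> Optional[str]:
    """Return the WMF name for a three-letter code; None if unknown."""
    row = BY_639_3.get(iso639_3)
    return row.wmf_name if row else None


row = by_iso639_1("kw")
if row:
    print(row.iso639_3)      # cor
print(wmf_name_for("nap"))   # nap
```

Keeping ISO 639-3 as the key and ISO 639-1 as optional reflects the point above: many languages, Neapolitan and Klingon among them, have a three-letter code but no two-letter one.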
and denies various languages/dialects/whatever which don't have their own codes but which are oft asked for.
Should it be possible for language enthusiasts to pitch to Wikipedia directly, rather than to some third-party language-centered org that WP trusts and works with? If it were not possible to pitch directly to WM (say, acceptance by such a third-party group is a pre-req to applying for a lang-project), this would help avoid subsets of the world's language zealots engaging in subsets of global debates on WM mailing lists.
As Ultimate Wiktionary aims to have all words in all languages, words particular to dialects are certainly welcome. This will allow us to have words in Westfries, spoken in the region where I was born, which was once a language but is now no more than a dialect of Dutch. :)
The only problem is that the list of ISO codes is highly politicized and broken in many many ways. It was fine for getting a list of things like "English" and "German" and "French" and so on, but it breaks down when
Who are the target audiences for a new and improved list of codes? language-enthusiast editors? readers? third-party content developers/aggregators? linguists? translators? educators? If some of these audiences will have to do more work and others less, which should get priority?
By creating "the" list of languages, you remove the argument "is this a language or not?" from the discussion. This removes the need to create your own list, which is important because, as we know, that can be an endless struggle. Even when a language is on the list, there are people who deny its validity, and there are always more languages to consider. Being on the list is important when you care for a language. The audience? All of the above. However, for organisations like ours, a list does not free us from having the argument anyway: we have people who insist on a fixed orthography, a literature, or an army to get the argument going their way.
I would like us to be relaxed about it. Let us have projects in the languages that are requested; in the end, the proof of the pudding is in the eating, and when a project is a success, it is a reason to rejoice.
Thanks, GerardM
wikimedia-l@lists.wikimedia.org