While preparing Missing Wikipedias [1], I compiled the numbers of speakers and languages not covered by Wikipedias, by area and by country with a chapter.
The numbers are preliminary, and some of them should be corrected. I didn't exclude Han languages and similar cases, most of which shouldn't be counted. Note also that every language should be analyzed separately, since many languages are spoken in more than one country.
Please fix errors and comment.
* * *
Areas. They approximate the usual definitions of areas, but differ because of linguistic adjustments.
* Afro-Asiatic Area: Area where Afro-Asiatic languages are dominant. North Africa + Middle East + Sudan, Ethiopia, Eritrea and Somalia, minus Iran.
* Europe: Europe (including the Caucasus) plus Turkey.
* South Asia: South Asia + Iran. Dominantly Indo-European and Dravidian languages.
* Sub-Saharan Africa: The rest of Africa.
* Polynesia, Australia and Oceania: Includes Malaysia and Taiwan (Taiwanese languages not covered in Wikipedias are dominantly Austronesian).
* East Asia: Han China "China (Central)", Korea and Japan.
* South-East Asia: Includes non-Han south China "China (South)".
* Latin America: Parts of America where Spanish and Portuguese are official languages.
* Anglo-French America: Parts of America where English, French and Dutch are official languages.
* North Asia: Asian part of the former USSR, Mongolia and non-Han northern and western China "China (North)".
The first column is the number of speakers, the second the number of languages, and the third the area.

399259294 592 South Asia
353676706 1805 Sub-Saharan Africa
221855457 253 Afro-Asiatic Area
138979263 2198 Polynesia, Australia and Oceania
107363760 37 East Asia
99260271 447 South-East Asia
47901185 143 Europe
30361602 724 Latin America
8481452 227 Anglo-French America
3724384 45 North Asia
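The three-column layout described above (speakers, languages, area) can be processed mechanically. What follows is a minimal sketch in Python, assuming one row per line with whitespace-separated columns. The name `parse_rows` is only illustrative, and the rows are copied from the area table above. Since many languages are spoken in more than one area, the totals may double-count and should be read as rough.

```python
def parse_rows(text):
    """Parse 'speakers languages area' rows into (int, int, str) tuples."""
    rows = []
    for line in text.strip().splitlines():
        # Split on the first two runs of whitespace only, so that area
        # names containing spaces and commas stay in one piece.
        speakers, languages, area = line.split(maxsplit=2)
        rows.append((int(speakers), int(languages), area))
    return rows


AREAS = """\
399259294 592 South Asia
353676706 1805 Sub-Saharan Africa
221855457 253 Afro-Asiatic Area
138979263 2198 Polynesia, Australia and Oceania
107363760 37 East Asia
99260271 447 South-East Asia
47901185 143 Europe
30361602 724 Latin America
8481452 227 Anglo-French America
3724384 45 North Asia
"""

rows = parse_rows(AREAS)
total_speakers = sum(row[0] for row in rows)
total_languages = sum(row[1] for row in rows)

print(f"{total_speakers:,} speakers in {total_languages:,} languages")
for speakers, languages, area in rows:
    # Share of each area in the (approximate) total of uncovered speakers.
    print(f"{area:35s} {speakers / total_speakers:6.1%}")
```

The same function works for the country table if the thousands separators are stripped first, for example with `text.replace(",", "")`, provided no country name contains a comma.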
* * *
Countries with chapters. (The numbers are not fully correct, as they include some languages that were removed from the list below this one.)
If any chapter (or other interested group) wants the full list of missing languages, I'll provide it on request before completing the work. I suppose that some chapters are interested in languages with fewer than 100K speakers as well.

296,097,274 349 India
71,356,176 681 Indonesia
46,676,395 157 Philippines
7,819,010 9 Germany
7,994,871 76 Russian Federation
5,386,580 5 Serbia
4,785,299 6 South Africa
2,841,300 17 Israel
1,139,750 4 Ukraine
1,085,931 125 United States
832,000 3 Netherlands
705,967 70 Canada
472,470 1 Czech Republic
375,704 17 Taiwan
313,642 6 Chile
246,900 3 United Kingdom
200,500 4 Spain
191,430 5 Poland
151,240 7 Sweden
132,809 12 Argentina
86,390 155 Australia
50,000 1 France
30,000 1 Hungary
29,980 4 Switzerland
17,460 5 Finland
15,000 1 Portugal
10,500 2 Norway
5,000 1 Denmark
4,500 1 Estonia
Languages with more than a million or more than 100,000 speakers that have no Wikipedia, in countries with a chapter:

India (more than a million)
38261000 Awadhi
34700000 Maithili
17500000 Chhattisgarhi
13000000 Magahi
13000000 Haryanvi
12800000 Deccan
10400000 Malvi
9500000 Kanauji
9000000 Dhundari
7760000 Bagheli
6970000 Varhadi-Nagpuri
6170900 Santali
6000000 Lambadi
5622600 Marwari
5000000 Mewati
4730000 Hadothi
4004490 Konkani
3900000 Merwari
3800000 Mina
3633900 Konkani, Goan
3000000 Shekhawati
3000000 Godwari
2920000 Garhwali
2680000 Indian Sign Language
2360000 Kumaoni
2110000 Dogri
2100000 Bagri
2094200 Kurux
2000000 Mewari
1970000 Sadri
1950000 Tulu
1950000 Gondi, Northern
1930000 Waddar
1710000 Wagdi
1700000 Kangri
1580000 Khandesi
1560280 Mundari
1543300 Bodo
1500000 Ho
1430000 Nimadi
1391000 Meitei
1300000 Bhili
1200000 Vasavi
1150000 Bhilali
1045000 Panjabi, Mirpur
1000000 Pahari, Mahasu

Indonesia (more than a million)
13600900 Madura
5530000 Minangkabau
3930000 Musi
3502300 Banjar
3330000 Bali
2700000 Betawi
2350000 Malay, Central
2100000 Sasak
2000000 Batak Toba
1880000 Malay, Makassar
1600000 Makasar
1200000 Batak Simalungun
1200000 Batak Dairi
1100000 Batak Mandailing
1000000 Malay, Jambi

Philippines (more than 100k)
5770000 Hiligaynon
2500000 Bicolano, Central
1900000 Bicolano, Albay
1062000 Tausug
1000000 Maguindanao
776000 Maranao
639000 Capiznon
540000 Bontoc, Central
500000 Ibanag
395000 Inakeanon
378000 Kinaray-a
350000 Masbatenyo
345000 Surigaonon
319000 Sama, Southern
293000 Chavacano
234000 Bicolano, Iriga
200000 Romblomanon
200000 Bantoanon
185000 Sorsogon, Waray
150000 Kankanaey
150000 Blaan, Koronadal
147000 Davawenyo
140000 Subanen, Central
134000 Itawit
123000 Cuyonon
122000 Bicolano, Northern Catanduanes
111000 Ibaloi
107000 Yakan
100000 Philippine Sign Language
100000 Binukid

Germany
4910000 Mainfränkisch
2000000 Saxon, Upper
819000 Swabian

Russian Federation
783720 Lezgi
696630 Erzya
614000 Moksha
516490 Dargwa
499300 Adyghe
460090 Mari, Meadow
422550 Kumyk
413000 Ingush
363000 Yakut
264400 Tuva
217000 Komi-Zyrian
164420 Lak
128900 Tabassaran
113710 Balkar

Serbia and Kosovo
4156090 Albanian, Gheg
709570 Romani, Balkan
318920 Romani, Sinte
172000 Romano-Serbian

South Africa
4101000 Sotho, Northern
640000 Ndebele

Israel
1762320 Yiddish, Eastern
352500 Arabic, Judeo-Tunisian
258930 Arabic, Judeo-Moroccan
110000 Bukharic
100130 Arabic, Judeo-Iraqi

United States
600000 Hawai’i Creole English
250000 Sea Island Creole English

Netherlands
592000 Gronings
220000 Zeeuws

Canada
402900 Plautdietsch

Czech Republic
472470 Romani, Carpathian

Taiwan
138000 Amis

Chile
300039 Mapudungun

United Kingdom
202900 Angloromani

Spain
102000 Spanish Sign Language

Sweden
109600 Finnish, Tornedalen

Hi,
On 25 Jun 2011, at 05:52, Milos Rancic millosh@gmail.com wrote:
While preparing Missing Wikipedias [1], I've got numbers of speakers and languages by area and country with chapter not covered by Wikipedias.
Fascinating! Thanks for the work! :-)
Isabell.
Forwarding Deryck Chan's email and my response, at his request.
-------- Original Message --------
Subject: Re: [Internal-l] Fwd: [Foundation-l] Languages and numbers
Date: Sat, 25 Jun 2011 13:55:58 +0200
From: Milos Rancic millosh@gmail.com
To: Deryck Chan deryckchan@gmail.com
On 06/25/2011 01:28 PM, Deryck Chan wrote:
(sorry, am on mobile, can't post to list. Feel free to forward this onto the list)
2 obvious queries:
- How are we going to do a Wikipedia on... Indian Sign Language?
- If we exclude the Chinese languages from the table (which is a move I agree with), we should also exclude all other languages which defer to the standard written form of a related language that has a Wikipedia, e.g. Mainfränkisch (because we have a standard German Wikipedia).
1. There are requests for Wikipedias in sign languages (search for "sign language" here [1]). They intend to use SignWriting [2]. We are waiting for the implementation of top-to-bottom writing to be able to host sign languages.
2. I didn't say that we should exclude Chinese languages, but that it is likely that some of them should be excluded. If they are so close to Mandarin that there is no significant difference in writing, yes. If not, no. But I think that all of the Han languages not closely related to Mandarin already have their own Wikipedia.
Note also that there is a request for a Wikipedia in Swabian [3], and that there are already a number of Wikipedias in German languages. So it's up to them to decide what they want. Besides that, Standard Chinese is one thing and Standard German is another. A logographic script allows many more varieties to be covered than a phonetic one. For example, with a logographic script Serbian and English could be written in one orthography (which is not possible with a phonetic script even for English and German, or for Serbian and Bulgarian).
Besides that, I intentionally categorized Han China, Korea and Japan together (as East Asia) because it is not likely that WMF should do anything there. All of those countries are developed enough (OK, North Korea is not, but there is South Korea) and the languages in those areas are doing well enough. That's true for most languages of OECD member countries.
The main purpose of this document is to point to the large populations without a Wikipedia in their native language. India, Indonesia and the Philippines will be in focus, obviously.
[1] http://meta.wikimedia.org/wiki/Requests_for_new_languages
[2] http://en.wikipedia.org/wiki/SignWriting
[3] http://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Swabian
I posted this on the India list (many people are not subscribed to foundation-l) - forwarding this question which just popped up.
Bishakha
---------- Forwarded message ----------
From: Vickram Crishna vvcrishna@radiophony.com
Date: Sat, Jun 25, 2011 at 6:08 PM
Subject: Re: [Wikimediaindia-l] Fwd: [Foundation-l] Languages and numbers
To: Wikimedia India Community list wikimediaindia-l@lists.wikimedia.org
It is fascinating, although I think I may not have understood the classifications. Is there only one Indian Sign Language, for instance? I was told by a user (in the UK) that several are in use in different parts of the country. Still, perhaps the variants do not have sufficient numbers of users to qualify for this listing. However, the context in which I was told was precisely the severe lack of support materials for helping users become self-sufficient and good communicators, so the list itself becomes a barrier.
Unfortunately, I do not know at the moment how to fix the problem.
[296,097,274 349 India]
Does the population number mean that the existing Indic-language Wikipedias cover the rest of the population, i.e. over 90 crore? Is this information updated from the current census?
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
On 06/25/2011 03:11 PM, Bishakha Datta wrote:
I posted this on the India list (many people are not subscribed to foundation-l) - forwarding this question which just popped up.
First of all, although the numbers look fascinatingly precise, they are far from it. When you sum approximations like ~1M + 800k + 30k + 4k + 700 + 20 + a language spoken by three individuals, you get the fascinating number 1,834,723. So the numbers are far from census-level precision.
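The false-precision effect described above is easy to reproduce. The sketch below sums the same seven estimates and then rounds the total to two significant digits, which is closer to what the inputs can support. The helper `round_sig` is an illustrative name, not something used in preparing the lists.

```python
from math import floor, log10


def round_sig(n, digits=1):
    """Round a number to the given count of significant digits."""
    if n == 0:
        return 0
    magnitude = floor(log10(abs(n)))
    factor = 10 ** (magnitude - digits + 1)
    return int(round(n / factor) * factor)


# The approximations from the example: ~1M, 800k, 30k, 4k, 700, 20,
# and a language spoken by three individuals.
estimates = [1_000_000, 800_000, 30_000, 4_000, 700, 20, 3]

total = sum(estimates)
print(total)                # 1834723, precise-looking but not precise
print(round_sig(total, 2))  # 1800000, about as much as the inputs justify
```

The largest term is itself only known to roughly one significant digit, so every digit after the first two in the sum comes from the small terms and carries no real information about the total.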
All of the numbers are based on Ethnologue data [1], whose approximations vary from very good to very bad. Ethnologue also varies a lot in the quality of its linguistic classification. (Being educated in Serbian linguistics, I know how bad the description of the South Slavic area is.) BUT it is the best source on all the languages of the world ever made, and it gives a good general picture.
[296,097,274 349 India]
Does the population number mean that the existing Indic-language Wikipedias cover the rest of the population, i.e. over 90 crore? Is this information updated from the current census?
By making a quick approximation of the number of speakers of some large official languages of India [2], not counting English, I got to ~650M and stopped counting (BTW, that includes the figure of 180M Hindi speakers from 1991; given the population growth in India, there should be at least 250M Hindi speakers today). Thus, I think that ~300M more could be accounted for by other languages with Wikipedias and by adjusting the existing numbers for population growth. (I could make a more precise calculation if needed, but I would need some time.) It should also be noted that the dates of the entries in Ethnologue vary a lot and that some of them could be 20 or more years old.
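The adjustment for population growth mentioned above can be sketched as a simple proportional scaling. This assumes that the share of speakers in the population stays constant, which is a simplification. The two population totals are assumptions brought in for illustration (approximate 1991 and 2011 census figures for India), not numbers from this thread, and `project` is a hypothetical helper name.

```python
def project(speakers_then, population_then, population_now):
    """Scale an old speaker count by overall population growth."""
    return speakers_then * population_now / population_then


# Assumed inputs (approximate census totals for India, not from the thread).
POPULATION_1991 = 846e6
POPULATION_2011 = 1210e6

# The 1991 Hindi figure quoted in the message above.
hindi_1991 = 180e6

hindi_2011 = project(hindi_1991, POPULATION_1991, POPULATION_2011)
print(f"~{hindi_2011 / 1e6:.0f}M Hindi speakers projected for 2011")
```

Under these assumptions the projection lands a little above the "at least 250M" estimate given in the message, which is consistent with it being stated as a lower bound.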
And, again, this should be used as a very general guideline, not as a precise one. This list is very good for telling that there are many more speakers of Awadhi than of Merwari today. However, it is not good for comparing the number of speakers of Awadhi and Maithili. But, anyway, that's not important. We know that we should work to cover both Awadhi and Maithili.
On the other hand, I will indeed try to make those numbers more useful (although I think that their main usefulness is in pointing to the large populations without Wikipedias).
It is fascinating, although I think I may not have understood the classifications. Is there only one Indian Sign Language, for instance? I was told by a user (in the UK) that several are in use in different parts of the country. Still, perhaps the variants do not have sufficient numbers of users to qualify for this listing. However, the context in which I was told was precisely the severe lack of support materials for helping users become self-sufficient and good communicators, so the list itself becomes a barrier.
Unfortunately, I do not know at the moment how to fix the problem.
I've checked the whole database and just one Indian Sign Language is listed, which doesn't tell us a lot. The Ethnologue entry on Indian Sign Language [3] says that it is also called "Indo-Pakistani Sign Language" or "Urban Indian Sign Language". However, given that "Deaf schools mainly do not use ISL...", dialectal divergence could be very high (so it could look like a number of different languages), even though it is used in Pakistan and Bangladesh as well.
That said, I have to admit that my knowledge of sign languages is very limited.
[1] http://www.ethnologue.com/
[2] http://en.wikipedia.org/wiki/Languages_with_official_status_in_India
[3] http://www.ethnologue.com/show_language.asp?code=ins
Some of these actually already have Wikipedias:
Meadow Mari
Yakut (aka Sakha)
Lak
Balkar (aka Karachay-Balkar)
Yiddish, Eastern (= "standard" Yiddish, "Western Yiddish" is the one we are missing but it has much fewer speakers; according to Ethnologue there are only 5,400 around the world)
In addition, in another message you stated that we probably have Wikipedias in every Sinitic language that is distinct enough from Mandarin to receive its own Wikipedia. Min Bei has 10.3 million speakers, does not have a Wikipedia, and is definitely far removed from Mandarin; Xiang probably also deserves its own Wikipedia and has 30 million+ speakers.
On 06/27/2011 12:30 AM, M. Williamson wrote:
Some of these actually already have Wikipedias:
Meadow Mari
Yakut (aka Sakha)
Lak
Balkar (aka Karachay-Balkar)
Yiddish, Eastern (= "standard" Yiddish, "Western Yiddish" is the one we are missing but it has much fewer speakers; according to Ethnologue there are only 5,400 around the world)
In addition, in another message you stated that we probably have Wikipedias in every Sinitic language that is distinct enough from Mandarin to receive its own Wikipedia. Min Bei has 10.3 million speakers, does not have a Wikipedia, and is definitely far removed from Mandarin; Xiang probably also deserves its own Wikipedia and has 30 million+ speakers.
Thanks for the corrections!
As for Han languages, because of the languages you mentioned, I intentionally left all of them in. Obviously, they will be analyzed on a case-by-case basis.
But Han languages are not endangered, China is a fairly developed country, and their basic written-language needs are covered by CJK characters, fonts etc. If they want to have a Wikipedia, it is likely that they will get it, but it is not a priority.
If we are talking about the languages of China, the Hmong–Mien (or Miao–Yao) languages, for example, should be more in focus, as some of them have enough speakers to create viable Wikimedia projects if supported (Chuanqiandian Cluster Miao has 1.4M speakers).
More data can be found at [1]. It shows the coverage of languages by Wikimedia projects by population size, on a logarithmic scale.
The numbers are not a surprise.
[1] https://spreadsheets.google.com/spreadsheet/ccc?key=tCwO11tFPLPB-SJafDesypg&...
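Grouping languages by population size on a logarithmic scale can be sketched as follows. This only illustrates the idea of order-of-magnitude buckets. It does not reproduce the linked spreadsheet, whose layout is not shown here, and the function name `bucket_by_magnitude` is hypothetical. The sample entries are copied from the lists earlier in the thread.

```python
from collections import defaultdict
from math import floor, log10

# A few (speakers, language) pairs taken from the lists in the first message.
SAMPLE = [
    (38261000, "Awadhi"),
    (13600900, "Madura"),
    (5770000, "Hiligaynon"),
    (4910000, "Mainfränkisch"),
    (783720, "Lezgi"),
    (138000, "Amis"),
]


def bucket_by_magnitude(entries):
    """Group language names by the power of ten of their speaker count."""
    buckets = defaultdict(list)
    for speakers, name in entries:
        buckets[floor(log10(speakers))].append(name)
    return dict(buckets)


buckets = bucket_by_magnitude(SAMPLE)
for exponent in sorted(buckets, reverse=True):
    names = ", ".join(buckets[exponent])
    print(f"10^{exponent} to 10^{exponent + 1} speakers: {names}")
```

Buckets of this kind also suit the data quality discussed earlier in the thread: the order of magnitude of a speaker count is usually more trustworthy than its exact digits.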
Milosh, thanks for your work. Just to correct: Moksha, Erzya, Yakut (=Sakha), Komi-Zyrian (=Komi) and Lak all have Wikipedias (though admittedly for Lak I am the only active contributor). Adyge is almost identical to Kabardino-Circassian, and Adyge speakers probably will never have their own Wikipedia. Balkar is a part of Karachai-Balkar which has a Wikipedia.
Cheers Yaroslav
2011/7/1 Yaroslav M. Blanter putevod@mccme.ru:
Adyge is almost identical to Kabardino-Circassian, and Adyge speakers probably will never have their own Wikipedia.
From what I hear about this, Adyge and Kabardian may be two varieties of a Circassian [[macrolanguage]]. Maybe someone who cares about it will submit a request to ISO to consider redefining their codes accordingly.
The recently created Kabardian Wikipedia ( kbd.wikipedia.org ) is developing quite nicely. It already has contributors in both varieties of this language and they get along well.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com "We're living in pieces, I want to live in peace." - T. Moore
On 07/01/2011 01:24 PM, Yaroslav M. Blanter wrote:
Milosh, thanks for your work. Just to correct: Moksha, Erzya, Yakut (=Sakha), Komi-Zyrian (=Komi) and Lak all have Wikipedias (though admittedly for Lak I am the only active contributor). Adyge is almost identical to Kabardino-Circassian, and Adyge speakers probably will never have their own Wikipedia. Balkar is a part of Karachai-Balkar which has a Wikipedia.
Thanks! I've updated database for those which have Wikipedias.
As Russia is a fairly developed country, it is likely that reaching people who speak those languages and teaching them how to use Wikimedia projects would be a task for WM RU. Besides that, I think that all languages of Russia have writing systems and support in Unicode.
2011/7/1 Milos Rancic millosh@gmail.com:
As Russia is a fairly developed country, it is likely that reaching people who speak those languages and teaching them how to use Wikimedia projects would be a task for WM RU. Besides that, I think that all languages of Russia have writing systems and support in Unicode.
Actually, a few small languages in Northern and Eastern Russia don't have writing systems, but at least for some of them one is being developed by the government.
And all the current languages of Russia are indeed supported in Unicode, but in a few discussions I had just a couple of weeks ago I learned the shocking truth: while we have taken Unicode for granted for about a decade, that is not the case for quite a lot of people around the globe. In less developed parts of Russia there are still computers with Windows 98 and even earlier, and Unicode support there is poor to non-existent. Maybe in Russia WM-RU can indeed handle this, for example by organizing the sending of donated second-hand computers to key organizations in these regions (schools, libraries, local newspapers etc.).
This, however, happens in many other countries, some of which need Unicode even more desperately than these Russian regions, and which don't have a chapter. For example, Ethiopia. There the Foundation or other chapters will be able to help. WM-IL, for example, sent second-hand computers pre-installed with Ubuntu and offline Wikipedia to African countries, and maybe other chapters did similar things, too.
Long story short: Unicode support cannot be taken for granted, but something can be done about it.
wikimedia-l@lists.wikimedia.org