I am preparing a document for Wikimania. Presently, I am in the process of analyzing data (SIL [1], Ethnologue [2], Wikimedia projects). I am using Ethnologue data for population estimates.
Before I started this task, I thought that the situation was not so bad (or good, if it is about the possibility for development). I thought that we were near the end of the languages with more than 1M speakers. However, this is far from being true.
There are no Wikipedias in 243 languages with more than 1M speakers. Of those, 27 have more than 10M speakers.
The biggest language without any Wikimedia project is Jin Chinese, with 45 million speakers.
Around 1 billion people belong to the group of big languages without a Wikipedia (or any Wikimedia project) in their language.
Of those, 480 million have test projects, but 550 million don't have even a test project; including:
* Jin Chinese, 45M, China
* Haryanvi, 38M, India, incubator
* Xiang Chinese, 36M, China, incubator
* Maithili, 34M, India, incubator
* Nigerian Pidgin, 30M, Nigeria, incubator
* Filipino, 25M, Philippines, incubator
* Chhattisgarhi, 17.5M, India, incubator
* Rangpuri, 15M, Bangladesh
* Seraiki, 13.8M, Pakistan, incubator
* Madura, 13.6M, Indonesia, incubator
* Haryanvi, 13M, India
* Deccan, 12.8M, India
* Malvi, 10.4M, India
* Min Bei Chinese, 10.3M, China, incubator
* Sylheti, 10.3M, Bangladesh
Around 300 million people use languages with fewer than 1M speakers which don't have Wikipedia editions.
Note that for all languages in the world Ethnologue gives a total of 6.15 billion, which is pretty accurate, considering that the current estimate (according to Wikipedia [3]) is 6.92 billion and that counting speakers is very different from counting official population statistics.
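As a quick check of how close the two totals are, here is a minimal Python sketch. It uses only the two figures quoted in this message; the variable names are illustrative and not part of any dataset.

```python
# Figures quoted in the message, in billions of people.
ethnologue_speakers = 6.15  # Ethnologue total over all languages
world_population = 6.92     # estimate cited from Wikipedia [3]

# Share of the population estimate that the Ethnologue total accounts for.
coverage = ethnologue_speakers / world_population
# Absolute difference between the two figures, still in billions.
shortfall = world_population - ethnologue_speakers

print(f"Ethnologue total is about {coverage:.0%} of the population estimate")
print(f"Difference: about {shortfall * 1000:.0f} million people")
```

Under these figures the Ethnologue total is roughly 89% of the population estimate, a difference of about 770 million.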
Those are preliminary results. We have two chapters (and strategic focus) in countries of the list above. Inside of the longer list, which should be verified, we have more chapters. I noted that there are even two languages of Germany without a Wikipedia, but with more than a million speakers: Mainfränkisch and Upper Saxon (the latter without a test Wikipedia).
The list of countries with languages with more than 1M of speakers and without Wikipedia is: Afghanistan, Algeria, Angola, Bangladesh, Benin, Bolivia, Brazil, Burkina Faso, Cameroon, Chad, China, Congo, Côte d’Ivoire, Democratic Republic of the Congo, Ecuador, Egypt, Equatorial Guinea, Eritrea, Ethiopia, Germany, Ghana, Guatemala, Guinea, India, Indonesia (Java and Bali), Indonesia (Kalimantan), Indonesia (Nusa Tenggara), Indonesia (Sulawesi), Indonesia (Sumatra), Iran, Iraq, Jamaica, Jordan, Kenya, Libya, Madagascar, Malawi, Malaysia (Peninsular), Mali, Mauritania, Morocco, Mozambique, Myanmar, Namibia, Niger, Nigeria, Pakistan, Paraguay, Peru, Philippines, Saudi Arabia, Senegal, Serbia, Sierra Leone, Somalia, South Africa, Sudan, Syria, Tanzania, Thailand, Tunisia, Turkey (Asia), Uganda, Viet Nam, Yemen, Zambia, Zimbabwe.
[1] http://www.sil.org/
[2] http://www.ethnologue.com/
[3] http://en.wikipedia.org/wiki/Human_population
Can you also provide at least a rough estimate of the number of humans who don't have Wikipedia in any language they speak reasonably well (understanding very well that this is a hard problem of definition)?
Curious.
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
On 05/22/2011 01:22 PM, Denny Vrandecic wrote:
Can you also provide at least a rough estimate of the number of humans who don't have Wikipedia in any language they speak reasonably well (understanding very well that this is a hard problem of definition)?
I would say that the number of humans without Wikipedia in any language they speak is below "statistical error": 2% of 7 billion is 140 million, and I don't think that there are more than, let's say, 50 million.
Only very isolated tribes don't know the official language of their country, and such tribes can be found in Central Africa, New Guinea, the Amazon and probably a couple of other smaller places in the world.
Speaking in such numbers, 20% of the population of Serbia would be fine with the English Wikipedia, all inhabitants of the former Soviet Union would be fine with the Russian Wikipedia, and all Native American peoples would be fine with the English, Spanish, Portuguese and French Wikipedias. And so on.
However, attending primary school in a non-native language is a significant educational problem, and that is where the 1.3 billion gap counts.
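The two orders of magnitude discussed here can be put side by side in a short Python sketch. All numbers are the rough figures from this thread (7 billion people, a 2% "statistical error", the 50 million guess, and the 1 billion plus 300 million without a Wikipedia in their native language); the variable names are only illustrative.

```python
MILLION = 1_000_000

world = 7_000 * MILLION
statistical_error = world * 2 // 100          # 2% of 7 billion
no_known_language_guess = 50 * MILLION        # upper guess from the message

big_language_gap = 1_000 * MILLION            # languages with more than 1M speakers
small_language_gap = 300 * MILLION            # languages with fewer than 1M speakers
native_language_gap = big_language_gap + small_language_gap

print(f"2% of the world population: {statistical_error // MILLION} million")
print(f"Guess is below that threshold: {no_known_language_guess < statistical_error}")
print(f"Native-language gap: {native_language_gap // MILLION} million "
      f"({native_language_gap / world:.1%} of the world population)")
```

The point of the comparison is that the native-language gap is more than an order of magnitude larger than the guess for people with no Wikipedia in any language they know.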
On Sun, 22 May 2011 14:08:14 +0200, Milos Rancic millosh@gmail.com wrote:
I would say that the number of humans without Wikipedia in any language they speak is below "statistical error": 2% of 7 billion is 140 million, and I don't think that there are more than, let's say, 50 million.
Are there any data available on literacy? I guess illiterate people can hardly use Wikipedia, and the number of literate speakers of some of these languages may well be below a million.
Cheers
Yaroslav
On 05/22/2011 04:57 PM, Yaroslav M. Blanter wrote:
Are there any data available on literacy? I guess illiterate people can hardly use Wikipedia, and the number of literate speakers of some of these languages may well be below a million.
Just data (Ethnologue's "Language development" field) for the mentioned languages (without Filipino). Comments tomorrow.
* Jin Chinese, 45M, China; Literacy rate in L2 (I suppose, Standard Chinese): 91%.
* Haryanvi, 38M, India, incubator; Literacy rate in L2: 55% Hindi. Dictionary. Bible portions: 2001.
* Xiang Chinese, 36M, China, incubator; Literacy rate in L2: 91%.
* Maithili, 34M, India, incubator; Literacy rate in L1: 25%–50%. Literacy rate in L2: 25%–50%. If they can read Nepali or Hindi, they can read Maithili. The educated read Hindi, Nepali, or English books for pleasure. Some literacy work in India. Poetry. Magazines. Newspapers. Radio programs. Films. TV. Dictionary. Grammar.
* Nigerian Pidgin, 30M, Nigeria, incubator; Poetry. Radio programs. TV. Dictionary. Grammar. Bible portions: 1957.
* Chhattisgarhi, 17.5M, India, incubator; Poetry. Newspapers. Radio programs. TV. NT: 2005.
* Rangpuri, 15M, Bangladesh; Dictionary. Grammar.
* Seraiki, 13.8M, Pakistan, incubator; Literacy rate in L1: Below 1%. Literacy rate in L2: 5%–15%. Radio programs. TV. Dictionary. Grammar. NT: 1819.
* Madura, 13.6M, Indonesia, incubator; Literacy rate in L2: 40%. Literacy higher among Bangkalon. Grammar. Bible: 1994.
* Haryanvi, 13M, India; Literacy rate in L2: 55% Hindi. Dictionary. Bible portions: 2001.
* Deccan, 12.8M, India; <nothing>
* Malvi, 10.4M, India; Literacy rate in L2: 58% for rural Madhya Pradesh. Government project discontinued due to low response. Poetry. Radio programs. Dictionary. NT: 1826.
* Min Bei Chinese, 10.3M, China, incubator; Literacy rate in L2: 91%. NT: 1934.
* Sylheti, 10.3M, Bangladesh; Literacy rate in L2: 35%. Educated can read Bengali. Few women are educated. Bible portions: 1993.
Here is the article at Strategy wiki: http://strategy.wikimedia.org/wiki/Missing_Wikipedias
Some important ideas have been mentioned during this discussion. Feel free to add them there.
More data. This is about all living languages, just to get a clue about what is reasonable to do and what is not.
Number of languages and total number of speakers, with languages grouped by number of speakers (a nicer wikitable is available at [1]):
category: number of languages, total number of speakers
100M+: 17, 2514548848
10M-99M: 78, 2376900757
1M-9M: 303, 950166458
100k-999k: 900, 284119716
10k-99k: 1837, 61223297
1k-9k: 2025, 7823891
100-999: 1039, 460911
10-99: 343, 12664
1-9: 134, 528
all: 6677, 6195257070
[1] http://strategy.wikimedia.org/wiki/Missing_Wikipedias/Languages_and_numbers
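The totals in the "all" row can be re-added from the individual rows with a few lines of Python. This is only a sanity check on the numbers as posted, not a statement about the underlying Ethnologue data; the variable names are illustrative.

```python
# (category, number of languages, total number of speakers), as posted above.
rows = [
    ("100M+", 17, 2_514_548_848),
    ("10M-99M", 78, 2_376_900_757),
    ("1M-9M", 303, 950_166_458),
    ("100k-999k", 900, 284_119_716),
    ("10k-99k", 1837, 61_223_297),
    ("1k-9k", 2025, 7_823_891),
    ("100-999", 1039, 460_911),
    ("10-99", 343, 12_664),
    ("1-9", 134, 528),
]
stated_languages, stated_speakers = 6677, 6_195_257_070  # the "all" row

total_languages = sum(count for _, count, _ in rows)
total_speakers = sum(speakers for _, _, speakers in rows)

print("languages:", total_languages, "stated:", stated_languages)
print("speakers: ", total_speakers, "stated:", stated_speakers)
```

Adding the rows as listed reproduces the speaker total exactly, while the language counts sum to 6676, one fewer than the stated 6677. The thread does not explain the difference; one possibility (an assumption) is a language listed without a speaker count.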
Clearly, at least at this point, it is probably unreasonable to target any languages with fewer than 100,000 native speakers; of course, if there is community interest I think they should get Wikipedias, but the 70 million or so human beings who speak languages with fewer than 100k speakers are likely either 1) to be fluent in a language with more speakers or 2) to live in such isolation that it is unlikely they would use Wikipedia, at least at this point in time.
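The "70 million or so" figure can be checked against the posted table by adding the rows below 100k speakers. A minimal Python sketch, using only the table's numbers (the names are illustrative):

```python
# Rows of the posted table for languages with fewer than 100,000 speakers:
# (category, number of languages, total number of speakers).
small_rows = [
    ("10k-99k", 1837, 61_223_297),
    ("1k-9k", 2025, 7_823_891),
    ("100-999", 1039, 460_911),
    ("10-99", 343, 12_664),
    ("1-9", 134, 528),
]
all_speakers = 6_195_257_070  # the table's "all" row

small_languages = sum(count for _, count, _ in small_rows)
small_speakers = sum(speakers for _, _, speakers in small_rows)

print(f"{small_languages} languages, {small_speakers:,} speakers")
print(f"share of all speakers: {small_speakers / all_speakers:.1%}")
```

On these numbers, languages below 100k speakers account for about 69.5 million speakers, roughly 1.1% of the total, spread over 5378 languages.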
On 05/23/2011 12:58 PM, Milos Rancic wrote:
Here is the article at Strategy wiki: http://strategy.wikimedia.org/wiki/Missing_Wikipedias
Some important ideas have been mentioned during this discussion. Feel free to add them there.
Copied from [1]
I've started to categorize languages according to some principles [2]. (The task is presently far from being done.) Some of them are exact, some of them are arbitrary categories:
* Does the language have any Wikimedia test project (usually at Incubator) or not? If yes, that usually shows that there is interest in creating that project. We should see how things are going there, whether they need our help and which kind of help they need.
* If literacy is low and there are no efforts to improve it, efforts should go that way.
* Is it a language without a writing system? If yes, efforts should go that way.
* Does the language have a Wikimedia project in a "macrolanguage"? That likely means that speakers would be happy to use their macrolanguage project or that they are already using it. However, it doesn't necessarily mean that and should be checked. We have a number of macrolanguage editions which probably cover a hundred "individual" languages.
* Special cases. Up to now, there are three categories (described inside of that section):
** "Macrolanguage" is widely used. (Arabic languages, Mongolian languages.)
** Writing system gives de facto literacy in L1 if L2 is known. (Chinese languages.)
** Language is spoken in well developed areas of the world by a non-endangered population. It is assumed that the population wants or doesn't want Wikimedia projects for its own internal reasons. Examples are Mainfränkisch and Albanian Gheg (with Incubator project) and Upper Saxon (without Incubator project).
* Languages not inside any of the categories above should be the second priority (after those with test projects). These are likely languages spoken by people without a [good] Internet connection, or there are some other reasons. Every case should be analyzed separately. If it is about low Internet penetration, then we should create alternative ways of reaching that population and instructing them how to create and edit a wiki encyclopedia in their language.
Feel free to add your ideas here or at Strategy wiki.
[1] http://strategy.wikimedia.org/wiki/Talk:Missing_Wikipedias#Language_categori...
[2] http://strategy.wikimedia.org/wiki/Missing_Wikipedias
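One way to read the categories above is as a set of independent checks applied to each language. The following Python sketch is only an illustration of that reading: the `Language` class, its field names and the category labels are hypothetical and do not come from the Strategy wiki page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Language:
    # All field names are hypothetical; they mirror the questions in the list.
    name: str
    has_test_project: bool = False
    low_literacy_no_efforts: bool = False
    has_writing_system: bool = True
    macrolanguage_has_project: bool = False
    special_case: Optional[str] = None


def categorize(lang: Language) -> List[str]:
    """Return every category from the list that applies to the language."""
    categories = []
    if lang.has_test_project:
        categories.append("test project")
    if lang.low_literacy_no_efforts:
        categories.append("low literacy")
    if not lang.has_writing_system:
        categories.append("no writing system")
    if lang.macrolanguage_has_project:
        categories.append("macrolanguage project")
    if lang.special_case is not None:
        categories.append("special case: " + lang.special_case)
    if not categories:
        # Nothing matched: the "second priority" group, analyzed case by case.
        categories.append("uncategorized")
    return categories


# Examples taken from the message: Mainfränkisch has an Incubator project,
# Upper Saxon does not; both are listed as special cases.
upper_saxon = Language("Upper Saxon", special_case="well developed area")
mainfraenkisch = Language("Mainfränkisch", has_test_project=True,
                          special_case="well developed area")
print(categorize(upper_saxon))
print(categorize(mainfraenkisch))
print(categorize(Language("Some other language")))
```

Returning a list rather than a single label reflects that a language can fall into several categories at once, for example having a test project and also being a special case.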
Having a test project doesn't necessarily mean community interest; having a test project with dozens of articles might indicate that. In many cases, users with no relation to the language (in particular, User:Jose77) have started test wikis with text all in English for hundreds of languages. So the fact that we have an Adangme Test-wp http://incubator.wikimedia.org/wiki/Wp/ada/Main_Page created by Jose77 means absolutely nothing about community interest or access.
On 05/26/2011 07:30 PM, M. Williamson wrote:
Having a test project doesn't necessarily mean community interest; having a test project with dozens of articles might indicate that. In many cases, users with no relation to the language (in particular, User:Jose77) have started test wikis with text all in English for hundreds of languages. So the fact that we have an Adangme Test-wp http://incubator.wikimedia.org/wiki/Wp/ada/Main_Page created by Jose77 means absolutely nothing about community interest or access.
Statistically, that's useful information, which should be checked, of course.
On 5/26/11 3:05 PM, Milos Rancic wrote:
- If literacy is low and there are no efforts to improve it, efforts should go that way.
- Is it about the language without writing system? If yes, efforts should go that way.
I guess I'd say what efforts "should" be taken should depend on what the language speakers in question want from us. Are there communities who would like to create a Wikipedia in their language but cannot, for some reason that we can assist with? Have they requested our assistance? Are there other organizations based in the relevant regions that already have opinions on and projects in areas like literacy, internet access, writing systems, etc.? (If not, is it because of lack of interest, political difficulties, or some other reason?)
I'm a bit worried that a more centralized approach will repeat the old pattern of Europeans and Americans telling people in other parts of the world what they ought to do with their languages/culture/education. The 19th-century missionaries who invented writing systems to translate their materials and "enlighten the natives" (a legacy Ethnologue is associated with) don't seem like an example we'd want to repeat. I'd much rather see us work with initiatives and organizations originated by people who are actually from a community of language speakers. Plus, Wikimedia as a generic global NGO dedicated to education/literacy/development/internet access/etc. would be a significant mission-creep away from things we're actually good at. (All "imo", of course.)
-Mark
The foundations of Wikimedia projects lie in volunteerism. If you want some volunteer project to work, you have to make it attractive to contributors. Attractiveness doesn't go with enforcing anything. So, if some group doesn't want anything from us, I don't think that we could do anything about it.
Whenever we talked about these issues during the LangCom meeting, we agreed that we shouldn't simply adopt a take-it-or-leave-it approach, and that in any case we should work according to the wishes of the local population.
Most people usually want to adopt technology. Most people usually don't want to adopt culture. Bearing in mind that technology usually came with colonial and racist culture, they are completely right. However, a particular technology requires certain levels of culture.
For example, if some group of people wants to have doctors, they have to have education at least up to the high school level, no matter in which language. If they don't want to have literacy, but want to have doctors, they would probably have to be content with not so efficient druids. I am perfectly fine with that, and the Wikimedia movement should accept it.
In other words, nothing will be enforced and we fully understand all (or, to be cautious, most) of those issues.
Hoi,
Having projects that rely on volunteers does not mean that what they do is without consequence. The policy of the language committee insists on requirements. They have to be met. One of the things we consider is the involvement of native speakers ... In the language committee we are volunteers as well. What we do is relevant, it is necessary and requirements are requirements and they have to be met.
Thanks,
GerardM
On Sun, May 22, 2011 at 4:15 AM, Milos Rancic millosh@gmail.com wrote:
The list of countries with languages with more than 1M of speakers and without Wikipedia is: Afghanistan, Algeria, Angola, Bangladesh, Benin, Bolivia, Brazil, Burkina Faso, Cameroon, Chad, China, Congo, Côte d’Ivoire, Democratic Republic of the Congo, Ecuador, Egypt, Equatorial Guinea, Eritrea, Ethiopia, Germany, Ghana, Guatemala, Guinea, India, Indonesia (Java and Bali), Indonesia (Kalimantan), Indonesia (Nusa Tenggara), Indonesia (Sulawesi), Indonesia (Sumatra), Iran, Iraq, Jamaica, Jordan, Kenya, Libya, Madagascar, Malawi, Malaysia (Peninsular), Mali, Mauritania, Morocco, Mozambique, Myanmar, Namibia, Niger, Nigeria, Pakistan, Paraguay, Peru, Philippines, Saudi Arabia, Senegal, Serbia, Sierra Leone, Somalia, South Africa, Sudan, Syria, Tanzania, Thailand, Tunisia, Turkey (Asia), Uganda, Viet Nam, Yemen, Zambia, Zimbabwe.
Good work generally, but regarding this last list...
Afghanistan has many languages in use (Pashto, Tajik, Hazara, Uzbek); Algeria uses Arabic, Berber, and French; Jordan's official language is Arabic (though the spoken one is a dialect); and generally so forth.
Can you break this out by which languages we are missing, not just by country, as country isn't specific enough?
Thank you.
On 05/22/2011 01:28 PM, George Herbert wrote:
Good work generally, but regarding this last list...
Afghanistan has many languages in use (Pashto, Tajik, Hazara, Uzbek); Algeria uses Arabic, Berber, and French; Jordan's official language is Arabic (though the spoken one is a dialect); and generally so forth.
Can you break this out by which languages we are missing, not just by country, as country isn't specific enough?
Thank you.
Here is the table with all missing languages with more than 1M speakers. See the notes about usage (especially in the case of Arabic languages), as well as my reply to Denny about the importance of native languages in primary education. Note also that we have a number of incubator projects in Arabic languages.
If any of you finds a factual problem, please let me know.
It is likely that I'll put the complete list at Meta in the future.
population | ISO 639-3 code | ISO 639-1 code | language name | Wikimedia projects | country | region | writing system | classification
45000000 | cjy | | Chinese, Jinyu | | China | Mainly in Shanxi Province; some in Shaanxi and Henan provinces. | Han script, Simplified variant. Han script, Traditional variant. | Sino-Tibetan, Chinese http://www.ethnologue.com/show_lang_family.asp?code=cjy A member of macrolanguage Chinese [zho http://www.ethnologue.com/show_language.asp?code=zho] (China http://www.ethnologue.com/show_country.asp?name=CN).
38261000 | awa | | Awadhi | incubator-wikipedia | India | Uttar Pradesh, Kheri, Sitapur, Lucknow, Unnao, Rae-Bareli, Bahraich, Bara-Banki, Pratapgarh, Sultanpur, Gonda, Faizabad, Allahabad districts; Bihar; Madhya Pradesh; Delhi. Also in Nepal. | Devanagari script. | Indo-European, Indo-Iranian, Indo-Aryan, East Central zone http://www.ethnologue.com/show_lang_family.asp?code=awa
36024400 | hsn | | Chinese, Xiang | incubator-wikipedia | China | Hunan Province, Sichuan Province, over 20 counties; parts of Guangxi and Guangdong provinces. Also in United States. | Han script, Simplified variant. Han script, Traditional variant. | Sino-Tibetan, Chinese http://www.ethnologue.com/show_lang_family.asp?code=hsn A member of macrolanguage Chinese [zho http://www.ethnologue.com/show_language.asp?code=zho] (China http://www.ethnologue.com/show_country.asp?name=CN).
34700000 | mai | | Maithili | incubator-wikipedia | India | Bihar, Muzaffarpur on west, past Kosi east to west Purnia District, to Munger, Bhagalpur districts south, and Himalayan foothills north; Delhi, Calcutta, Mumbai. Many settled abroad. Cultural and linguistic centers are Madhubani and Darbhanga towns. Janakpur also important culturally and religiously. Also in Nepal. | Devanagari script. | Indo-European, Indo-Iranian, Indo-Aryan, Eastern zone, Bihari http://www.ethnologue.com/show_lang_family.asp?code=mai
30000000 | pcm | | Pidgin, Nigerian | incubator-wikipedia | Nigeria | Southern states; northern states in Sabon Garis; coastal and urban areas. | | Creole, English based, Atlantic, Krio http://www.ethnologue.com/show_lang_family.asp?code=pcm
25000000 | fil | | Filipino | incubator-wiktionary,incubator-wikipedia | Philippines | Widespread. | Latin script. | Austronesian, Malayo-Polynesian, Philippine, Greater Central Philippine, Central Philippine, Tagalog http://www.ethnologue.com/show_lang_family.asp?code=fil
22397000 | arq | | Arabic, Algerian Spoken | incubator-wikipedia | Algeria | Also in Belgium, Egypt, France, Germany, Saint Pierre and Miquelon. | Arabic script. | Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=arq A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA).
21048900 | ary | | Arabic, Moroccan Spoken | incubator-wikipedia | Morocco | North, south of Atlas Mountains, including Sahara port cities. Also in Belgium, Egypt, France, Germany, Gibraltar, Libya, Netherlands, United Kingdom, Western Sahara. | Arabic script. | Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=ary A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA).
19000000 | aec | | Arabic, Sa’idi Spoken | | Egypt | Cairo south edge to Sudan border. Middle Egypt in Bani Sweef, Fayyuum, and Gizeh; Upper Egypt from Asyuut to Edfu and south. | | Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=aec A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA).
17500000 | hne | | Chhattisgarhi | incubator-wikipedia | India | Chhattisgarh; Bihar; Orissa; possibly in Maharashtra, Uttar Pradesh, and Tripura. | Devanagari script. | Indo-European, Indo-Iranian, Indo-Aryan, East Central zone http://www.ethnologue.com/show_lang_family.asp?code=hne
16833000 | apd | | Arabic, Sudanese Spoken | | Sudan | Primarily north. Also in Egypt, Eritrea, Ethiopia, Libya, Saudi Arabia. | Arabic script. Latin script. | Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=apd A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA).
15100000 | acm | | Arabic, Mesopotamian Spoken | | Iraq | Tigris and Euphrates area. Also in Iran, Jordan, Syria, Turkey (Asia). | | Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=acm A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA).
15000000 | tts | | Thai, Northeastern | incubator-wikipedia | Thailand | Northeast; 17 provinces. Kalerng in Sakon Nakhon and Nakhon Phanom. | Thai script. | Tai-Kadai, Kam-Tai, Be-Tai, Tai-Sek, Tai, Southwestern, Lao-Phutai http://www.ethnologue.com/show_lang_family.asp?code=tts
15000000 | rkt | | Rangpuri | | Bangladesh | Rajshahi Division north from Bogra, also known as the greater Dinajpur and Rangpur areas, now subdivided into Rangpur, Lalmonihat, Nilphamari, Gaibanda, Panchagar, Thakurgaon, and Dinajpur districts. Also in India. | Bengali script. Kamtapura script, may be in use in Koch Bihar. | Indo-European, Indo-Iranian, Indo-Aryan, Eastern zone, Bengali-Assamese http://www.ethnologue.com/show_lang_family.asp?code=rkt
14426540 | apc | | Arabic, North Levantine Spoken | incubator-wikipedia | Syria | Also in Antigua and Barbuda, Argentina, Belize, Cyprus, Dominican Republic, Egypt, French Guiana, Israel, Jamaica, Lebanon, Mali, Puerto Rico, Suriname, Trinidad and Tobago, Turkey (Asia). | Arabic script.
Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=apc A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA). 13820000 skr Seraiki incubator-wikipedia Pakistan South Punjab and north Sind, Indus River valley, Jampur area. Derawali in Dera Ismail Khan, Tank, Bannu, and Dera Ghazi Khan. Jangli is in Sahiwal area. Also in India, United Kingdom. Arabic script. Indo-European, Indo-Iranian, Indo-Aryan, Northwestern zone, Lahnda http://www.ethnologue.com/show_lang_family.asp?code=skr A member of macrolanguage Lahnda [lah http://www.ethnologue.com/show_language.asp?code=lah] (Pakistan http://www.ethnologue.com/show_country.asp?name=PK). 13600900 mad Madura incubator-wikipedia Indonesia (Java and Bali) North coastal area of east Java, Sapudi Islands, Madura Island. Also in Singapore. Latin script. Austronesian, Malayo-Polynesian, Malayo-Sumbawan, Madurese http://www.ethnologue.com/show_lang_family.asp?code=mad 13000000 mag Magahi India Bihar, Gaya, Bhagalpur, eastern Patna districts; Jharkhand, northern Chotanagpur Division, Hazaribagh District; West Bengal, Maldah District. Devanagari script. Indo-European, Indo-Iranian, Indo-Aryan, Eastern zone, Bihari http://www.ethnologue.com/show_lang_family.asp?code=mag 13000000 ctg Chittagonian Bangladesh Chittagong region. Arabic script. Latin script. Indo-European, Indo-Iranian, Indo-Aryan, Eastern zone, Bengali-Assamese http://www.ethnologue.com/show_lang_family.asp?code=ctg 13000000 bgc Haryanvi India Haryana; Rajasthan; Punjab; Karnataka; Delhi; Himachal Pradesh; Uttar Pradesh. Devanagari script. 
Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Western Hindi, Unclassified http://www.ethnologue.com/show_lang_family.asp?code=bgc 12800000 dcc Deccan India Central Maharashtra, Deccan Plateau; Karnataka, Belgaum, Bijapur districts; Madhya Pradesh, Raisen, Sehore districts; Gujarat. Indo-European, Indo-Iranian, Indo-Aryan, Southern zone, Unclassified http://www.ethnologue.com/show_lang_family.asp?code=dcc 10400000 mup Malvi India Madhya Pradesh, Ujjain, Indore, Rathlam, Mandsaur, Rajgarh, Dewas, Shajapur, Nimuch, Sehore, Dhar, Bhopal districts; Rajasthan, Jhalawar District. Sondwari dialect geographically isolated from the others. Devanagari script. Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Rajasthani, Unclassified http://www.ethnologue.com/show_lang_family.asp?code=mup A member of macrolanguage Rajasthani [raj http://www.ethnologue.com/show_language.asp?code=raj] (India http://www.ethnologue.com/show_country.asp?name=IN). 10304000 mnp Chinese, Min Bei incubator-wikipedia China North Fujian Province, 7 counties around Jian’ou. Also in Singapore. Sino-Tibetan, Chinese http://www.ethnologue.com/show_lang_family.asp?code=mnp A member of macrolanguage Chinese [zho http://www.ethnologue.com/show_language.asp?code=zho] (China http://www.ethnologue.com/show_country.asp?name=CN). 10300000 syl Sylheti Bangladesh Districts of Sylhet, Sunamganj, Habiganj, Moulvibazar. Also in Australia, Canada, India, Italy, Malaysia (Peninsular), Myanmar, Singapore, United Kingdom, United States. Bengali script. Latin script. Syloti Nagri script. Indo-European, Indo-Iranian, Indo-Aryan, Eastern zone, Bengali-Assamese http://www.ethnologue.com/show_lang_family.asp?code=syl 10296000 zlm Malay Malaysia (Peninsular) Widespread in Peninsular Malaysia, parts of Sarawak. Also in Canada, Indonesia (Sumatra), Myanmar, Singapore, United Arab Emirates, United States. Arabic script. Latin script. 
Austronesian, Malayo-Polynesian, Malayo-Sumbawan, North and East, Malayic, Malay http://www.ethnologue.com/show_lang_family.asp?code=zlm A member of macrolanguage Malay [msa http://www.ethnologue.com/show_language.asp?code=msa] (Malaysia http://www.ethnologue.com/show_country.asp?name=MY). 9977000 ars Arabic, Najdi Spoken Saudi Arabia Also in Canada, Iraq, Jordan, Kuwait, Syria, United States. Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=ars A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA). 9500000 bjj Kanauji India Uttar Pradesh, Kanpur, Farrukhabad, Etawah, Hardoi, Shahjahanpur, Pilibhit, Mainpuri, Auraiya districts. Devanagari script. Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Western Hindi, Unclassified http://www.ethnologue.com/show_lang_family.asp?code=bjj 9406900 aeb Arabic, Tunisian Spoken Tunisia Also in Belgium, France, Germany, Libya. Arabic script. Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=aeb A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA). 9320240 kmr Kurdish, Northern incubator-wikipedia Turkey (Asia) Hakkari, Siirt, Mardin, Agri, Diyarbakir, Bitlis, Bingol, Van, Adiyaman, and Mus, most; also Urfa, Kars, Tunceli, Malatya, Erzurum, Marash, Sivas, and other provinces; communities in central Turkey (Cankiri, Cihanbeyli, near Konya); many in large cities in the west, including Istanbul, Adana, Ankara, Izmir. 
Also in Afghanistan, Armenia, Australia, Austria, Azerbaijan, Bahrain, Belgium, Canada, Denmark, Finland, France, Georgia, Germany, Greece, Iran, Iraq, Italy, Jordan, Kazakhstan, Kuwait, Kyrgyzstan, Lebanon, Netherlands, Norway, Russian Federation (Europe), Sweden, Switzerland, Syria, Turkmenistan, United Kingdom, United States. Arabic script. Cyrillic script, used in Armenia. Latin script, developed in 1932. Indo-European, Indo-Iranian, Iranian, Western, Northwestern, Kurdish http://www.ethnologue.com/show_lang_family.asp?code=kmr A member of macrolanguage Kurdish [kur http://www.ethnologue.com/show_language.asp?code=kur] (Iraq http://www.ethnologue.com/show_country.asp?name=IQ). 9000000 dhd Dhundari India Rajasthan, Jaipur, Dausa, Tonk districts. Possibly in Bundi, Kota, Kishangarh, Ajmer, Jhalawar, northern Karauli, Sawai Madhopur districts. Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Rajasthani, Marwari http://www.ethnologue.com/show_lang_family.asp?code=dhd A member of macrolanguage Marwari [mwr http://www.ethnologue.com/show_language.asp?code=mwr] (India http://www.ethnologue.com/show_country.asp?name=IN). 7760000 bfy Bagheli India Northeast Madhya Pradesh, Rewa, Satna, Sidhi, Shahdol, Umaria, Anuppur, Jabalpur, Mandla, Chhindwara, Dindori, Panna districts; Uttar Pradesh, Allahabad, Mirzapur, Banda, Hamirpur districts; Chhattisgarh, Bilaspur and Koriya districts. Also in Nepal. Devanagari script. Indo-European, Indo-Iranian, Indo-Aryan, East Central zone http://www.ethnologue.com/show_lang_family.asp?code=bfy 7600000 ayn Arabic, Sanaani Spoken Yemen Extends as far south as Dhamar. Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=ayn A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA). 7078500 acq Arabic, Ta’izzi-Adeni Spoken Yemen All provinces except the 2 east and the northeast ones. 
Probably a few in United Arab Emirates and Saudi Arabia. Also in Djibouti, Egypt, Eritrea, Kenya, Libya, Somalia, United Kingdom. Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=acq A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA). 6970000 vah Varhadi-Nagpuri India Maharashtra, Amravati, Buldana, Akola districts; Madhya Pradesh, Chhindwara and Balaghat districts; Andhra Pradesh, Adilabad and Nizamabad districts. Indo-European, Indo-Iranian, Indo-Aryan, Southern zone, Unclassified http://www.ethnologue.com/show_lang_family.asp?code=vah 6300000 lua Luba-Kasai Democratic Republic of the Congo Widespread in Kasaï Occidental and Kasaï Oriental provinces. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, L, Luba (L.30) http://www.ethnologue.com/show_lang_family.asp?code=lua 6300000 ayp Arabic, North Mesopotamian Spoken Iraq Tigris, part of the Euphrates valleys north of Baghdad. Also in Jordan, Syria, Turkey (Asia). Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=ayp A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA). 6200000 ajp Arabic, South Levantine Spoken incubator-wikipedia Jordan Also in Argentina, Egypt, Israel, Kuwait, Libya, Palestinian West Bank and Gaza, Puerto Rico, Syria. Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=ajp A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA). 
6170900 sat Santali incubator-wikipedia India Bihar, Bhagalpur, Munger districts; Jharkhand, Manbhum, Hazaribagh districts, Orissa, Balasore District; West Bengal, Birbhum, Bankura districts; Assam; Mizoram; Tripura. Also in Bangladesh, Bhutan, Nepal. Bengali script. Devanagari script. Latin script, used in Bangladesh. Ol Chiki (Ol Cemet’, Ol, Santali) script. Oriya script. Austro-Asiatic, Munda, North Munda, Kherwari, Santali http://www.ethnologue.com/show_lang_family.asp?code=sat 6023900 acw Arabic, Hijazi Spoken Saudi Arabia Red Sea coast and adjacent highlands. Also in Eritrea. Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=acw A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA). 6009400 nod Thai, Northern incubator-wikipedia Thailand Chiang Mai, Chiang Rai, Lamphun, Lampang, Maehongson, Hot, Nan, Phayao, Phrae, Uttaradit, Tak provinces. Also in Laos. Lanna (Tai Tham) script, most are not literate in this Old Northern script. Thai script. Tai-Kadai, Kam-Tai, Be-Tai, Tai-Sek, Tai, Southwestern, East Central, Chiang Saeng http://www.ethnologue.com/show_lang_family.asp?code=nod 6000000 lmn Lambadi India Andhra Pradesh; Madhya Pradesh; Himachal Pradesh; Gujarat; Tamil Nadu; Maharashtra; Karnataka; Orissa; West Bengal. Devanagari script. Kannada script. Telugu script. Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Rajasthani, Unclassified http://www.ethnologue.com/show_lang_family.asp?code=lmn 5770000 hil Hiligaynon incubator-wikipedia Philippines Iloilo and Capiz provinces, Panay, Negros Occidental, Visayas. Also in United States. Latin script. 
Austronesian, Malayo-Polynesian, Philippine, Greater Central Philippine, Central Philippine, Bisayan, Central, Peripheral http://www.ethnologue.com/show_lang_family.asp?code=hil 5622600 rwr Marwari India Rajasthan, Jodhpur, Jaisalmer, Barmer, Bikaner, Churu, Pali, Jalore districts; Gujarat; Madhya Pradesh; Punjab; Delhi; Haryana; Uttar Pradesh; thoughout India. Also in Nepal, Pakistan. Devanagari script. Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Rajasthani, Marwari http://www.ethnologue.com/show_lang_family.asp?code=rwr A member of macrolanguage Marwari [mwr http://www.ethnologue.com/show_language.asp?code=mwr] (India http://www.ethnologue.com/show_country.asp?name=IN). 5530000 min Minangkabau incubator-wikipedia Indonesia (Sumatra) Widespread in the Indonesian Archipelago; west central Sumatra, Padang area. Nearly half live outside central Sumatra; South Sumatra, west coast Mukomuko area. Latin script. Austronesian, Malayo-Polynesian, Malayo-Sumbawan, North and East, Malayic, Malay http://www.ethnologue.com/show_lang_family.asp?code=min 5430000 suk Sukuma Tanzania Northwest, between Lake Victoria and Lake Rukwa, Shinyanga to Serengeti Plain (Kiya); also Mwanza (Gwe). Few in cities; 88% in the traditional area. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, F, Sukuma-Nyamwezi (F.20) http://www.ethnologue.com/show_lang_family.asp?code=suk 5061700 mos Mòoré Burkina Faso Central Ouagadougou area; widespread. Also in Benin, Côte d’Ivoire, Ghana, Mali, Senegal, Togo. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, North, Gur, Central, Northern, Oti-Volta, Western, Northwest http://www.ethnologue.com/show_lang_family.asp?code=mos 5000000 wtm Mewati India Rajasthan, Alwar, Bharatpur, Dholpur districts; Uttar Pradesh, Madhura District; Haryana, Gurgaon, Faridabad districts. 
Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Unclassified http://www.ethnologue.com/show_lang_family.asp?code=wtm 5000000 kng Koongo Democratic Republic of the Congo Bas-Congo Province cataract, Mbanza Manteke area; Fioti north of Boma, and scattered along Congo River from Brazzaville to its mouth. Also in Angola, Congo. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, H, Kongo (H.10) http://www.ethnologue.com/show_lang_family.asp?code=kng A member of macrolanguage Kongo [kon http://www.ethnologue.com/show_language.asp?code=kon] (Democratic Republic of the Congo http://www.ethnologue.com/show_country.asp?name=CD). 4910000 vmf Mainfränkisch incubator-wikipedia Germany Mostly River Main area, including Mainz, west of Frankfurt. Indo-European, Germanic, West, High German, German, Middle German, West Middle German, Moselle Franconian http://www.ethnologue.com/show_lang_family.asp?code=vmf 4850000 gug Guaraní, Paraguayan Paraguay Also in Argentina. Latin script. Tupi, Tupi-Guarani, Subgroup I http://www.ethnologue.com/show_lang_family.asp?code=gug A member of macrolanguage Guarani [grn http://www.ethnologue.com/show_language.asp?code=grn] (Paraguay http://www.ethnologue.com/show_country.asp?name=PY). 4730000 hoj Hadothi India Rajasthan, Kota, Jhalawar, Bundi, Baran districts; Madhya Pradesh, Gwalior District. Devanagari script. Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Rajasthani, Unclassified http://www.ethnologue.com/show_lang_family.asp?code=hoj A member of macrolanguage Rajasthani [raj http://www.ethnologue.com/show_language.asp?code=raj] (India http://www.ethnologue.com/show_country.asp?name=IN). 4600000 czh Chinese, Huizhou China South Anhui Province, Huizhou region and Jixi, She (Xi), Ningguo, Jingde, Tunxi, Xiuning, Yi, Qimen and Dongzhi counties; northern Zhejiang Province, Chun’an County, Jiande municipality; northeast Jiangxi Province, Wuyuan, Dexing and Fuliang counties. 
Han script, Simplified variant. Han script, Traditional variant. Sino-Tibetan, Chinese http://www.ethnologue.com/show_lang_family.asp?code=czh A member of macrolanguage Chinese [zho http://www.ethnologue.com/show_language.asp?code=zho] (China http://www.ethnologue.com/show_country.asp?name=CN). 4500000 sou Thai, Southern incubator-wikipedia Thailand Chumphon, Nakorn Srithammarat; 14 provinces total. Muslim Tai in provinces of Chumporn, Nakorn Srithammarat, Phattalung, Songkhla, Ranong, Phanga, Phuket, Krabi, Trang, Satun. Thai script. Tai-Kadai, Kam-Tai, Be-Tai, Tai-Sek, Tai, Southwestern, Southern http://www.ethnologue.com/show_lang_family.asp?code=sou 4410000 luo Dholuo incubator-wikipedia Kenya Nyanza Province. Also in Tanzania. Latin script. Nilo-Saharan, Eastern Sudanic, Nilotic, Western, Luo, Southern, Luo-Acholi, Luo http://www.ethnologue.com/show_lang_family.asp?code=luo 4321000 ayl Arabic, Libyan Spoken incubator-wikipedia Libya Especially north half. Also in Egypt, Niger. Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=ayl A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA). 4200000 ktu Kituba Democratic Republic of the Congo Bas-Congo and south Bandundu provinces. Latin script. Creole, Kongo based http://www.ethnologue.com/show_lang_family.asp?code=ktu 4156090 aln Albanian, Gheg incubator-wikipedia Serbia Kosovo. Also in Albania, Bulgaria, Czech Republic, Macedonia, Montenegro, Romania, Slovenia, United States. Latin script. Indo-European, Albanian, Gheg http://www.ethnologue.com/show_lang_family.asp?code=aln A member of macrolanguage Albanian [sqi http://www.ethnologue.com/show_language.asp?code=sqi] (Albania http://www.ethnologue.com/show_country.asp?name=AL). 4101000 nso Sotho, Northern incubator-wiktionary,incubator-wikipedia South Africa Transvaal, south and central. Also in Botswana. 
Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, S, Sotho-Tswana (S.30), Sotho, Northern http://www.ethnologue.com/show_lang_family.asp?code=nso 4004490 knn Konkani India North and central coastal strip of Maharashtra; Karnataka; Dadra and Nagar Haveli; Kerala. Also in Canada. Devanagari script, official script. Kannada script, no longer in use. Latin script, no longer in use. Indo-European, Indo-Iranian, Indo-Aryan, Southern zone, Konkani http://www.ethnologue.com/show_lang_family.asp?code=knn A member of macrolanguage Konkani [kok http://www.ethnologue.com/show_language.asp?code=kok] (India http://www.ethnologue.com/show_country.asp?name=IN). 4002880 umb Umbundu Angola West, Benguela District. Also in Namibia. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, R, South Mbundu (R.10) http://www.ethnologue.com/show_lang_family.asp?code=umb 3960000 kam Kamba Kenya South central, Eastern Province, Machakos and Kitui districts; Coast Province, Kwale District. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, E, Kikuyu-Kamba (E.20) http://www.ethnologue.com/show_lang_family.asp?code=kam 3952810 rmt Domari Iran Kurbat and Luli in west; Mehtar in Fars and Kohgiluyeh va Boyerahmad Province; Karachi in north. Also in Afghanistan, Egypt, India, Iraq, Israel, Jordan, Libya, Palestinian West Bank and Gaza, Russian Federation (Europe), Sudan, Syria, Turkey (Europe), Uzbekistan. Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Dom http://www.ethnologue.com/show_lang_family.asp?code=rmt 3930000 mui Musi incubator-wikipedia Indonesia (Sumatra) South Sumatra Province, Musi River upstream to Bukit Barisan mountains, downstream to eastern coastal swamplands. 
Austronesian, Malayo-Polynesian, Malayo-Sumbawan, North and East, Malayic, Malay http://www.ethnologue.com/show_lang_family.asp?code=mui 3900000 wry Merwari India Rajasthan, Ajmer, Nagaur districts. Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Rajasthani, Marwari http://www.ethnologue.com/show_lang_family.asp?code=wry A member of macrolanguage Marwari [mwr http://www.ethnologue.com/show_language.asp?code=mwr] (India http://www.ethnologue.com/show_country.asp?name=IN). 3800000 myi Mina India Madhya Pradesh, Gwalior, Shivpuri, Guna, Rajgarh districts, Vidisha District, Sironj Subdivision; Rajasthan, Jaipur, Alwar, Bharatpur, Sawai Madhopur, Tonk, Bundi, Ajmer districts. Indo-European, Indo-Iranian, Indo-Aryan, Unclassified http://www.ethnologue.com/show_lang_family.asp?code=myi 3691000 fuc Pulaar Senegal Fulbe Jeeri and Toucouleur primarily in Senegal River Valley and Mauritania; Fulacunda in Upper Casamance region, west of Kolda to Gambia River headwaters east, from Senegal south border to Gambian border north. Also in Gambia, Guinea, Guinea-Bissau, Mali, Mauritania. Arabic script, Ajami style. Latin script. Niger-Congo, Atlantic-Congo, Atlantic, Northern, Senegambian, Fulani-Wolof, Fula, Western http://www.ethnologue.com/show_lang_family.asp?code=fuc A member of macrolanguage Fulah [ful http://www.ethnologue.com/show_language.asp?code=ful] (Senegal http://www.ethnologue.com/show_country.asp?name=SN). 3633900 gom Konkani, Goan incubator-wiktionary,incubator-wikipedia India South coast strip of Maharashtra, Ratnagari District; Goa; Karnataka; Kerala. Also in Kenya, United Arab Emirates. Kannada script. Latin script. Indo-European, Indo-Iranian, Indo-Aryan, Southern zone, Konkani http://www.ethnologue.com/show_lang_family.asp?code=gom A member of macrolanguage Konkani [kok http://www.ethnologue.com/show_language.asp?code=kok] (India http://www.ethnologue.com/show_country.asp?name=IN). 3602000 bem Bemba Zambia North, Copperbelt, and Luapula provinces. 
Also in Botswana, Democratic Republic of the Congo, Malawi. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, M, Bemba (M.40) http://www.ethnologue.com/show_lang_family.asp?code=bem 3599000 afb Arabic, Gulf Spoken Iraq Zubair area, Fau Peninsula. Also in Bahrain, Egypt, Iran, Kuwait, Oman, Qatar, Saudi Arabia, United Arab Emirates, Yemen. Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=afb A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA). 3502300 bjn Banjar Indonesia (Kalimantan) Around Banjarmasin south and east; East Kalimantan, coastal regions of Pulau Laut, Kutai and Pasir; Central Kalimantan as far as Sampit. Also in Malaysia (Sabah). Austronesian, Malayo-Polynesian, Malayo-Sumbawan, North and East, Malayic, Malay http://www.ethnologue.com/show_lang_family.asp?code=bjn 3500000 bug Bugis incubator-wiktionary,wikipedia Indonesia (Sulawesi) Western coast of southeast Sulawesi in Kolaka, Wundulako, Rumbia, and Poleang districts. Also in major towns of Sulawesi. Large enclaves also in other provinces of Sulawesi, Kalimantan, Maluku, Papua, and Sumatra; coastal swamp areas such as Bulukumba, Luwu, Polewali in Polmas, Pasangkayu in Mamuju districts. Also in Malaysia (Sabah). Buginese script. Latin script. Austronesian, Malayo-Polynesian, South Sulawesi, Bugis http://www.ethnologue.com/show_lang_family.asp?code=bug 3405000 bcc Balochi, Southern incubator-wikiversity,incubator-wikipedia Pakistan South Balochistan, south Sind, Karachi. Also in Iran, Oman, United Arab Emirates. Arabic script, Nastaliq style. 
Indo-European, Indo-Iranian, Iranian, Western, Northwestern, Balochi http://www.ethnologue.com/show_lang_family.asp?code=bcc A member of macrolanguage Baluchi [bal http://www.ethnologue.com/show_language.asp?code=bal] (Pakistan http://www.ethnologue.com/show_country.asp?name=PK). 3330000 ban Bali incubator-wikipedia Indonesia (Java and Bali) Island of Bali, north Nusa Penida, west Lombok Islands, and east Java, South Sulawesi. Balinese script. Javanese script, no longer in use. Latin script, used since early 20th century. Austronesian, Malayo-Polynesian, Malayo-Sumbawan, North and East, Bali-Sasak-Sumbawa http://www.ethnologue.com/show_lang_family.asp?code=ban 3295000 shn Shan incubator-wikipedia Myanmar Shan state, southeast Myanmar. Kokang Shan is in Kokang area, north Wa area, Shan state; Tai Mao is on Burma-Yunnan border, centered at Mu’ang Mao Long or Namkham, Myanmar. Also in China, Thailand. Myanmar (Burmese) script. Tai-Kadai, Kam-Tai, Be-Tai, Tai-Sek, Tai, Southwestern, Northwest http://www.ethnologue.com/show_lang_family.asp?code=shn 3270000 mzn Mazanderani incubator-wikinews,incubator-wikibooks,incubator-wikiversity,incubator-wiktionary,incubator-wikiquotes Iran North near Caspian Sea, south half of Mazanderan Province. Arabic script. Indo-European, Indo-Iranian, Iranian, Western, Northwestern, Caspian http://www.ethnologue.com/show_lang_family.asp?code=mzn 3270000 glk Gilaki incubator-wikinews,incubator-wikibooks,incubator-wikiversity,incubator-wiktionary,incubator-wikiquotes Iran Gilan region, coastal plain, south of Talish. Galeshi is a mountain dialect. Arabic script. Indo-European, Indo-Iranian, Iranian, Western, Northwestern, Caspian http://www.ethnologue.com/show_lang_family.asp?code=glk 3240500 knc Kanuri, Central Nigeria Borno state, Kukawa, Kaga, Konduga, Maiduguri, Monguno, Ngala, Bama, Gwoza LGAs; Yobe state, Nguru, Geidam, Damaturu, Fika, Fune, and Gujba LGAs; Jigawa state, Hadejia LGA. Also in Cameroon, Chad, Eritrea, Niger, Sudan. 
Arabic script, Ajami style. Latin script. Nilo-Saharan, Saharan, Western, Kanuri http://www.ethnologue.com/show_lang_family.asp?code=knc A member of macrolanguage Kanuri [kau http://www.ethnologue.com/show_language.asp?code=kau] (Nigeria http://www.ethnologue.com/show_country.asp?name=NG). 3202600 jam Jamaican Creole English incubator-wikipedia Jamaica Also in Canada, Costa Rica, Dominican Republic, Panama, United Kingdom, United States. Creole, English based, Atlantic, Western http://www.ethnologue.com/show_lang_family.asp?code=jam 3150000 tzm Tamazight, Central Atlas incubator-wikipedia Morocco Middle Atlas, High Atlas, east High Atlas Mountains. 1,200,000 in rural areas between Taza, Khemisset, Azilal, Errachidia; 100,000 outside language area. Also in Algeria, France. Arabic script. Latin script. Tifinagh (Berber) script. Afro-Asiatic, Berber, Northern, Atlas http://www.ethnologue.com/show_lang_family.asp?code=tzm 3126000 kab Kabyle incubator-wiktionary Algeria Grande Kabylie Mt. range, west Kabylia. Lesser Kabyle in Aokas town area. Also in Belgium, France. Arabic script. Latin script. Tifinagh (Berber) script. Afro-Asiatic, Berber, Northern, Kabyle http://www.ethnologue.com/show_lang_family.asp?code=kab 3123190 mey Hassaniyya Mauritania Also in Algeria, Libya, Mali, Morocco, Niger, Senegal, Western Sahara. Latin script. Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=mey 3100000 czo Chinese, Min Zhong China Central Fujian Province, Sha County, Yong’an and Sanming municipalities. Sino-Tibetan, Chinese http://www.ethnologue.com/show_lang_family.asp?code=czo A member of macrolanguage Chinese [zho http://www.ethnologue.com/show_language.asp?code=zho] (China http://www.ethnologue.com/show_country.asp?name=CN). 3090000 vmw Makhuwa Mozambique Nampula, south of Meetto area. Latin script. 
Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, P, Makua (P.30) http://www.ethnologue.com/show_lang_family.asp?code=vmw 3000000 swv Shekhawati India Rajasthan, Sikar, Jhunjhunun, Churu districts. Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Rajasthani, Marwari http://www.ethnologue.com/show_lang_family.asp?code=swv A member of macrolanguage Marwari [mwr http://www.ethnologue.com/show_language.asp?code=mwr] (India http://www.ethnologue.com/show_country.asp?name=IN). 3000000 shi Tachelhit incubator-wikinews,incubator-wikibooks,incubator-wiktionary,incubator-wikipedia Morocco Southwest, from coast south to Ifni, north to near Agadir, northeast to Marrakech outskirts, east to Draa, including Sous valley, and south near the border. Also in Algeria, France. Arabic script. Tifinagh (Berber) script. Afro-Asiatic, Berber, Northern, Atlas http://www.ethnologue.com/show_lang_family.asp?code=shi 3000000 sdh Kurdish, Southern incubator-wikipedia Iran Western Iran, Kermanshah, Ilam provinces; Eastern Iraq border with those provinces including Xanaqin. Also in Iraq. Arabic script. Indo-European, Indo-Iranian, Iranian, Western, Northwestern, Kurdish http://www.ethnologue.com/show_lang_family.asp?code=sdh A member of macrolanguage Kurdish [kur http://www.ethnologue.com/show_language.asp?code=kur] (Iraq http://www.ethnologue.com/show_country.asp?name=IQ). 3000000 kmb Kimbundu Angola Northwest, Luanda Province. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, H, Mbundu (H.20) http://www.ethnologue.com/show_lang_family.asp?code=kmb 3000000 hrx Hunsrik Brazil Widespread with high concentrations in Rio Grande do Sul, Santa Catarina, and Paraná. Also in Argentina, Chile, Paraguay, Uruguay. Indo-European, Germanic http://www.ethnologue.com/show_lang_family.asp?code=hrx 3000000 gdx Godwari India Rajasthan, Jhalor, Sirohi, Pali districts. 
Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Rajasthani, Marwari http://www.ethnologue.com/show_lang_family.asp?code=gdx 2929200 fuf Pular Guinea Northwest, Fouta Djallon area. Also in Gambia, Guinea-Bissau, Mali, Senegal, Sierra Leone. Arabic script. Latin script. Niger-Congo, Atlantic-Congo, Atlantic, Northern, Senegambian, Fulani-Wolof, Fula, West Central http://www.ethnologue.com/show_lang_family.asp?code=fuf A member of macrolanguage Fulah [ful http://www.ethnologue.com/show_language.asp?code=ful] (Senegal http://www.ethnologue.com/show_country.asp?name=SN). 2920000 gbm Garhwali incubator-wikipedia India Uttarakhand; Tehri Garhwal, Pauri Garhwal, Uttarkashi, Chamoli, Dehra Dun, Rudraprayag districts; Himachal Pradesh; Tehri and Uttarkash, Jaunpuri and Ravai. Devanagari script. Indo-European, Indo-Iranian, Indo-Aryan, Northern zone, Garhwali http://www.ethnologue.com/show_lang_family.asp?code=gbm 2900000 sid Sidamo Ethiopia South central, Sidamo zone, northeast of Lake Abaya and southeast of Lake Awasa. Ethiopic script. Latin script. Afro-Asiatic, Cushitic, East, Highland http://www.ethnologue.com/show_lang_family.asp?code=sid 2700000 bew Betawi incubator-wikipedia Indonesia (Java and Bali) Jakarta, Java. Austronesian, Malayo-Polynesian, Malayo-Sumbawan, North and East, Malayic, Malay, Trade http://www.ethnologue.com/show_lang_family.asp?code=bew 2680000 ins Indian Sign Language India Widespread. Also in Bangladesh, Pakistan. Deaf sign language http://www.ethnologue.com/show_lang_family.asp?code=ins 2649205 pcc Bouyei incubator-wikipedia China Guizhou-Yunnan plateau, mainly Buyi-Miao and Miao-Dong autonomous prefectures, Zhenning and Guanling counties, south and southwest Guizhou; Yunnan Province, Luoping County; Sichuan Province, Ningnan and Huidong counties. Also in France, United States, Viet Nam. Latin script. 
Tai-Kadai, Kam-Tai, Be-Tai, Tai-Sek, Tai, Northern http://www.ethnologue.com/show_lang_family.asp?code=pcc
2600000 meo Malay, Kedah Malaysia (Peninsular) Kedah, Penang, Perlis, and (north) Perak states. Also in Thailand. Austronesian, Malayo-Polynesian, Malayo-Sumbawan, North and East, Malayic, Malay http://www.ethnologue.com/show_lang_family.asp?code=meo A member of macrolanguage Malay [msa http://www.ethnologue.com/show_language.asp?code=msa] (Malaysia http://www.ethnologue.com/show_country.asp?name=MY).
2558800 cpx Chinese, Pu-Xian incubator-wikipedia China East central Fujian Province, Putian and Xianyou counties. Also in Malaysia (Peninsular), Singapore. Sino-Tibetan, Chinese http://www.ethnologue.com/show_lang_family.asp?code=cpx A member of macrolanguage Chinese [zho http://www.ethnologue.com/show_language.asp?code=zho] (China http://www.ethnologue.com/show_country.asp?name=CN).
2500000 bcl Bicolano, Central incubator-wiktionary Philippines Luzon, Camarines Norte and Sur, south Catanduanes, north Sorsogon, Albay. Naga City and Legaspi City are centers. Latin script. Austronesian, Malayo-Polynesian, Philippine, Greater Central Philippine, Central Philippine, Bikol, Coastal, Naga http://www.ethnologue.com/show_lang_family.asp?code=bcl A member of macrolanguage Bikol [bik http://www.ethnologue.com/show_language.asp?code=bik] (Philippines http://www.ethnologue.com/show_country.asp?name=PH).
2438400 dje Zarma Niger Southwest. Also in Burkina Faso, Mali, Nigeria. Arabic script, Ajami style. Latin script. Nilo-Saharan, Songhai, Southern http://www.ethnologue.com/show_lang_family.asp?code=dje
2380000 ndc Ndau Mozambique South central region, Sofala and Manica Province, south of Beira. Also in Zimbabwe. Latin script.
Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, S, Shona (S.10) http://www.ethnologue.com/show_lang_family.asp?code=ndc
2360000 kfy Kumaoni incubator-wikipedia India Uttarakhand, Almora, Nainital, Pithoragarh, Bageshwar, Champawat, Udhamsingh Nagar districts; Central Kumaoni in Almora and north Nainital; Northeastern Kumaoni in Pithoragarh; Southeastern Kumaoni in southeast Nainital; Western Kumaoni west of Almora and Nainital. Devanagari script. Indo-European, Indo-Iranian, Indo-Aryan, Northern zone, Central Pahari http://www.ethnologue.com/show_lang_family.asp?code=kfy
2350000 pse Malay, Central incubator-wikipedia Indonesia (Sumatra) South Sumatra, central Bukit Barisan highlands west to the Indian ocean along Bengkulu coast, east down Lematang and Ogan river valleys; south of Muaraenim, east and southeast of Lahat. Austronesian, Malayo-Polynesian, Malayo-Sumbawan, North and East, Malayic, Malay http://www.ethnologue.com/show_lang_family.asp?code=pse
2330000 nyn Nyankore Uganda Southwest, Bushenyi and Mbarara districts mainly; Kanungu, Ntungamo, and Rukungiri districts. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, J, Nyoro-Ganda (J.10) http://www.ethnologue.com/show_lang_family.asp?code=nyn
2320000 sgw Sebat Bet Gurage Ethiopia West Gurage region. Chaha in Emdibir area; Gura in Gura Megenase and Wirir areas; Muher in Ch’eza area, mountains north of Chaha and Ezha; Gyeto south of Ark’it’ in K’abul and K’want’e; Ezha in Agenna. Ethiopic script. Afro-Asiatic, Semitic, South, Ethiopian, South, Outer, tt-Group http://www.ethnologue.com/show_lang_family.asp?code=sgw
2262900 ayr Aymara, Central Bolivia Whole Altiplano west of eastern Andes. Some migration to yungas and lowlands. Also in Argentina, Chile, Peru. Latin script.
Aymaran http://www.ethnologue.com/show_lang_family.asp?code=ayr A member of macrolanguage Aymara [aym http://www.ethnologue.com/show_language.asp?code=aym] (Bolivia http://www.ethnologue.com/show_country.asp?name=BO).
2220000 brh Brahui incubator-wikipedia Pakistan South central, Quetta and Kalat region, east Baluchistan and Sind provinces. Also in Afghanistan, Iran, Turkmenistan. Arabic script, Nastaliq style. Dravidian, Northern http://www.ethnologue.com/show_lang_family.asp?code=brh
2210000 tiv Tiv Nigeria Benue state, Makurdi, Gwer, Gboko Kwande, Vandeikya, and Katsina Ala LGAs; Plateau state, Lafia LGA; Taraba state, Bali, Takum, and Wukari LGAs. Also in Cameroon. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Tivoid http://www.ethnologue.com/show_lang_family.asp?code=tiv
2210000 haz Hazaragi incubator-wikipedia Afghanistan Central mountains between Kabul and Herat (Hazarajat); Kabul, between Maimana and Sari-Pul; north from immediately south of Ikoh i Baba mountain range almost to Mazar e Sharif; many refugees. Also in Iran, Pakistan, Tajikistan. Arabic script. Indo-European, Indo-Iranian, Iranian, Western, Southwestern, Persian http://www.ethnologue.com/show_lang_family.asp?code=haz
2130000 bci Baoulé Côte d’Ivoire Central Department, widespread in the south. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Kwa, Nyo, Potou-Tano, Tano, Central, Bia, Northern http://www.ethnologue.com/show_lang_family.asp?code=bci
2120300 guz Ekegusii Kenya Nyanza Province, Kisii District, south of Kavirondo Gulf. Also in Tanzania. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, E, Kuria (E.10) http://www.ethnologue.com/show_lang_family.asp?code=guz
2110000 dgo Dogri incubator-wikipedia India Jammu and Kashmir, Udhampur, Reasi, Kathua, Poonch districts. Arabic script, Nastaliq style, no longer in use. Devanagari script.
Indo-European, Indo-Iranian, Indo-Aryan, Northern zone, Western Pahari http://www.ethnologue.com/show_lang_family.asp?code=dgo A member of macrolanguage Dogri [doi http://www.ethnologue.com/show_language.asp?code=doi] (India http://www.ethnologue.com/show_country.asp?name=IN).
2100000 sas Sasak Indonesia (Nusa Tenggara) Lombok Island. Latin script. Austronesian, Malayo-Polynesian, Malayo-Sumbawan, North and East, Bali-Sasak-Sumbawa, Sasak-Sumbawa http://www.ethnologue.com/show_lang_family.asp?code=sas
2100000 bgq Bagri India Punjab, Firozepur District; Rajasthan, Hanumangarh, Sriganganagar districts; Haryana, Sirsa, Fatehabad districts. Also in Pakistan. Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Rajasthani, Unclassified http://www.ethnologue.com/show_lang_family.asp?code=bgq A member of macrolanguage Rajasthani [raj http://www.ethnologue.com/show_language.asp?code=raj] (India http://www.ethnologue.com/show_country.asp?name=IN).
2094200 kru Kurux India Chhattisgarh, Raigarh, Surguja districts; Jharkhand Ranchi District; West Bengal, Jalpaigiri District; Bihar; Orissa, Sundargarh, Jharsuguda districts; Assam; Tripura. Also in Bangladesh, Bhutan. Devanagari script. Dravidian, Northern http://www.ethnologue.com/show_lang_family.asp?code=kru
2060000 xog Soga Uganda Central, between lakes Victoria and Kyoga: Kamuli, Bagiri and Mayuge districts; Kaliro District (Lulamogi Dialect); Jinja District (Lutenga Dialect); Iganga District, Busiki County (Lusiki Dialect). Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, J, Nyoro-Ganda (J.10) http://www.ethnologue.com/show_lang_family.asp?code=xog
2031800 emk Maninkakan, Eastern Guinea Widespread in upper region; central, Kankan region; forest region near Liberia. Also in Liberia, Sierra Leone. Latin script. N’Ko script.
Niger-Congo, Mande, Western, Central-Southwestern, Central, Manding-Jogo, Manding-Vai, Manding-Mokole, Manding, Manding-East, Southeastern Manding http://www.ethnologue.com/show_lang_family.asp?code=emk A member of macrolanguage Mandingo [man http://www.ethnologue.com/show_language.asp?code=man] (Guinea http://www.ethnologue.com/show_country.asp?name=GN).
2000000 sxu Saxon, Upper Germany East, southeast, Sachsen with Dresden, Leipzig, Chemnitz, Halle in Sachsen-Anhalt. Indo-European, Germanic, West, High German, German, Middle German, East Middle German http://www.ethnologue.com/show_lang_family.asp?code=sxu
2000000 mtr Mewari India Rajasthan, Udaipur, Bhilwara, Chittoaurgarh districts; Gujarat; Haryana; Delhi; Madhya Pradesh; Uttar Pradesh. Devanagari script. Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Rajasthani, Marwari http://www.ethnologue.com/show_lang_family.asp?code=mtr A member of macrolanguage Marwari [mwr http://www.ethnologue.com/show_language.asp?code=mwr] (India http://www.ethnologue.com/show_country.asp?name=IN).
2000000 iii ii Nuosu incubator-wikipedia China North Yunnan, south Sichuan, mainly in Greater and Lesser Liangshan mountains. Spoken in over 40 counties. Latin script. Yi script. Sino-Tibetan, Tibeto-Burman, Burmic, Ngwi, Northern http://www.ethnologue.com/show_lang_family.asp?code=iii
2000000 btb Beti Cameroon Major part of Center and South provinces; East Province, Lom-and-Djerem and Upper Nyong divisions. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Northwest, A, Yaunde-Fang (A.70) http://www.ethnologue.com/show_lang_family.asp?code=btb
2000000 bbc Batak Toba incubator-wikipedia Indonesia (Sumatra) North Sumatra, Samosir Island and east, south, and west of Toba Lake. Batak script. Latin script.
Austronesian, Malayo-Polynesian, Northwest Sumatra-Barrier Islands, Batak, Southern http://www.ethnologue.com/show_lang_family.asp?code=bbc
1980000 zyb Zhuang, Yongbei China Guangxi Zhuang Autonomous Region, N. Yongning, Hengxian, Bingyang, Wuming, Pingguo. Tai-Kadai, Kam-Tai, Be-Tai, Tai-Sek, Tai, Northern http://www.ethnologue.com/show_lang_family.asp?code=zyb A member of macrolanguage Zhuang [zha http://www.ethnologue.com/show_language.asp?code=zha] (China http://www.ethnologue.com/show_country.asp?name=CN).
1970000 sck Sadri India Jharkhand, Ranchi, Palamau districts; West Bengal; Orissa; Assam; Madhya Pradesh; Andaman Islands; Nagaland. Bengali script. Devanagari script. Indo-European, Indo-Iranian, Indo-Aryan, Eastern zone, Bihari http://www.ethnologue.com/show_lang_family.asp?code=sck
1950000 tcy Tulu incubator-wikipedia India Karnataka, South Kanara (Dakshina Kannada) and Udipi districts; Kerala, Kasargod District; scattered in other states in India. Kannada script. Tulu script, philosophical texts and religious verses are sometimes written in this script. Dravidian, Southern, Tulu http://www.ethnologue.com/show_lang_family.asp?code=tcy
1950000 gno Gondi, Northern India Madhya Pradesh, Betul, Chhindwara, Seoni, Mandla, Balaghat districts; Maharashtra state, Amravati, Wardha, Nagpur, Bhandara, Yavatmal districts. Devanagari script. Dravidian, South-Central, Gondi-Kui, Gondi http://www.ethnologue.com/show_lang_family.asp?code=gno A member of macrolanguage Gondi [gon http://www.ethnologue.com/show_language.asp?code=gon] (India http://www.ethnologue.com/show_country.asp?name=IN).
1930000 wbq Waddar India Andhra Pradesh; Karnataka; Maharashtra, Jalgaon District. Dravidian, South-Central, Telugu http://www.ethnologue.com/show_lang_family.asp?code=wbq
1916000 yao Yao Malawi Southeast tip of Lake Malawi area, bordering Mozambique. Also in Mozambique, Tanzania, Zambia. Latin script.
Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, P, Yao (P.20) http://www.ethnologue.com/show_lang_family.asp?code=yao
1900000 quc K’iche’, Central Guatemala Central highlands, Totonicapán, southern El Quiché, eastern Sololá, eastern Quezaltenango departments. Latin script. Mayan, Quichean-Mamean, Greater Quichean, Quichean, Quiche-Achi http://www.ethnologue.com/show_lang_family.asp?code=quc
1900000 bhk Bicolano, Albay Philippines Luzon, west Albay Province and Buhi, Camarines Sur. Austronesian, Malayo-Polynesian, Philippine, Greater Central Philippine, Central Philippine, Bikol, Inland, Buhi-Daraga http://www.ethnologue.com/show_lang_family.asp?code=bhk A member of macrolanguage Bikol [bik http://www.ethnologue.com/show_language.asp?code=bik] (Philippines http://www.ethnologue.com/show_country.asp?name=PH).
1880000 mfp Malay, Makassar Indonesia (Sulawesi) South Sulawesi, Makassar port area. Austronesian, Malayo-Polynesian, Malayo-Sumbawan, North and East, Malayic, Malay, Trade http://www.ethnologue.com/show_lang_family.asp?code=mfp
1880000 hno Hindko, Northern Pakistan Hazara Division, Mansehra and Abbotabad districts, Indus and Kaghan valleys and valleys of Indus tributaries, NWFP. Arabic script. Indo-European, Indo-Iranian, Indo-Aryan, Northwestern zone, Lahnda http://www.ethnologue.com/show_lang_family.asp?code=hno A member of macrolanguage Lahnda [lah http://www.ethnologue.com/show_language.asp?code=lah] (Pakistan http://www.ethnologue.com/show_country.asp?name=PK).
1860000 ymm Maay Somalia South, Gedo region, Middle and Lower Shabeelle, Middle and Lower Jubba, Baay, and Bakool regions. Also in Ethiopia, Kenya, Sudan, United States. Afro-Asiatic, Cushitic, East, Somali http://www.ethnologue.com/show_lang_family.asp?code=ymm
1849000 teo Teso incubator-wikipedia Uganda East, Katakwi (mainly), Soroti, Kaberamaido, Kumi, Pallisa, and Tororo districts. Lokathan, Madial area, Nangeya Mountains north end. Also in Kenya.
Latin script. Nilo-Saharan, Eastern Sudanic, Nilotic, Eastern, Lotuxo-Teso, Teso-Turkana, Teso http://www.ethnologue.com/show_lang_family.asp?code=teo
1840000 zzj Zhuang, Zuojiang China Southwest Guangxi Province, Tiandeng, Daxin, Chongzuo, Ningming, Longzhou and Pingxiang Jingxi counties; Yunnan Province, Funing County, a few villages. Also in Viet Nam. Han (Hanzi, Kanji, Hanja) script. Tai-Kadai, Kam-Tai, Be-Tai, Tai-Sek, Tai, Central http://www.ethnologue.com/show_lang_family.asp?code=zzj A member of macrolanguage Zhuang [zha http://www.ethnologue.com/show_language.asp?code=zha] (China http://www.ethnologue.com/show_country.asp?name=CN).
1830000 bjq Malagasy, Southern Betsimisaraka Madagascar East coast, Toamasina Province, Mahanoro District; Fianarantsoa Province, Nosy Varika, Mananjary, Manakara Atsimo districts. Austronesian, Malayo-Polynesian, Greater Barito, East, Malagasy http://www.ethnologue.com/show_lang_family.asp?code=bjq A member of macrolanguage Malagasy [mlg http://www.ethnologue.com/show_language.asp?code=mlg] (Madagascar http://www.ethnologue.com/show_country.asp?name=MG).
1810000 zyn Zhuang, Yongnan China South Guangxi, south Yongning, Longan, Fusui, Shangsi, Qinzhou and Fangcheng counties; some in Jingxi County; Yunnan, Funing County. Also in Viet Nam. Tai-Kadai, Kam-Tai, Be-Tai, Tai-Sek, Tai, Central http://www.ethnologue.com/show_lang_family.asp?code=zyn A member of macrolanguage Zhuang [zha http://www.ethnologue.com/show_language.asp?code=zha] (China http://www.ethnologue.com/show_country.asp?name=CN).
1740000 mer Kimîîru Kenya Eastern Province, Meru District, northeast of Mt. Kenya. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, E, Kikuyu-Kamba (E.20), Meru http://www.ethnologue.com/show_lang_family.asp?code=mer
1710000 wbr Wagdi India Rajasthan, south Udaipur, Dungarpur, Banswara districts; Gujarat, Sabarkantha, Panchmahals; Andhra Pradesh, Hyderabad. Devanagari script.
Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Bhil http://www.ethnologue.com/show_lang_family.asp?code=wbr A member of macrolanguage Rajasthani [raj http://www.ethnologue.com/show_language.asp?code=raj] (India http://www.ethnologue.com/show_country.asp?name=IN).
1710000 fuv Fulfulde, Nigerian Nigeria Kano-Katsina, Kano, Katsina, Zaria, Jos Plateau and southeast to Bauchi, Gombe is center; Bororro in Bornu state, Maiduguri is center; Sokoto in Sokoto state. Also in Cameroon, Chad. Arabic script, Ajami style. Latin script. Niger-Congo, Atlantic-Congo, Atlantic, Northern, Senegambian, Fulani-Wolof, Fula, East Central http://www.ethnologue.com/show_lang_family.asp?code=fuv A member of macrolanguage Fulah [ful http://www.ethnologue.com/show_language.asp?code=ful] (Senegal http://www.ethnologue.com/show_country.asp?name=SN).
1700000 xnr Kangri India Himachal Pradesh, Kangra, Hamirpur, Una districts. Devanagari script. Indo-European, Indo-Iranian, Indo-Aryan, Northern zone, Western Pahari http://www.ethnologue.com/show_lang_family.asp?code=xnr A member of macrolanguage Dogri [doi http://www.ethnologue.com/show_language.asp?code=doi] (India http://www.ethnologue.com/show_country.asp?name=IN).
1700000 rif Tarifit incubator-wikipedia Morocco North. Dialects listed are near Al Hoceima. Also in Algeria, Belgium, France, Germany, Netherlands, Spain. Arabic script. Latin script. Tifinagh (Berber) script. Afro-Asiatic, Berber, Northern, Zenati, Riff http://www.ethnologue.com/show_lang_family.asp?code=rif
1690000 avl Arabic, Eastern Egyptian Bedawi Spoken Egypt Bedouin regions in Sinai; parts of Red Sea coast, almost to south border; entire east bank. Also in Israel, Jordan, Palestinian West Bank and Gaza, Syria. Arabic script.
Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=avl A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA).
1637000 lgg Lugbara Uganda Northwest, Arua and Yumbe districts. Also in Democratic Republic of the Congo. Latin script. Nilo-Saharan, Central Sudanic, East, Moru-Madi, Central http://www.ethnologue.com/show_lang_family.asp?code=lgg
1600000 mak Makasar incubator-wikipedia Indonesia (Sulawesi) South Sulawesi, southwest corner of the peninsula, most of Pangkep, Maros, Gowa, Bantaeng, Jeneponto, and Takalar districts. Buginese script. Latin script. Austronesian, Malayo-Polynesian, South Sulawesi, Makassar http://www.ethnologue.com/show_lang_family.asp?code=mak
1580000 khn Khandesi India Maharashtra, Dhule District, Sakri tahsil, Nasik District, Satna tahsil, Nandurbar District, Nandurbar and Shahada tahsils; Gujarat. Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Khandesi http://www.ethnologue.com/show_lang_family.asp?code=khn
1580000 cgg Chiga Uganda Extreme southwest: Kanungu, Kabale, Kisoro, Ntungamo, and Rukungiri districts. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, J, Nyoro-Ganda (J.10) http://www.ethnologue.com/show_lang_family.asp?code=cgg
1572800 nde nd Ndebele incubator-wikipedia Zimbabwe Matabeleland, Bulawayo area. Also in Botswana, Zambia. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, S, Nguni (S.40) http://www.ethnologue.com/show_lang_family.asp?code=nde
1560280 unr Mundari India Jharkhand, south and west Ranchi District; Orissa; Madhya Pradesh; West Bengal; Himachal Pradesh; Assam; Tripura; Andaman and Nicobar Islands. Also in Bangladesh, Nepal. Bengali script. Devanagari script. Latin script. Oriya script.
Austro-Asiatic, Munda, North Munda, Kherwari, Mundari http://www.ethnologue.com/show_lang_family.asp?code=unr
1560000 zlj Zhuang, Liujiang China Guangxi Zhuang Autonomous Region: Liujiang, N. Laibin, Yishan, Liucheng, N. Xincheng. Tai-Kadai, Kam-Tai, Be-Tai, Tai-Sek, Tai, Northern http://www.ethnologue.com/show_lang_family.asp?code=zlj A member of macrolanguage Zhuang [zha http://www.ethnologue.com/show_language.asp?code=zha] (China http://www.ethnologue.com/show_country.asp?name=CN).
1543300 brx Bodo incubator-wikipedia India Assam, mainly in Darrang, Nagaon, Kamrup districts; also in Goalpara, Sibsagar, Lakhimpur districts; West Bengal, Darjeeling, Jalpaiguri, Cooch-Behar districts; Manipur, Chandel (Tengnoupal) District; Meghalaya, West Garo Hills District, 7 villages in the Tikrikilla block, East Khasi Hills District. Also in Nepal. Bengali script. Devanagari script. Latin script. Sino-Tibetan, Tibeto-Burman, Jingpho-Konyak-Bodo, Konyak-Bodo-Garo, Bodo-Garo, Bodo http://www.ethnologue.com/show_lang_family.asp?code=brx
1510000 lub lu Luba-Katanga Democratic Republic of the Congo Katanga Province, Haut-Lomami District. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, L, Luba (L.30) http://www.ethnologue.com/show_lang_family.asp?code=lub
1500000 zgb Zhuang, Guibei China Guangxi Zhuang Autonomous Region: Longsheng, Sanjiang, Yongfu, Rongan, Rongshui, Luocheng, Huanjiang, Hechi, Nandan, Tian’e, Donglan, Bama. Tai-Kadai, Kam-Tai, Be-Tai, Tai-Sek, Tai, Northern http://www.ethnologue.com/show_lang_family.asp?code=zgb A member of macrolanguage Zhuang [zha http://www.ethnologue.com/show_language.asp?code=zha] (China http://www.ethnologue.com/show_country.asp?name=CN).
1500000 rhg Rohingya Myanmar Rakhine state. Also in Bangladesh, Malaysia, Saudi Arabia, Thailand.
Indo-European, Indo-Iranian, Indo-Aryan, Eastern zone, Bengali-Assamese http://www.ethnologue.com/show_lang_family.asp?code=rhg
1500000 qxq Kashkay incubator-wikipedia Iran Southwest Iran, Fars and South Kohgiluyeh va Boyerahmad Province. Shiraz, Gachsaran, and Firuzabad are centers. Arabic script. Altaic, Turkic, Southern, Azerbaijani http://www.ethnologue.com/show_lang_family.asp?code=qxq
1500000 quz Quechua, Cusco Peru Departments of Cusco, half of Puno, and northeast Arequipa. Latin script. Quechuan, Quechua II, C http://www.ethnologue.com/show_lang_family.asp?code=quz A member of macrolanguage Quechua [que http://www.ethnologue.com/show_language.asp?code=que] (Peru http://www.ethnologue.com/show_country.asp?name=PE).
1500000 ngl Lomwe Mozambique Northeast and central, most of Zambezia Province, south Nampula Province. Prestige center is Alto Molocue, Zambezia Province. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, P, Makua (P.30) http://www.ethnologue.com/show_lang_family.asp?code=ngl
1500000 lrc Luri, Northern incubator-wikipedia Iran Western Iran: central and south Lorestan, north Khuzestan, south Hamadan Province, south edge of Markazi Province, some regions of Ilam; Khorramabad, Borujerd, Andimeshk; possibly eastern Iraq. Arabic script. Indo-European, Indo-Iranian, Iranian, Western, Southwestern, Luri http://www.ethnologue.com/show_lang_family.asp?code=lrc
1500000 hoc Ho India Jharkhand, Singhbhum District, Kolhan, Seraikella, Dhalbhum areas; Orissa, Mayurbhanj, and Koenjhar districts; West Bengal. Devanagari script, used in Bihar. Oriya script, used in Orissa. Varang Kshiti script. Austro-Asiatic, Munda, North Munda, Kherwari, Mundari http://www.ethnologue.com/show_lang_family.asp?code=hoc
1499700 men Mende Sierra Leone South central. Expanding along the coast and south and east. Also in Liberia. Latin script.
Mende script, little used except for correspondence and record keeping, especially accounting. Niger-Congo, Mande, Western, Central-Southwestern, Southwestern, Mende-Loma, Mende-Bandi, Mende-Loko http://www.ethnologue.com/show_lang_family.asp?code=men
1490000 laj Lango Uganda Central, Apac and Lira districts, north of Lake Kyoga. Latin script. Nilo-Saharan, Eastern Sudanic, Nilotic, Western, Luo, Southern, Luo-Acholi, Alur-Acholi, Lango-Acholi http://www.ethnologue.com/show_lang_family.asp?code=laj
1490000 khg Tibetan, Khams China Northeast Tibet, Changdu (Qamdo) and Naqu (Nagqu) districts; west Sichuan, Ganzi (Garzê) Tibetan Autonomous Prefecture; northwest Yunnan Province, Diqing (Dêqên) Tibetan Autonomous Prefecture; southwest Qinghai Province, Yushu Tibetan Autonomous Prefecture. Tibetan script. Sino-Tibetan, Tibeto-Burman, Himalayish, Tibeto-Kanauri, Tibetic, Tibetan, Northern http://www.ethnologue.com/show_lang_family.asp?code=khg
1480000 tyz Tày Viet Nam Central and northeast, near the China border, Cao Bàng, Lang Son, Hà Giang, Tuye Quang, Bác Thái, Quang Ninh, Hà Bac, Lam Dòng provinces; some settled south in Tung Nghia and Song Mao. Possibly also in Laos. Also in France, United States. Latin script. Tai-Kadai, Kam-Tai, Be-Tai, Tai-Sek, Tai, Central http://www.ethnologue.com/show_lang_family.asp?code=tyz
1480000 ksw Karen, S’gaw incubator-wikipedia Myanmar Ayeyawaddy (Irrawaddy) delta area, Taninthayi (Tenasserim) Division, the Pegu range between the Irrawaddy and Sittang rivers, the eastern hills Kayin (Karen) state. Also in Thailand. Latin script, no longer in use. Myanmar (Burmese) script, Sgaw extensions. Sino-Tibetan, Tibeto-Burman, Karen, Sgaw-Bghai, Sgaw http://www.ethnologue.com/show_lang_family.asp?code=ksw
1440000 gog Gogo Tanzania Dodoma region; Singida region, Manyoni District. Latin script.
Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, G, Gogo (G.10) http://www.ethnologue.com/show_lang_family.asp?code=gog
1435500 fon Fon Benin Zou Province, Atlantic Province, south Abomey-Calavi and Ouidah Subprefectures; Littoral Province, Cotonou. Interspersed with other groups south and in towns north. Also in Togo. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Kwa, Left Bank, Gbe, Fon http://www.ethnologue.com/show_lang_family.asp?code=fon
1430000 noe Nimadi India Madhya Pradesh, Khandwa, Khargone, Barwani, and south Dhar districts; Uttar Pradesh; Maharashtra. Devanagari script. Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Rajasthani, Unclassified http://www.ethnologue.com/show_lang_family.asp?code=noe
1400000 shy Tachawit Algeria Aurès Mountains, south and southeast of Grand Kabylie. Arabic script, major usage. Latin script, minor but increasing usage. Tifinagh (Berber) script. Afro-Asiatic, Berber, Northern, Zenati, Shawiya http://www.ethnologue.com/show_lang_family.asp?code=shy
1400000 kxm Khmer, Northern Thailand Northeast, mainly Surin, Sisaket, Buriram, Khorat provinces. Thai script. Austro-Asiatic, Mon-Khmer, Eastern Mon-Khmer, Khmer http://www.ethnologue.com/show_lang_family.asp?code=kxm
1400000 cqd Miao, Chuanqiandian Cluster China West Guizhou, west Guangxi, south Sichuan, Yunnan (especially southeast and northeast). Hmong-Mien, Hmongic, Chuanqiandian http://www.ethnologue.com/show_lang_family.asp?code=cqd A member of macrolanguage Hmong [hmn http://www.ethnologue.com/show_language.asp?code=hmn] (China http://www.ethnologue.com/show_country.asp?name=CN).
1400000 anw Anaang Nigeria Akwa Ibom state, Ikot Ekpene, Essien Udim, Abak, Ukanafun, and Oruk-Anam LGAs.
Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Cross River, Delta Cross, Lower Cross, Obolo, Efik http://www.ethnologue.com/show_lang_family.asp?code=anw
1391000 mni Meitei incubator-wikipedia India Manipur; Assam, Cachar, Karimganji; Nagaland; Tripura, West and North Tripura districts; Uttar Pradesh; West Bengal. Also in Bangladesh, Myanmar. Bengali script. Meetei Mayek script. Sino-Tibetan, Tibeto-Burman, Meitei http://www.ethnologue.com/show_lang_family.asp?code=mni
1367000 alz Alur Democratic Republic of the Congo Orientale Province, Mahagi Territory, northwest to Djalasiga area. Also in Uganda. Latin script. Nilo-Saharan, Eastern Sudanic, Nilotic, Western, Luo, Southern, Luo-Acholi, Alur-Acholi, Alur http://www.ethnologue.com/show_lang_family.asp?code=alz
1348000 mgh Makhuwa-Meetto Mozambique Cabo Delgado and Niassa provinces. Also in Tanzania. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, P, Makua (P.30) http://www.ethnologue.com/show_lang_family.asp?code=mgh
1346000 mnk Mandinka Senegal Southeast and south central. Also in Gambia, Guinea-Bissau. Arabic script. Latin script. Niger-Congo, Mande, Western, Central-Southwestern, Central, Manding-Jogo, Manding-Vai, Manding-Mokole, Manding, Manding-West http://www.ethnologue.com/show_lang_family.asp?code=mnk A member of macrolanguage Mandingo [man http://www.ethnologue.com/show_language.asp?code=man] (Guinea http://www.ethnologue.com/show_country.asp?name=GN).
1340000 seh Sena Mozambique Northwest, Sofala, Manica, Tete, and Zambezia provinces, lower Zambezi River region. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, N, Senga-Sena (N.40), Sena http://www.ethnologue.com/show_lang_family.asp?code=seh
1340000 kde Makonde Tanzania Mtwara region, primarily Mtwara Urban, Mtwara Rural, Tandahomba, and Newala districts. Also in Mozambique. Latin script.
Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, P, Yao (P.20) http://www.ethnologue.com/show_lang_family.asp?code=kde
1300000 hay Haya Tanzania Kagera region, mainly Bukoba Urban and Bukoba Rural districts. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, J, Haya-Jita (J.20) http://www.ethnologue.com/show_lang_family.asp?code=hay
1300000 bhb Bhili incubator-wikipedia India Madhya Pradesh, Jhabua, Dhar, Ratlam, Indore, Khargone districts; Gujarat, Sabarkantha, Panchmahals, and Dahod districts. Devanagari script. Gujarati script. Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Bhil http://www.ethnologue.com/show_lang_family.asp?code=bhb
1280000 xmw Malagasy, Tsimihety Madagascar North central. Latin script. Austronesian, Malayo-Polynesian, Greater Barito, East, Malagasy http://www.ethnologue.com/show_lang_family.asp?code=xmw A member of macrolanguage Malagasy [mlg http://www.ethnologue.com/show_language.asp?code=mlg] (Madagascar http://www.ethnologue.com/show_country.asp?name=MG).
1250000 snk Soninke Mali Nioro, Nara, Banamba, Yélémané, Kayes principal towns. Possibly in Niger. Also in Côte d’Ivoire, Gambia, Guinea, Guinea-Bissau, Mauritania, Senegal. Arabic script. Latin script. Niger-Congo, Mande, Western, Northwestern, Soninke-Bobo, Soninke-Boso, Soninke http://www.ethnologue.com/show_lang_family.asp?code=snk
1250000 hea Miao, Northern Qiandong China East and south Guizhou Province, Majiang, Danzhai, Leishan, Taijiang, Huangping, Shibing, Jianhe, Zhenyuan, Sansui, Fuquan, Pingba, Zhenning, Xingren, Anlong, Guanling, Zhenfeng and Ziyun counties, Kaili Qingzhen municipalities; northwest Guangxi Province, Longlin County.
Hmong-Mien, Hmongic, Qiandong http://www.ethnologue.com/show_lang_family.asp?code=hea A member of macrolanguage Hmong [hmn http://www.ethnologue.com/show_language.asp?code=hmn] (China http://www.ethnologue.com/show_country.asp?name=CN).
1240000 gmo Gamo-Gofa-Dawro Ethiopia Omo region, Arba Minch area; mountains west to Lake Abaya. Ethiopic script. Latin script. Afro-Asiatic, Omotic, North, Gonga-Gimojan, Gimojan, Ometo-Gimira, Ometo, Central http://www.ethnologue.com/show_lang_family.asp?code=gmo
1230000 wal Wolaytta Ethiopia Wolaytta region, Lake Abaya area. Ethiopic script. Afro-Asiatic, Omotic, North, Gonga-Gimojan, Gimojan, Ometo-Gimira, Ometo, Central http://www.ethnologue.com/show_lang_family.asp?code=wal
1230000 tem Themne Sierra Leone Northern Province, west of Sewa River to Little Scarcie. Latin script. Niger-Congo, Atlantic-Congo, Atlantic, Southern, Mel, Temne, Temne-Banta http://www.ethnologue.com/show_lang_family.asp?code=tem
1230000 fuh Fulfulde, Western Niger Niger West, Burkina Faso border east to Dogondoutchi area. Also in Benin, Burkina Faso. Arabic script, Ajami style. Latin script. Niger-Congo, Atlantic-Congo, Atlantic, Northern, Senegambian, Fulani-Wolof, Fula, East Central http://www.ethnologue.com/show_lang_family.asp?code=fuh A member of macrolanguage Fulah [ful http://www.ethnologue.com/show_language.asp?code=ful] (Senegal http://www.ethnologue.com/show_country.asp?name=SN).
1229000 dyu Jula incubator-wikipedia Burkina Faso Comoé, Kénédougou, Houet, and Leraba provinces. Also in Côte d’Ivoire, Mali. Arabic script. Latin script. Niger-Congo, Mande, Western, Central-Southwestern, Central, Manding-Jogo, Manding-Vai, Manding-Mokole, Manding, Manding-East, Northeastern Manding, Bamana http://www.ethnologue.com/show_lang_family.asp?code=dyu
1215000 ach Acholi incubator-wikipedia Uganda North, Kitgum District; Adjumani and Pader districts. Also in Sudan. Latin script.
Nilo-Saharan, Eastern Sudanic, Nilotic, Western, Luo, Southern, Luo-Acholi, Alur-Acholi, Lango-Acholi http://www.ethnologue.com/show_lang_family.asp?code=ach
1200000 zeh Zhuang, Eastern Hongshuihe China Guangxi Zhuang Autonomous Region, south of eastern Hongshuihe River and south of Qianjiang River, includes south Shanglin, south Xincheng, south Xingbin, north Guigang, west Guiping and south Wuxuan. Tai-Kadai, Kam-Tai, Be-Tai, Tai-Sek, Tai, Northern http://www.ethnologue.com/show_lang_family.asp?code=zeh A member of macrolanguage Zhuang [zha http://www.ethnologue.com/show_language.asp?code=zha] (China http://www.ethnologue.com/show_country.asp?name=CN).
1200000 vas Vasavi India Maharashtra, Nandurbar District, Tapti River area; Gujarat, Surat, Bharuch districts, north of Tapti River in southern areas of Akkalkuwa and Akrani (Dhadgaon) tahsils, a narrow belt of land between Satpudas and Tapti banks; Satpudas; south of Tapti in central and north Nandurbar and Nawapur tahsils. Devanagari script. Gujarati script. Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Gujarati http://www.ethnologue.com/show_lang_family.asp?code=vas
1200000 bts Batak Simalungun incubator-wikipedia Indonesia (Sumatra) North, northeast of Lake Toba. Batak script. Latin script. Austronesian, Malayo-Polynesian, Northwest Sumatra-Barrier Islands, Batak, Simalungan http://www.ethnologue.com/show_lang_family.asp?code=bts
1200000 btd Batak Dairi incubator-wikipedia Indonesia (Sumatra) Northern, southwest of Lake Toba around Sidikalang. Batak script. Austronesian, Malayo-Polynesian, Northwest Sumatra-Barrier Islands, Batak, Northern http://www.ethnologue.com/show_lang_family.asp?code=btd
1188000 prk Wa, Parauk Myanmar Northeast Shan state, upper Salween River area; East Shan state, Kengtung area. Also in China.
Austro-Asiatic, Mon-Khmer, Northern Mon-Khmer, Palaungic, Eastern Palaungic, Waic, Wa http://www.ethnologue.com/show_lang_family.asp?code=prk 1186000 bej Bedawiyet Sudan Northeast along Red Sea coast. Also in Egypt, Eritrea. Arabic script. Latin script. Afro-Asiatic, Cushitic, North http://www.ethnologue.com/show_lang_family.asp?code=bej 1182000 abr Abron Ghana Southwest, northwest of Asante Twi [aka http://www.ethnologue.com/show_language.asp?code=aka]. Also in Côte d’Ivoire. Niger-Congo, Atlantic-Congo, Volta-Congo, Kwa, Nyo, Potou-Tano, Tano, Central, Akan http://www.ethnologue.com/show_lang_family.asp?code=abr 1180000 tsc Tswa Mozambique South, most of Inhambane Province. Also in South Africa, Zimbabwe. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, S, Tswa-Ronga (S.50) http://www.ethnologue.com/show_lang_family.asp?code=tsc 1162040 pag Pangasinan wikipedia Philippines Pangasinan Province, Luzon. Also in United States. Latin script. Austronesian, Malayo-Polynesian, Philippine, Northern Luzon, Meso-Cordilleran, South-Central Cordilleran, Southern Cordilleran, West Southern Cordilleran http://www.ethnologue.com/show_lang_family.asp?code=pag 1161900 srr Serer-Sine Senegal West central; Sine and Saloum River valleys. Also in Gambia. Arabic script. Latin script. Niger-Congo, Atlantic-Congo, Atlantic, Northern, Senegambian, Serer http://www.ethnologue.com/show_lang_family.asp?code=srr 1160000 mkw Kituba Congo Mainly between Brazzaville and Pointe-Noire. Latin script. Creole, Kongo based http://www.ethnologue.com/show_lang_family.asp?code=mkw 1150000 bhi Bhilali India Madhya Pradesh, Khargone (Segaon), Barwani (Rajpur), southern Jhabua and southern Dhar districts; Maharashtra, Dhule District; some in Gujarat; Karnataka; Rajasthan. 
Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Bhil http://www.ethnologue.com/show_lang_family.asp?code=bhi 1142000 zne Zande Democratic Republic of the Congo Far north of Orientale Province, Bas-Uele District. Also in Central African Republic, Sudan. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, North, Adamawa-Ubangi, Ubangi, Zande, Zande-Nzakara http://www.ethnologue.com/show_lang_family.asp?code=zne 1142000 tum Tumbuka wikipedia Malawi Northern Province, west shore of Lake Malawi, south of the Ngonde, north of the Tonga and Ngoni. Also in Zambia. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, N, Tumbuka (N.20) http://www.ethnologue.com/show_lang_family.asp?code=tum 1140000 mtq Muong Viet Nam Mostly north central mountains, Hoa Bình, Thanh Hóa, Vinh Phú, Yen Bai, Son La, Ninh Binh provinces. Latin script. Austro-Asiatic, Mon-Khmer, Viet-Muong, Muong http://www.ethnologue.com/show_lang_family.asp?code=mtq 1139000 shu Arabic, Chadian Spoken Chad Salamat, Ouaddaï, Wadi Fira regions, Batha region center and west, much of Chari-Baguirmi; Mayo-Kebbi; north Tandjilé; Guéra. Also in Cameroon, Central African Republic, Niger, Nigeria. Arabic script. Latin script. Afro-Asiatic, Semitic, Central, South, Arabic http://www.ethnologue.com/show_lang_family.asp?code=shu A member of macrolanguage Arabic [ara http://www.ethnologue.com/show_language.asp?code=ara] (Saudi Arabia http://www.ethnologue.com/show_country.asp?name=SA). 1127000 toi Tonga Zambia Southern and Western provinces. With Ila [ilb http://www.ethnologue.com/show_language.asp?code=ilb] it predominates south. Also in Zimbabwe. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, M, Lenje-Tonga (M.60), Tonga http://www.ethnologue.com/show_lang_family.asp?code=toi 1120000 myx Masaaba Uganda East, Mbale and Sirinko districts, adjacent to Mount Elgon. Latin script. 
Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, J, Masaba-Luyia (J.30) http://www.ethnologue.com/show_lang_family.asp?code=myx 1105000 nyy Nyakyusa-Ngonde Tanzania South Mbeya region, Lake Malawi north end; Iringa region, Makete District. Also in Malawi. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, M, Nyakyusa (M.30) http://www.ethnologue.com/show_lang_family.asp?code=nyy 1100000 btm Batak Mandailing incubator-wikipedia Indonesia (Sumatra) North. Batak script. Austronesian, Malayo-Polynesian, Northwest Sumatra-Barrier Islands, Batak, Southern http://www.ethnologue.com/show_lang_family.asp?code=btm 1080000 zch Zhuang, Central Hongshuihe China Guangxi Zhuang Autonomous Region, either side of central stretch of HSH River, including Du’an, Dahua, Mashan, north Shanglin and possibly other border areas such as east Pingguo. Tai-Kadai, Kam-Tai, Be-Tai, Tai-Sek, Tai, Northern http://www.ethnologue.com/show_lang_family.asp?code=zch A member of macrolanguage Zhuang [zha http://www.ethnologue.com/show_language.asp?code=zha] (China http://www.ethnologue.com/show_country.asp?name=CN). 1078200 aar aa Afar incubator-wikibooks,incubator-wiktionary,incubator-wikipedia Ethiopia Eastern lowlands, Afar region. May be in Somalia. Also in Djibouti, Eritrea. Ethiopic script, used in Ethiopia. Latin script. Afro-Asiatic, Cushitic, East, Saho-Afar http://www.ethnologue.com/show_lang_family.asp?code=aar 1070000 ndo ng Ndonga incubator-wikipedia,incubator-wikisource Namibia Ovamboland. Also in Angola. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, R, Ndonga (R.20) http://www.ethnologue.com/show_lang_family.asp?code=ndo 1062000 tsg Tausug incubator-wikipedia Philippines Jolo, Sulu Archipelago. Palawan Island, Basilan Island, Zamboanga City and environs. Also in Indonesia (Kalimantan), Malaysia (Sabah). Arabic script. 
Latin script. Austronesian, Malayo-Polynesian, Philippine, Greater Central Philippine, Central Philippine, Bisayan, South, Butuan-Tausug http://www.ethnologue.com/show_lang_family.asp?code=tsg 1060280 sus Susu Guinea Mainly southwest and west. Also in Guinea-Bissau, Senegal, Sierra Leone. Arabic script. Latin script. Niger-Congo, Mande, Western, Central-Southwestern, Central, Susu-Yalunka http://www.ethnologue.com/show_lang_family.asp?code=sus 1056400 yom Yombe Democratic Republic of the Congo Western Bas-Congo Province, Mayombe Forest. Also in Angola, Congo. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, H, Kongo (H.10) http://www.ethnologue.com/show_lang_family.asp?code=yom 1050000 tig Tigré Eritrea Also in Sudan. Ethiopic script. Afro-Asiatic, Semitic, South, Ethiopian, North http://www.ethnologue.com/show_lang_family.asp?code=tig 1050000 kjp Karen, Pwo Eastern Myanmar Kayin (Karen) state, Mon state, Taninthayi (Tensserim) Division. Also in Thailand. Leke script. Myanmar (Burmese) script. Thai script, used in Thailand. Sino-Tibetan, Tibeto-Burman, Karen, Pwo http://www.ethnologue.com/show_lang_family.asp?code=kjp 1045000 pmu Panjabi, Mirpur India Kashmir, Mirpur area, near Pakistan border. Possibly in Pakistan. Also in United Kingdom. Indo-European, Indo-Iranian, Indo-Aryan, Northwestern zone, Lahnda http://www.ethnologue.com/show_lang_family.asp?code=pmu A member of macrolanguage Lahnda [lah http://www.ethnologue.com/show_language.asp?code=lah] (Pakistan http://www.ethnologue.com/show_country.asp?name=PK). 1045000 mas Maasai incubator-wikisource Kenya Rift Valley Province, Kajiado and Narok districts. Also in Tanzania. Latin script. Nilo-Saharan, Eastern Sudanic, Nilotic, Eastern, Lotuxo-Teso, Lotuxo-Maa, Ongamo-Maa http://www.ethnologue.com/show_lang_family.asp?code=mas 1045000 kzh Kenuzi-Dongola Sudan North, Northern Province, mainly Dongola area. 
North boundary with Nobiin [fia http://www.ethnologue.com/show_language.asp?code=fia] is Burgeg. Also in Egypt. Arabic script. Coptic script, Old Nubian. Latin script. Nilo-Saharan, Eastern Sudanic, Eastern, Nubian, Central, Dongolawi http://www.ethnologue.com/show_lang_family.asp?code=kzh 1027900 fan Fang Equatorial Guinea Interior. Also in Cameroon, Congo, Gabon, São Tomé e Príncipe. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Northwest, A, Yaunde-Fang (A.70) http://www.ethnologue.com/show_lang_family.asp?code=fan 1025000 mxc Manyika Zimbabwe Manicaland Province and adjacent areas, northeast of Umtali. Also in Mozambique. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, S, Shona (S.10) http://www.ethnologue.com/show_lang_family.asp?code=mxc 1016650 nga Ngbaka Democratic Republic of the Congo Equateur Province, Gemena Territory area. 850 villages. Also in Central African Republic, Congo. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, North, Adamawa-Ubangi, Ubangi, Gbaya-Manza-Ngbaka, East http://www.ethnologue.com/show_lang_family.asp?code=nga 1009780 cjk Chokwe Democratic Republic of the Congo Near Angola border, southeast Bandundu, Kasaï Occidental, and Katanga provinces. Also in Angola, Namibia, Zambia. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, K, Chokwe-Luchazi (K.20) http://www.ethnologue.com/show_lang_family.asp?code=cjk 1008500 ffm Fulfulde, Maasina Mali Central. Western in Segou and Macina areas; Eastern from north of Mopti to Boni east. Also in Côte d’Ivoire, Ghana. Arabic script, Ajami style. Latin script. 
Niger-Congo, Atlantic-Congo, Atlantic, Northern, Senegambian, Fulani-Wolof, Fula, West Central http://www.ethnologue.com/show_lang_family.asp?code=ffm A member of macrolanguage Fulah [ful http://www.ethnologue.com/show_language.asp?code=ful] (Senegal http://www.ethnologue.com/show_country.asp?name=SN). 1000000 zgn Zhuang, Guibian China Guangxi Zhuang Autonomous Region; Fengshan, Tianlin, Longlin, Xilin, Lingyun, Leyun; Yunnan, Funing, N. Guangnan. Tai-Kadai, Kam-Tai, Be-Tai, Tai-Sek, Tai, Northern http://www.ethnologue.com/show_lang_family.asp?code=zgn A member of macrolanguage Zhuang [zha http://www.ethnologue.com/show_language.asp?code=zha] (China http://www.ethnologue.com/show_country.asp?name=CN). 1000000 tdx Malagasy, Tandroy-Mahafaly Madagascar South, Toliara Province, Beloha, Tsihombe, Ambovombe, Bekily districts. Austronesian, Malayo-Polynesian, Greater Barito, East, Malagasy http://www.ethnologue.com/show_lang_family.asp?code=tdx A member of macrolanguage Malagasy [mlg http://www.ethnologue.com/show_language.asp?code=mlg] (Madagascar http://www.ethnologue.com/show_country.asp?name=MG). 1000000 stv Silt’e Ethiopia South of Addis Ababa 150km, Werabey Town. Ethiopic script. Afro-Asiatic, Semitic, South, Ethiopian, South, Transversal, Harari-East Gurage http://www.ethnologue.com/show_lang_family.asp?code=stv 1000000 sop Songe Democratic Republic of the Congo Kasaï Oriental Province, between Sankuru and Lualaba rivers, mainly Kabinda zone and east into Katanga Province, Kongolo and Kabolo territories of. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, L, Songye (L.20) http://www.ethnologue.com/show_lang_family.asp?code=sop 1000000 qug Quichua, Chimborazo Highland incubator-wikipedia Ecuador Central highlands, Chimborazo and Bolivar provinces. 
Quechuan, Quechua II, B http://www.ethnologue.com/show_lang_family.asp?code=qug A member of macrolanguage Quechua [que http://www.ethnologue.com/show_language.asp?code=que] (Peru http://www.ethnologue.com/show_country.asp?name=PE). 1000000 mfa Malay, Pattani incubator-wikipedia Thailand North, Songkhla (Singgora) Province, Chana (Chenok) region, south through Pattani, Narathiwat, Yala, Saiburi, Tak Bai. Arabic script. Thai script. Austronesian, Malayo-Polynesian, Malayo-Sumbawan, North and East, Malayic, Malay http://www.ethnologue.com/show_lang_family.asp?code=mfa A member of macrolanguage Malay [msa http://www.ethnologue.com/show_language.asp?code=msa] (Malaysia http://www.ethnologue.com/show_country.asp?name=MY). 1000000 mdh Maguindanao Philippines Maguindanao, North Cotabato, South Cotabato, Sultan Kuderat, and Zamboanga del Sur provinces; Iranun also in Bukidnon, Mindanao. Latin script. Austronesian, Malayo-Polynesian, Philippine, Greater Central Philippine, Danao, Magindanao http://www.ethnologue.com/show_lang_family.asp?code=mdh 1000000 lki Laki Iran Western Iran, Ilam, Lorestan provinces, cities of Aleshtar, Kuhdesht, Nurabad-e Dolfan, Khorramabad. Indo-European, Indo-Iranian, Iranian, Western, Northwestern, Kurdish http://www.ethnologue.com/show_lang_family.asp?code=lki 1000000 kmc Dong, Southern China Area where west Hunan and north Guangxi provinces meet, southeast Guizhou (Yuping Autonomous County); Guangxi Zhuang Autonomous Region. 20 contiguous counties. Tai-Kadai, Kam-Tai, Kam-Sui http://www.ethnologue.com/show_lang_family.asp?code=kmc 1000000 jax Malay, Jambi Indonesia (Sumatra) Southeast Sumatra, Jambi Province. Austronesian, Malayo-Polynesian, Malayo-Sumbawan, North and East, Malayic, Malay http://www.ethnologue.com/show_lang_family.asp?code=jax A member of macrolanguage Malay [msa http://www.ethnologue.com/show_language.asp?code=msa] (Malaysia http://www.ethnologue.com/show_country.asp?name=MY). 
1000000 ijc Izon Nigeria Bayelsa state, Yenagoa, South Ijaw, Kolokuma-Opokuma, Ekeremor, and Sagbama LGAs; Delta state, Burutu, Warri, and Ughelli LGAs; Ondo state, Ilaje, Ese-Odo LGAs; Ekiti state, Ikole LGA. Latin script. Niger-Congo, Atlantic-Congo, Ijoid, Ijo, West Ijo http://www.ethnologue.com/show_lang_family.asp?code=ijc 1000000 igb Ebira Nigeria Kwara state, Okene, Okehi, and Kogi LGAs; Nassarawa state, Nasarawa LGA; Edo state, Akoko-Edo LGA. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Nupoid, Ebira-Gade http://www.ethnologue.com/show_lang_family.asp?code=igb 1000000 eth Ethiopian Sign Language Ethiopia Deaf sign language http://www.ethnologue.com/show_lang_family.asp?code=eth 1000000 diq Dimli incubator-wiktionary Turkey (Asia) East central, mainly Elazig, Bingol, and Diyarbakir provinces, upper courses of Euphrates, Kizilirmaq, and Murat rivers. Also in Germany. Latin script. Indo-European, Indo-Iranian, Iranian, Western, Northwestern, Zaza-Gorani http://www.ethnologue.com/show_lang_family.asp?code=diq A member of macrolanguage Zaza [zza http://www.ethnologue.com/show_language.asp?code=zza] (Turkey (Asia) http://www.ethnologue.com/show_country.asp?name=TRA). 1000000 bqi Bakhtiâri incubator-wikipedia Iran Southwest Iran: west Chahar-Mahal va Bakhtiari, east Khuzestan, east Lorestan, west Esfahan. Masjed-e Soleiman, Shahr-e Kord, Dorud. Arabic script. Indo-European, Indo-Iranian, Iranian, Western, Southwestern, Luri http://www.ethnologue.com/show_lang_family.asp?code=bqi 1000000 bmm Malagasy, Northern Betsimisaraka Madagascar East coast, Toamasina Province, Mananara Avaratra, Soanierana-Ivongo, Fenoarivo Antsinana, Vavatenina, Toamasina districts. Austronesian, Malayo-Polynesian, Greater Barito, East, Malagasy http://www.ethnologue.com/show_lang_family.asp?code=bmm A member of macrolanguage Malagasy [mlg http://www.ethnologue.com/show_language.asp?code=mlg] (Madagascar http://www.ethnologue.com/show_country.asp?name=MG). 
1000000 bin Edo incubator-wikipedia Nigeria Bendel state, Ovia, Oredo, and Orhionmwon LGAs. Latin script. Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Edoid, North-Central, Edo-Esan-Ora http://www.ethnologue.com/show_lang_family.asp?code=bin 1000000 bfz Pahari, Mahasu India Himachal Pradesh, Shimla (Simla) and Solan districts. Devanagari script. Indo-European, Indo-Iranian, Indo-Aryan, Northern zone, Western Pahari http://www.ethnologue.com/show_lang_family.asp?code=bfz
An interesting "aside" on this would be...
What is the quality of the foreign-language wikis that currently exist? For example, the articles in my specific technical topic area have a few foreign-language equivalents. Most are two or three lines.
It would be interesting to see this question expanded to "what number of languages with more than 1M readers do not have a Wikipedia with a reasonable depth of coverage".
Tom
On 05/22/2011 06:41 PM, Thomas Morton wrote:
It would be interesting to see this question expanded to "what number of languages with more than 1M readers do not have a Wikipedia with a reasonable depth of coverage".
I haven't understood you. Articles in English Wikipedia are very good and English is a foreign language.
On Sun, May 22, 2011 at 11:18 PM, Milos Rancic millosh@gmail.com wrote:
I haven't understood you. Articles in English Wikipedia are very good and English is a foreign language.
That answers the question for one language (although by way of an unfounded sweeping generalisation: I am pretty sure that not _all_ articles in the English Wikipedia are 'very good'), which leaves 280 languages (or 279 if you consider one of them not a foreign language) still to answer the question for.
Sorry Milos, systematic bias at work there :)
I mean... English Wikipedia has the largest collection of content, and on average I would guess that the content is of a higher, or at least a more complete, quality than most other language wikis. Computer forensics... is relatively decent on en-wp; it has a number of foreign-language equivalents, but none of them are of particular length or completeness.
What I am really saying is: alongside the figures of how many language wikis we are missing should be a figure of how many language wikis don't yet constitute a reasonably detailed encyclopaedia.
Just theorizing on a related topic :)
Tom
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
On Sun, May 22, 2011 at 5:30 AM, Milos Rancic millosh@gmail.com wrote:
(excellent long form work)
Thank you, Milos. Very informative.
Out of curiosity - I assume those are the "native speakers" counts for that language. Do we have "exclusive speakers" counts as well?
I don't know for sure what the right answer is to this, but one could assert and consider that we may want to preferentially support those who don't speak a more common national language that already has content.
On 05/22/2011 01:28 PM, George Herbert wrote:
Good work generally, but regarding this last list...
Afghanistan has many languages in use (Pashto, Tajik, Hazara, Uzbek); Algeria uses Arabic, Berber, and French; Jordan's official language is Arabic (though the spoken one is a dialect); and generally so forth.
Can you break this out by which languages we are missing, not just by country, as country isn't specific enough?
Waiting for list admins to allow ~250k mail :)
On Sun, 2011-05-22 at 14:32 +0200, Milos Rancic wrote:
Waiting for list admins to allow ~250k mail :)
If only someone would make some website anyone could edit, you could make such a list there and send us the link :)
On 05/22/2011 03:22 PM, Nikola Smolenski wrote:
If only someone would make some website anyone could edit, you could make such a list there and send us the link :)
The results are preliminary and raw, and may contain errors. On internal-l I've been informed that Filipino shouldn't be treated as a valid Wikimedia language.
I sent that in an email to satisfy curiosity. (And I hope that some good list admin will press the right button :) )
On Sun, 2011-05-22 at 13:15 +0200, Milos Rancic wrote:
- Jin Chinese, 45M, China
- Xiang Chinese, 36M, China, incubator
- Min Bei Chinese, 10.3M, China, incubator
Aren't these languages written with Chinese characters and thus their speakers can read and write the Chinese Wikipedia?
On 05/22/2011 01:37 PM, Nikola Smolenski wrote:
Aren't these languages written with Chinese characters and thus their speakers can read and write the Chinese Wikipedia?
Chinese script is logosyllabic. That means that other Chinese languages may differ from the standard one in their syllabic and syntactic parts.
For example, in English you would not want to write a phrase like "boy girl and" (different syntax), and it would sound strange to write "boyf and girlf" for the plural (different suffixes/syllables), even though the roots could be written the same way.
Besides that, two of the three languages have incubator projects, which means that their speakers think they should have a Wikipedia in their languages.
On Sun, 2011-05-22 at 14:47 +0200, Milos Rancic wrote:
Chinese script is logosyllabic. That means that other Chinese languages may differ from the standard one in their syllabic and syntactic parts.
Yes, but how different is any of these? They may be practically the same, or it may be possible to add them as a variant of Chinese.
On 05/22/2011 03:20 PM, Nikola Smolenski wrote:
Yes, but how different is any of these? They may be practically the same, or it may be possible to add them as a variant of Chinese.
I don't know, but it is a known fact that even languages within closely related families, such as the Slavic languages, have different suffixes, different uses of prepositions, and different meanings for the same words.
Hi Milos, thanks for those interesting facts!
On 05/22/2011 01:15 PM, Milos Rancic wrote:
Those are preliminary results. We have two chapters (and strategic focus) in countries of the list above. Inside of the longer list, which should be verified, we have more chapters. I noted that there are even two languages of Germany without a Wikipedia but with more than a million speakers: Mainfränkisch and Upper Saxon (the latter without a test Wikipedia).
These are German dialects, and (I suppose) everyone who speaks them also understands the German ("hochdeutsche") Wikipedia. Almost all written text is in German; those dialects are used exclusively in spoken language (there might be some rare exceptions). So I guess that while it's fine to have Wikipedia language versions in local dialects of German, it should not be a priority. Denny's question is key:
On 05/22/2011 01:22 PM, Denny Vrandecic wrote:
Can you also provide at least a rough estimate of the number of humans who don't have Wikipedia in any language they speak reasonably well (understanding very well that this is a hard problem of definition)?
-- Tobias
On Sun, May 22, 2011 at 1:15 PM, Milos Rancic millosh@gmail.com wrote:
Those are preliminary results. We have two chapters (and strategic focus) in countries of the list above. Inside of the longer list, which should be verified, we have more chapters. I noted that there are even two languages of Germany without a Wikipedia but with more than a million speakers: Mainfränkisch and Upper Saxon (the latter without a test Wikipedia).
I think that also shows that this is quite relative. People who speak Mainfränkisch or Upper Saxon will use German in more formal situations; in particular, the great majority will read and write German more, and better, than their regional language, even if they use the latter when talking with their family members or neighbours. This may well be true of some of the other languages you list too. That does not mean we should not include those languages, but it does mean that they do not have the high priority that "a multi-million-speaker language that we do not have" seems to entail.
On Sun, May 22, 2011 at 1:15 PM, Milos Rancic millosh@gmail.com wrote:
I am preparing document for Wikimania. Presently, I am in process of analyzing data (SIL [1], Ethnologue [2], Wikimedia projects). I am using Ethnologue data for population estimates.
Statistics that consider only the number of speakers are not realistic.
The "correct" statistics should have a "maturity model" to check whether a language can receive a Wikipedia.
This means, at least, to consider:
a) number of potential writers/readers
b) percentage of illiteracy
c) level of education
d) computer literacy
This means that it makes no sense to say that 30 million speakers of Nigerian Pidgin don't have a Wikipedia if this language is used for daily communication and is not written or, if it is written, is spoken by a population with 35% literacy and 2% of people with a sufficient education level (these are not real data, only an example).
This means that Nigerian Pidgin has a potential population of Wikipedia users/writers of less than 1 million people, fewer than a dialect spoken in a region of Europe where literacy is higher.
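The arithmetic behind that example can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `potential_contributors` is hypothetical, the shares are assumed to multiply (the 2% is taken among literate speakers), no computer-literacy figure is given in the message so it defaults to 1.0, and the 35% and 2% values are the made-up example numbers, not real data.

```python
def potential_contributors(speakers: int,
                           literacy_rate: float,
                           educated_share: float,
                           computer_literate_share: float = 1.0) -> float:
    """Rough estimate of people who could read and write a Wikipedia.

    Assumption: each factor is a share of the group left by the previous
    one, so the shares are simply multiplied together.
    """
    return speakers * literacy_rate * educated_share * computer_literate_share


# Illustrative values only (explicitly not real data in the message).
speakers = 30_000_000      # speakers of the language
literacy_rate = 0.35       # 35% literacy
educated_share = 0.02      # 2% with a sufficient education level

estimate = potential_contributors(speakers, literacy_rate, educated_share)
print(f"Potential users/writers: {estimate:,.0f}")
```

Under these assumptions the estimate comes to about 210,000 people, which is consistent with the "less than 1 million" figure.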
I use the definition of user/writer because it makes no sense to have a Wikipedia with only passive users.
Honestly, I see a lot of statistics within the WMF built on the *wrong* assumption that one speaker = one potential Wikipedian; this means that all strategies are wrong because they start from a wrong theory.
I know that it's simple to put the number of speakers on one side and the number of users or the presence of a Wikipedia on the other, but the world is not so simple. That way the error is so high that all definitions are unrealistic and probably "surrealistic".
Ilario
On 05/24/2011 03:03 PM, Ilario Valdelli wrote:
I know that it's simple to put in one side the number of speakers and in the other side the number of users or the presence of one Wikipedia, but the world is not so simple. In that way the error is so high that all definitions are not realistic and probably "surrealistic".
That doesn't mean that there aren't 30 million people there. It just means that our strategy for reaching them should be different and longer-term.
BTW, points like yours are valid, but only if they come with proposals for how to deal with such problems.
I started an article [1] on the Strategy wiki. Feel free to add your own.
Quoting Ilario Valdelli valdelli@gmail.com:
It makes no sense to say that 30 million speakers of Nigerian Pidgin don't have a Wikipedia if the language is used only for daily communication and is not written or, if it is written, is spoken by a population with a 35% literacy rate in which 2% have a sufficient level of education (these are not real figures, just an example).
This means that Nigerian Pidgin has a potential population of Wikipedia users/writers of less than 1 million people, fewer than a dialect spoken in a European region where literacy is higher.
But don't forget that users follow content. As soon as the internet provides decent content in Nigerian Pidgin, content consumers will appear too. And if there is a wide variety of interesting content relevant to speakers of Nigerian Pidgin, the literacy rate for the language will also rise automatically. As was said, some people learn to read so that they can go on Facebook. A supply of content is a prerequisite for stimulating demand for content.
A single dedicated person could be enough to put a project in motion. A dean of a Nigerian college who integrates Wikipedia article creation in the instruction plan ("if you create 200 Nigerian pidgin Wikipedia articles this semester you'll get X extra credit points for your degree") could be enough to get the project to 100,000 articles in a year (200 articles*2 semesters*250 students = 100,000 articles in a year).
It wouldn't be an "encyclopedia written by volunteer contributors", but if the project has 100,000 decent articles after a year, some people will probably stay with the project or join it as volunteers. After all one of the biggest problems of a fledgling Wikipedia is that few people have the dedication to work on a project that will probably not "take off" for years to come. It's easier to win contributors if the project already has taken off and people know that they are not wasting their time.
A teacher in Nigeria gets about 50,000 Naira a year (says Google), that's about 4500 Euro. For one million Euro you should be able to hire 100 educated people to write Wikipedia articles as a full-time job. Depending on the level of detail you aim for, that should be enough to create 100,000 articles a year. With 10 million Euro you could make the Nigerian Pidgin Wikipedia a million-article project.
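The back-of-the-envelope numbers above can be checked with a short sketch. It uses only the figures given in this message; the variable names and the helper `budget_for` are made up for illustration, and the per-writer and per-article values are simply what those figures imply, not independent estimates.

```python
# Student scenario: 200 articles per semester, 2 semesters, 250 students.
ARTICLES_PER_SEMESTER = 200
SEMESTERS = 2
STUDENTS = 250
student_articles = ARTICLES_PER_SEMESTER * SEMESTERS * STUDENTS

# Paid-staff scenario: 1 million Euro, 100 writers, 100,000 articles a year.
BUDGET_EUR = 1_000_000
WRITERS = 100
ARTICLES_TARGET = 100_000

budget_per_writer = BUDGET_EUR / WRITERS         # Euro per writer per year
articles_per_writer = ARTICLES_TARGET / WRITERS  # articles per writer per year
cost_per_article = BUDGET_EUR / ARTICLES_TARGET  # Euro per article


def budget_for(articles: int) -> float:
    """Budget in Euro for a given article count at the implied cost per article."""
    return articles * cost_per_article


print(student_articles)       # 100000
print(budget_per_writer)      # 10000.0
print(articles_per_writer)    # 1000.0
print(cost_per_article)       # 10.0
print(budget_for(1_000_000))  # 10000000.0
```

The implied budget of 10,000 Euro per writer is above the quoted salary of about 4500 Euro, so the estimate leaves some margin, while the implied output of 1,000 articles per writer per year depends heavily on the level of detail aimed for.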
10 million is a big number, but that much is spent every day on much dumber things than a Wikipedia. There are many billionaires in the world and a single one of them could push 100 languages to the million-article milestone.
Just some thought experiments with numbers. I guess many people wouldn't like Wikipedia being written by paid staff. Me neither. But it's better than only waiting for volunteers who may never join, because without some boosting the projects lack the critical mass to reach a state of sustainable growth.
If we just want to wait for volunteers to appear by themselves, the whole discussion in this thread is moot, because that's what we have done so far and it hasn't achieved much for the non-nation-state languages.
Marcus Buck User:Slomox
Quoting me@marcusbuck.org:
A supply of content is a prerequisite for stimulating demand for content.
After all one of the biggest problems of a fledgling Wikipedia is that few people have the dedication to work on a project that will probably not "take off" for years to come. It's easier to win contributors if the project already has taken off and people know that they are not wasting their time.
To further emphasize this: Wikipedia was created in 2001. Why 2001? The World Wide Web had existed since 1991 and it wouldn't have been hard to create a wiki-like site to build a free encyclopedia. After all, Tim Berners-Lee's idea of the Web was that of a wiki-like editable web. So why did it take 10 years to build the first successful free encyclopedia? The idea existed but nobody put it into practice. Nobody had enough faith in the idea to dedicate time to building it, and those who tried did not reach the critical mass to take off, probably due to a lack of dedication to the project when nobody was sure whether it could work.
The same happens nowadays. There surely are many people who would like to have decent content in their native language. But they have no faith that anybody would ever read it, and they fear that they would waste their lives writing stuff nobody will read. People _will_ read it, but only after the project has reached a state where it's worth looking things up in it. If 70% of your searches deliver no useful result, your enthusiasm as a reader will not be great. If 90% of your searches deliver useful results, you will probably accept it as your main body of reference.
This gap between the start and becoming useful to readers was easily bridged by the big languages that have many educated speakers with much free time. But the less "strong" a language is, the more this gap becomes an obstacle that cannot be overcome.
Marcus Buck User:Slomox
On Tue, May 24, 2011 at 4:47 PM, me@marcusbuck.org wrote:
A single dedicated person could be enough to put a project in motion. A dean of a Nigerian college who integrates Wikipedia article creation in the instruction plan ("if you create 200 Nigerian pidgin Wikipedia articles this semester you'll get X extra credit points for your degree") could be enough to get the project to 100,000 articles in a year (200 articles*2 semesters*250 students = 100,000 articles in a year).
I don't agree. Wikipedia is a "collaborative" encyclopedia, not just an encyclopedia.
This means that one person cannot "drive" the project, because he would impose a single point of view.
It makes sense where there is no encyclopedia in the language and Wikipedia can be the first one, but it would be interesting to analyze why there were no encyclopedias before.
I have seen this approach tried in some minor languages and it doesn't work. It's difficult to gather people around a small core of articles, because they are attracted by more active languages or because they don't know their everyday language well enough to put their ideas into written sentences. It seems strange, but when people have to use their everyday language to write something (technically a "change of linguistic register"), they prefer to switch languages and use English, Hindi or Chinese.
Some languages don't have a literature, don't have words to translate technical or legal terms, and don't have a dictionary or a formal grammar. This means that the community would have to build its written language around Wikipedia before it could start to contribute. That's another project.
Ilario
On 24.05.2011 19:13, Ilario Valdelli wrote:
I don't agree. Wikipedia is a "collaborative" encyclopedia, not just an encyclopedia.
This means that one person cannot "drive" the project, because he would impose a single point of view.
I didn't say anything about a single person somehow taking control. I just said that a single person is enough to kickstart a Wikipedia into lively activity.
It makes sense where there is no encyclopedia in the language and Wikipedia can be the first one, but it would be interesting to analyze why there were no encyclopedias before.
I have seen this approach tried in some minor languages and it doesn't work.
Feel free to share links so others can learn from it.
It's difficult to gather people around a small core of articles, because they are attracted by more active languages or because they don't know their everyday language well enough to put their ideas into written sentences. It seems strange, but when people have to use their everyday language to write something (technically a "change of linguistic register"), they prefer to switch languages and use English, Hindi or Chinese.
Some languages don't have a literature, don't have words to translate technical or legal terms, and don't have a dictionary or a formal grammar. This means that the community would have to build its written language around Wikipedia before it could start to contribute. That's another project.
There's hardly any language in the world with a sizable number of speakers that hasn't ever been written. All languages have something to build on. Of course you won't be able to write about quantum physics in Nigerian Pidgin without importing terms from English. The solution: start by writing articles about topics that are relevant to the community. If a topic is relevant to the community, they will have words for it. If the language lacks a written dictionary, collect the words in Wiktionary.
The problems you name are challenges that need to be addressed while building the project, but they are certainly no showstoppers.
Marcus Buck User:Slomox
wikimedia-l@lists.wikimedia.org