When you get data, at some point you start thinking about
quite fringe comparisons. But they can actually lead to some useful
conclusions, as they did this time [1].
We did the following:
* Used the number of primary speakers from Ethnologue. (Erik Zachte
uses the approximate number of primary + secondary speakers; that could
be useful for correcting this data.)
* Categorized languages by the order of magnitude of the number of
speakers: >=1k, >=10k, >=100k, >=1M, >=10M, >=100M.
* Took the number of articles in each language's Wikipedia and
computed the ratio (number of articles / number of speakers).
* This list consists only of languages with Ethnologue status 1
(national), 2 (provincial) or 3 (wider communication). In fact, we
have a lot of projects (more than 100) with a worse language status; a
number of them are actually threatened or even on the edge of
extinction.
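A minimal sketch of the categorization and ratio steps above, in Python. The function names and the sample figures are hypothetical; they are not taken from the spreadsheet.

```python
# Order-of-magnitude categories as used in this message.
LABELS = {3: ">=1k", 4: ">=10k", 5: ">=100k", 6: ">=1M", 7: ">=10M", 8: ">=100M"}


def speaker_bucket(speakers: int) -> str:
    """Return the logarithmic category for a number of speakers."""
    if speakers < 1000:
        return "<1k"
    # Number of digits minus one is the power of ten; cap at 10^8 (>=100M).
    exponent = min(len(str(int(speakers))) - 1, 8)
    return LABELS[exponent]


def articles_per_speaker(articles: int, speakers: int) -> float:
    """Ratio used for the comparison: number of articles / number of speakers."""
    if speakers <= 0:
        raise ValueError("speakers must be positive")
    return articles / speakers


# Hypothetical example: a language with 2,000 speakers and 500 articles.
print(speaker_bucket(2000), articles_per_speaker(500, 2000))
```

Counting digits instead of calling a floating-point logarithm keeps exact powers of ten (1,000, 10,000, ...) in the right category.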
These are preliminary results and I will definitely have to go
through all the numbers. I manually fixed some serious errors, like
English Wikipedia itself missing from the data :D
Putting the languages into logarithmic categories proved to be
useful, as we are now able to compare Wikipedias according to
their gross capacity (number of speakers). I suppose somebody well
versed in statistics could even create a function that could
be used to check how well a project is doing, independent of those
strict categories.
It's obvious that the more speakers a language has, the harder it is
for the community to keep up the ratio.
So, the winners per category are:
1) >= 1k: Hawaiian, ratio 0.96900
2) >= 10k: Mirandese, ratio 0.18073
3) >= 100k: Basque, ratio 0.38061
4) >= 1M: Swedish, ratio 0.21381
5) >= 10M: Dutch, ratio 0.08305
6) >= 100M: English, ratio 0.01447
However, keep in mind that we removed languages outside statuses
1, 2 or 3. That affected the >=10k languages: for example, Upper
Sorbian does much better than Mirandese (0.67). (Will fix it while
creating the full report. Obviously, in this case the logarithmic
categories of numbers of speakers are much more important than
the status of the language.)
It's obvious that we could draw a line from 1:1 for 1-10k
speakers to 10:1 for >=100M speakers. But, again, I would like to get
input from somebody more competent.
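One possible shape for such a function, as a rough sketch only: fit a straight line through the category winners in log-log space and compare any project against it. This assumes each winner sits at the lower bound of its category (the real speaker counts are in the spreadsheet, not here), and the helper names are hypothetical.

```python
import math

# (lower bound of the speaker category, winner's articles-per-speaker ratio)
WINNERS = [
    (1e3, 0.96900),  # Hawaiian
    (1e4, 0.18073),  # Mirandese
    (1e5, 0.38061),  # Basque
    (1e6, 0.21381),  # Swedish
    (1e7, 0.08305),  # Dutch
    (1e8, 0.01447),  # English
]


def fit_loglog(points):
    """Ordinary least squares of log10(ratio) on log10(speakers)."""
    xs = [math.log10(s) for s, _ in points]
    ys = [math.log10(r) for _, r in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def benchmark_ratio(speakers, slope, intercept):
    """Ratio the fitted line predicts for a language of this size."""
    return 10 ** (intercept + slope * math.log10(speakers))


def relative_score(articles, speakers, slope, intercept):
    """Above 1.0 means better than the fitted line, below 1.0 means worse."""
    return (articles / speakers) / benchmark_ratio(speakers, slope, intercept)


slope, intercept = fit_loglog(WINNERS)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

With only six points, and winners rather than typical projects, this is an upper envelope at best; a proper model would use all languages from the spreadsheet.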
One very important category is missing here: the level
of development of the speakers. That could be added: GDP (PPP) per
capita of the country or countries where the language is spoken would
be a useful measure.
And I suppose somebody with statistical knowledge would be able to
give us a number that would mean "ability to create a
Wikipedia article".
Completed in this way, it would let us measure the success of
particular Wikimedia groups and organizations. OK, articles per
speaker is not the only way to do so; we could use other
parameters as well: number of new/active/very active editors etc. And
we could track it over time.
I'll produce some other results. And as a reminder: I'd like to have a
formula to compute "ability to create a Wikipedia article" and then to
produce "level of a particular community's success in creating
Wikipedia articles". And, of course, to apply it to editors as well.
[1] https://docs.google.com/spreadsheets/d/1TYyhETevEJ5MhfRheRn-aGc4cs_6k45Gwk_…
Asaf Bartov, 13/06/2015 02:42:
> The (already existing) metric of active-editors-per-million-speakers is,
> it seems to me, a far more robust metric. Erik Z.'s stats.wikimedia.org
> <http://stats.wikimedia.org> is offering that metric.
I personally agree on this in general, but Millosh is trying something
different in his current quest, i.e. content ingestion and content
coverage assessment, also for missing language subdomains. (By the way,
I created the category, please add stuff:
https://meta.wikimedia.org/wiki/Category:Content_coverage .)
Mere article count tells us very little and he acknowledged it. As you
added analytics: maybe when https://phabricator.wikimedia.org/T44259 is
fixed we can also do fancy things like join various tables and count
(countable) articles above a minimum threshold of hits, or something
like that.
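Just to illustrate the idea, a tiny sketch of such a count, assuming we already had per-page hit numbers in hand. The record layout (`title`, `countable`, `hits`) and the sample values are made up for the example; they are not a real table schema.

```python
def count_articles_above(pages, min_hits):
    """Count countable articles whose hit count reaches the threshold."""
    return sum(1 for p in pages if p["countable"] and p["hits"] >= min_hits)


# Hypothetical sample: two real articles and one stub nobody reads.
sample = [
    {"title": "A", "countable": True, "hits": 120},
    {"title": "B", "countable": True, "hits": 3},
    {"title": "C", "countable": False, "hits": 500},
]
print(count_articles_above(sample, min_hits=10))
```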
Oh, and the total number of internal links in a wiki is also an
interesting metric in many cases: it's often a good indicator of how
curated a wiki is overall, while bot-created articles are often orphans.
(Locally there might be overlinking but that's rarely a wiki-wide
issue.) I don't remember how reliable the WikiStats numbers are, but
they often give a good clue already.
Nemo
Read the rest :P
On Jun 13, 2015 02:43, "Asaf Bartov" <abartov(a)wikimedia.org> wrote:
> (adding Analytics, as a relevant group for this discussion.)
>
> I think this is next to meaningless, because the differing bot policies and
> practices on different wikis skew the data into incoherence.
>
> The (already existing) metric of active-editors-per-million-speakers is, it
> seems to me, a far more robust metric. Erik Z.'s stats.wikimedia.org is
> offering that metric.
>
> A.
>
> On Sun, Jun 7, 2015 at 3:23 PM, Milos Rancic <millosh(a)gmail.com> wrote:
>
> > [...]
> >
> > _______________________________________________
> > Wikimedia-l mailing list, guidelines at:
> > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
> > Wikimedia-l(a)lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
>
>
>
>
> --
> Asaf Bartov
> Wikimedia Foundation <http://www.wikimediafoundation.org>
>
> Imagine a world in which every single human being can freely share in the
> sum of all knowledge. Help us make it a reality!
> https://donate.wikimedia.org
I've created a draft of our future approach to building new
Wikipedia (and other Wikimedia project) editions:
https://meta.wikimedia.org/wiki/Developing_new_language_editions_of_Wikiped…
Feel free to contribute to it :)
* * *
{{draft}}
The aim of this page is to help with the creation of new Wikipedia editions.
== Rationale ==
As of 2015, any new request for a Wikipedia will almost certainly
involve a specific anthropological environment.
[[List of articles every Wikipedia should have]] is not just heavily
biased in favor of the broad Western civilization (thousands of years
of Chinese and Indian civilizations have been mostly neglected); it is
also not an appropriate approach to ethnolinguistic communities which
usually don't even have the contemporary concepts. And while the list
was created as an eventualist recommendation, it's de facto used as
a requirement for new projects.
That turned out to be counterproductive, as people willing to share
their knowledge have been stuck searching for the right terminology
source, usually not easily accessible; or they've been forced to start
creating neologisms, which are generally forbidden by Wikipedia rules
as original research.
To move this process out of the dead end, we have to adapt our
recommendations to cover the whole path from the willingness to create
a Wikipedia in a particular language to the point of having a fully
developed community, capable of dealing with contemporary knowledge.
== Steps ==
The steps are broad and presently built on the assumptions that: (1)
the particular language has a writing system which can be used on
contemporary computers (i.e. it is mapped in Unicode, and standard fonts
exist and can be easily obtained); (2) it has enough bilinguals capable
of communicating with the broader Wikimedia community; (3) internet
access exists in the area (that includes migrants living in capital
cities or other countries).
Those are the minimums necessary for the sustainability of a
Wikipedia/Wikimedia community. It's likely that in the future we'd
build recommendations for the languages which don't fulfill the
requirements above.
=== Step 1: Local knowledge ===
Both contemporary educated bilinguals and native monolinguals
are most interested in preserving local knowledge. That knowledge
usually does not meet the standards of a contemporary
encyclopedia, but gathering that kind of knowledge is fundamental
for two reasons: (1) we want them to share their knowledge with
us, no matter if it's mythological or consists of protoscientific
descriptions of the nature around them; (2) it's necessary in order to
attract the population to build a knowledge repository in their native
language.
It's likely that this type of knowledge will have common topics all
over the world: what the sky is, what a river is, local flora and fauna
etc. However, while people engaged in field work could ask them
to describe something in particular, this should be fully open to the
will of the local population.
As mentioned above, the first step assumes field work. People doing
the job should be equipped with the tools and knowledge to record spoken
language, transcribe it (in Wikisource) and build articles
about particular topics in Wikipedia.
This type of work goes against the common rules, which forbid original
research. However, this is the only way to start the project. This
type of knowledge should be the only category for which violating the
OR rule is accepted. (It should be mentioned that this was and is a
common practice on most of the less developed Wikimedia projects,
although they exist in a much more developed and literate environment
than the languages with which we are starting to deal.)
As mentioned above, this is also the time when we should start using
Wikisource for collecting data. Depending on the capacity of the
particular ethnolinguistic group, it should be hosted either on
Multilingual Wikisource or on their own edition of Wikisource.
Besides Wikisource, Wiktionary is also important. For a very
small ethnolinguistic community -- with fewer than 1,000 speakers and
a declining number of them -- it's much more rational to use one of the
larger Wiktionaries (likely in the dominant L2) to describe the meanings
of the words. However, for a sustainable ethnolinguistic
community, it would be better to create a Wiktionary in their own
language.
Organizationally, this step -- which could last for a long time, even
after the completion of the next two steps -- assumes significant
field work. This could be done by Wikimedians and Wikimedia
organizations, but it could also be done by unaffiliated
organizations. In the latter case, we should coordinate with them.
Besides us, there are many other organizations with similar
educational goals. It's also useful to work in coordination with them.
For example, if an organization is interested in donating laptops to a
particular ethnolinguistic group, we should be ready to prepare a
Wikimedia-related program once those people get the laptops.
=== Step 2: Primary school knowledge ===
Although this type of knowledge shouldn't be OR, it will largely
diverge from the [[List of articles every Wikipedia should have]]. It
doesn't assume the systematized "most important" concepts of
contemporary [Western] civilization, but contemporary knowledge useful
for 6-15 year olds. For example, it's more useful for them to
learn about local writers (not necessarily writers in the particular
language) than about famous artists with no relevance to their situation.
I suppose that we should build the lists of scientific concepts
appropriate for that level of knowledge, while we should leave it to the
local population to build their own lists of social concepts and
people important for primary school students to know.
While preparing this work, if relevant academic institutions of the
particular ethnolinguistic group exist, we should contact them
and coordinate our efforts with them.
At this point, creating a Wikibooks edition makes perfect sense. It's
possible that we could actually build books for the particular
population, which would be used in their primary schools.
=== Step 3: Secondary school knowledge ===
This should correspond to a kind of modified [[List of articles every
Wikipedia should have]]. If somebody lives in Latin America, it's more
important for them to know about important Latin American
writers than about East European ones. But that list should
definitely be compiled by the Wikimedia community.
At the end of this phase, we should have a self-sustaining community of
Wikimedia editors, working on their own.
(For list moderators: my previous message was rejected as it was too long;
because of the quoted text below the message. So, just ignore it. Though,
you could definitely raise the limit to 1MB -- from 40kb.)
Sylvian, you've opened a very interesting and important question,
about the various states of the languages of the world.
Kichwa is far from being in the worst position, but it's also an example of
the languages with which we will be dealing in the future.
I'd suggest an approach in a few phases, which could also serve as a
draft road map for similarly developed languages.
1) As you said, the first one has to be about local knowledge. It would be
good to list the categories about which the contributors would write. That
list could be common ground for other languages all over the world.
2) In a year or so, start writing the most basic scientific articles. I
think we should start with primary school knowledge, and maybe even move on
to building textbooks in a future Kichwa Wikibooks. After we complete that,
it would be possible for Kichwa children to be educated in their native
language.
3) Around that time we should approach the Academy and talk with them about
the standardization of terminology. That would allow us to build knowledge
at high school level in five to ten years.
In other words, I'd tell you to go with your idea and start
collecting local knowledge in Kichwa.
The only other question is related to the MediaWiki interface. Is it
possible to translate the most common messages in it?
Based on your input I will start collecting recommendations on a Meta page.
Hello all -
I’m new here, but really interested in the topic. My name is Eddie Avila and I’m the Director of Rising Voices, an initiative of Global Voices <http://www.globalvoicesonline.org/>. We work to support new, diverse, and underrepresented voices as they use participatory digital media to tell their own stories on their own terms.
A special interest of ours centers on how the Internet is helping communities create new digital content in their native languages. Whether through blogs, digital video, social media, or audio podcasts, we are seeing inspiring work by people committed to preserving and revitalizing their native languages.
Starting with an activity last year in Mexico <http://globalvoicesonline.org/2014/11/09/a-network-of-indigenous-language-d…> we brought together indigenous language “digital activists” to share their experiences and teach and learn from one another.
Here, we partnered with our friends from Wikimedia Mexico to help organize the sessions around creating new or translating existing content into Wikipedia in native languages. We felt that Wikimedia shares our mission of supporting communities to be able to share their knowledge online in their own languages.
Now, we are continuing this process with a workshop in Bogotá <http://globalvoicesonline.org/2015/05/18/a-gathering-to-connect-indigenous-…> in a couple of weeks. Again, we are partnering with both Wikimedia Colombia and Wikimedia Venezuela to showcase the possibilities of creating versions of Wikipedia in native languages and building a more mutually supportive network.
We are eager to explore how Rising Voices can help support the ongoing work of communities around the world, but especially throughout Latin America, that are interested in contributing or creating their own versions of this important resource. Our partnerships with your communities have been incredibly rewarding.
Looking forward to seeing how we can continue to work together.
Thanks,
Eddie Avila
@barrioflores @risingvoices
Yes, we have one of the active editors from Wikipedia in Wayuunaiki, Leonardi Fernández, who will be participating in the event. He’ll be arriving from Venezuela, but since languages know no boundaries, we thought it was important for him to join us.
> On Jun 2, 2015, at 6:27 PM, languages-request(a)lists.wikimedia.org wrote:
> Message: 1
> Date: Tue, 2 Jun 2015 17:04:36 -0400
> From: eddie avila <eduardo13(a)gmail.com>
> To: languages(a)lists.wikimedia.org
> Subject: [Languages] Indigenous Language Digital Activism
> Message-ID: <261B0DE9-62E9-4F0E-861A-83C62028FC0A(a)gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> [...]
Forwarded to the languages list.
JP
---------- Forwarded message ----------
From: Milos Rancic <millosh(a)gmail.com>
Date: 2015-05-27 15:04 GMT-06:00
Subject: [Wikimedia-l] Priority languages
To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
Below is the list of languages sorted by the number of L2 speakers
(those with more than one million of them).
L2 speakers appear in two situations:
* The first, and the one important to us, is languages used for wider
communication. For example, French is an L2 among educated people of West
Africa.
* The second is related to native languages in a weak
position (either dying or reviving). For example, English is the L1 of
most Native Americans, and Russian is the L1 of most ethnicities of the
former Soviet Union, while their own languages are L2
ones. (They are important in other cases, but not for this purpose.)
I omitted English (it makes no sense to include it, as we are communicating
in English and English is the default for all localization) and a few
primarily spoken languages (our content is [mostly] written).
I also removed some languages which belong to the second category (Irish
Gaelic and Scots, for example), but it could be that some of the
languages on the list belong to that category as well (though I am
pretty sure they don't).
There are languages on this list with well developed Wikimedia
projects and no particular need to promote work on Wikimedia projects
among them: French, Spanish and German are examples. Russian is not
on the list, as it's usually an L1, as mentioned
above, but it belongs to the category of languages with well developed
Wikimedia projects.
There are also languages spoken in countries with a low level of internet
access and issues much more important than writing an encyclopedia;
Congo Swahili is one. Those areas are not yet ready even for projects
like OLPC, and there is not a lot we can do there.
But there are a number of languages in between, with active chapter(s) or
user group(s) in the relevant countries. Those languages should be the
priority in promoting collaboration.
They are: Arabic (Arabic user groups), Indonesian (WM ID), Hindi (WM IN),
Urdu (Pakistani user group), Thai (Thailand UG), Bengali (WM BD), Zulu (WM
ZA), Hausa (West African user groups), Xhosa (WM ZA), Afrikaans (WM ZA),
Kannada (WM IN), Telugu (WM IN), Tsonga (WM ZA), Malay (WM ID and Malaysian
Wikimedians), Marathi (WM IN).
The priorities for those languages should include (but are likely not
limited to):
* Translation of MediaWiki messages should be at 100%.
* Those languages should be a priority for every document that needs to be
translated. For example, the ongoing Board elections, but also various Meta
pages.
* We should have a pool of literate people in those languages for various
purposes, not just for translation. For example, if we want to create
projects in the languages of Pakistan, we should have a number of literate
Urdu speakers willing to help newcomers who speak Urdu as their L2.
Will be back with other language-related data :)
Language          Code  L1 speakers   L2 speakers
Standard Arabic   arb   206,000,000   246,000,000
Mandarin Chinese  cmn   847,808,270   178,000,000
Indonesian        ind    23,200,480   140,000,000
Hindi             hin   260,333,620   120,000,000
Spanish           spa   398,931,840    96,990,000
Urdu              urd    64,035,800    94,000,000
French            fra    75,916,150    87,000,000
Thai              tha    20,396,930    40,000,000
Bengali           ben   189,261,200    19,200,000
Zulu              zul    11,969,100    15,700,000
Hausa             hau    25,109,000    15,000,000
Xhosa             xho     8,177,300    11,000,000
Afrikaans         afr     7,096,810    10,300,000
Bamanankan        bam     4,072,040    10,000,000
Burmese           mya    32,035,300    10,000,000
Congo Swahili     swc         1,000     9,100,000
Northern Sotho    nso     4,631,000     9,100,000
Kannada           kan    37,739,040     9,000,000
German            deu    78,093,980     8,000,000
Tamil             tam    68,776,460     8,000,000
Jula              dyu     2,550,000     7,000,000
Lingala           lin     2,141,300     7,000,000
Koongo            kng     5,016,500     5,000,000
Telugu            tel    74,049,000     5,000,000
Ibibio            ibb     1,500,000     4,500,000
Tok Pisin         tpi       122,000     4,000,000
Krio              kri       495,600     4,000,000
Amharic           amh    21,811,600     4,000,000
Bangala           bxg            ~0     3,500,000
Tsonga            tso     4,009,000     3,400,000
Malay             zlm    15,848,500     3,000,000
Marathi           mar    71,780,660     3,000,000
Sinhala           sin    15,613,980     2,000,000
Efik              efi       405,260     2,000,000
Duala             dua        87,700     2,000,000
Yoruba            yor    19,380,800     2,000,000
Shona             sna    10,741,700     1,800,000
Venda             ven     1,294,000     1,700,000
Sango             sag       404,000     1,600,000
Manado Malay      xmm       850,000     1,500,000
Sylheti           syl    10,300,000     1,500,000
Ambonese Malay    abs       245,020     1,400,000
Ndebele           nbl     1,090,000     1,400,000
Rakhine           rki     1,000,000     1,020,000
Ganda             lug     4,130,000     1,000,000
Akan              aka     8,314,600     1,000,000
Khmer             khm    14,224,500     1,000,000
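
The selection described in this message can be sketched roughly as follows, in Python. Only four rows are reproduced, and the field names, the `well_developed` flag and the function name are assumptions made for the illustration; the affiliate labels are the ones named in the message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Language:
    name: str
    code: str
    l2_speakers: int
    affiliate: Optional[str]  # active chapter or user group, if any
    well_developed: bool      # already has well developed Wikimedia projects


LANGUAGES = [
    Language("Standard Arabic", "arb", 246_000_000, "Arabic user groups", False),
    Language("Indonesian", "ind", 140_000_000, "WM ID", False),
    Language("French", "fra", 87_000_000, None, True),
    Language("Congo Swahili", "swc", 9_100_000, None, False),
]


def priority_languages(languages, min_l2=1_000_000):
    """Languages with enough L2 speakers and an active affiliate that are
    not already well served, sorted by number of L2 speakers."""
    picked = [
        lang for lang in languages
        if lang.l2_speakers >= min_l2 and lang.affiliate and not lang.well_developed
    ]
    return sorted(picked, key=lambda lang: lang.l2_speakers, reverse=True)


for lang in priority_languages(LANGUAGES):
    print(lang.name, lang.code, lang.affiliate)
```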