Hi,
It is well known that the size of a Wikipedia in a given language is not proportional to the number of people who speak that language. By "size" I mean the article count and the active editor count.
This raises the question: Is it proportional to anything else?
I can think of several possible factors (to most items you can add "... in the countries where this language is spoken"):
- Penetration of Internet access
- Quality of education
- Number of people who know other major languages, such as English, French, Russian, Spanish, etc.
- Number of people who *don't* know other major languages
- Gross domestic product
- Human Development Index
- The level of usage of this language in the education system (in some countries schools function in foreign languages)
- Amount of published literature in that language
- Level of censorship and press freedom
- [[Language planning]] policies (think Catalonia, Ukraine, Quebec, Israel)
It is quite possible that the size of a Wikipedia is proportional not to one of these things, but to a combination of them. It is also possible that it is not proportional to any of the above, or to anything at all.
Has anybody ever tried to research this?
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
(And yes, I know that Language planning and some of the other items are not measurable as numbers. I'm throwing ideas around.)
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
Hi Amir
What exactly do you want from this? Is this just personal curiosity, or are you going to do something with it?
And by the way, you forgot bots - the article counts for some Wikipedias are driven by bots - the Dutch Wikipedia, for example.
Rui
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
I develop the features of MediaWiki that are tightly related to the multilingual nature of Wikimedia projects. I want to develop these features in a way that will benefit as many people and languages as (reasonably) possible, so I want to know whatever can be known about what affects the size of wikis.
And no, I didn't forget about bots; I just didn't want to overload the email with disclaimers :)
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
Amir E. Aharoni, 26/01/2015 02:57:
By "size" I mean the article count and the active editor count.
If you drop this parenthesis, the whole email is fine. :)
Article count is a totally useless metric for size; comparing useless metrics to uncertain metrics is quite sure to produce garbage. Since 2008, the total size of Wikimedia projects has been defined by usage: https://meta.wikimedia.org/wiki/Top_Ten_Wikipedias, while the active editor count is our official measure of the current size of a project.
Nemo
Useful link, thank you! I know that article count is very far from perfect as a metric, of course. All of the metrics on that page are relevant, and my question could be asked about any of them.
A few weeks back, I was playing around with some numbers. It is true that the number of effective speakers isn't a very good predictor, but it is a place to start.
Most of our mature editor communities have about 20 active editors per 1 million effective speakers, give or take a factor of 4. In other words, among communities with at least 100 active editors, most range from 5 to 80 active editors per million effective speakers. Admittedly, that is not a very precise range. English, for example, is right at 20 on this metric. There are also some important outliers, such as the Chinese and Arabic communities (both fewer than 2 active editors per million speakers), which probably have yet to reach parity with the other active languages. There are also a few major languages (e.g. Hindi, Bengali, and Malay) that arguably haven't even begun. Those have fewer than 100 active editors and fewer than 0.5 editors per 1 million speakers, despite hundreds of millions of speakers.
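The arithmetic behind that range can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the only numbers taken from the text above are the baseline of 20 and the factor of 4.

```python
# Illustrative sketch; names are hypothetical, not part of any Wikimedia tool.
BASELINE = 20.0  # active editors per 1 million effective speakers (from the text)
FACTOR = 4.0     # "give or take a factor of 4" (from the text)


def editors_per_million(active_editors, speakers_millions):
    """Active editors per 1 million effective speakers."""
    return active_editors / speakers_millions


def typical_range(baseline=BASELINE, factor=FACTOR):
    """Lower and upper bounds of the typical band around the baseline."""
    return baseline / factor, baseline * factor


def classify(active_editors, speakers_millions):
    """Say whether a community falls below, within, or above the typical band."""
    low, high = typical_range()
    rate = editors_per_million(active_editors, speakers_millions)
    if rate < low:
        return "below range"
    if rate > high:
        return "above range"
    return "within range"
```

With these defaults, typical_range() gives the 5 to 80 band mentioned above.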
I suspect that if one could start adjusting for other factors, e.g. speakers with internet access, one might be able to narrow that predicted range. Economic and cultural factors are also probably important, as well as the penetration of secondary languages like English.
Structurally, it seems like this kind of data analysis problem would be fairly amenable to various kinds of regression analysis. The main difficulty would be gathering the right data, e.g. number of effective speakers (which probably needs to be subdivided by country in order to compare to other data sets), internet penetration, economic indicators, access to education, etc. Does anyone happen to know where there is comprehensive language data broken down by country?
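A regression of this kind could be sketched as follows. This is a minimal pure-Python ordinary least squares fit, assuming a log-linear model in which log10(active editors) depends on log10(speakers in millions) and the share of speakers with internet access. The model form, the names, and the inputs are all assumptions for illustration; the data are synthetic, generated from known coefficients only to check that the fit recovers them, and are not real measurements.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic check (NOT real data): build targets from known coefficients.
true_beta = [1.0, 0.9, 2.0]  # intercept, log10(speakers), internet share
raw = [(5, 0.9), (50, 0.3), (200, 0.5), (20, 0.7), (1000, 0.4)]
features = [(math.log10(speakers), share) for speakers, share in raw]
log_editors = [
    true_beta[0] + true_beta[1] * ls + true_beta[2] * share
    for ls, share in features
]
beta = fit_ols(features, log_editors)  # should be close to true_beta
```

With real data, the size of the residuals would show how much of the spread the chosen factors actually explain.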
As others have suggested, I would emphasize community participation or readership metrics rather than article metrics due to bot biasing, etc.
Anyway, if one uses 20 active editors per 1 million speakers as a rough guide, one can estimate which languages have the most natural potential for growth. The top 15 on that list would be, in order: Chinese, Hindi, Arabic, Malay, Spanish, Indonesian, Bengali, Portuguese, Russian, Punjabi, Marathi, Tagalog, Javanese, Wu, and Telugu. Those would collectively account for 70% of "missing" editors if we assume roughly 20 editors per 1 million speakers. In terms of feature development for under-utilized languages, those are probably a reasonable set to be thinking about.
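The "missing editors" estimate can be sketched like this, assuming the gap is simply the expected count (20 per 1 million speakers) minus the current active editors, clamped at zero. The language names and counts are placeholders, not the figures behind the list above.

```python
TARGET_PER_MILLION = 20.0  # rough guide from the text


def missing_editors(speakers_millions, active_editors, target=TARGET_PER_MILLION):
    """Expected editors at the target rate minus current editors, never negative."""
    expected = target * speakers_millions
    return max(0.0, expected - active_editors)


def rank_by_gap(communities):
    """Return (name, gap, share of total gap) tuples, largest gap first."""
    gaps = {name: missing_editors(s, e) for name, (s, e) in communities.items()}
    total = sum(gaps.values())
    ranked = sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, gap, gap / total if total else 0.0) for name, gap in ranked]


# Placeholder inputs: (effective speakers in millions, active editors).
example = {
    "lang_a": (100.0, 200.0),
    "lang_b": (10.0, 300.0),
    "lang_c": (50.0, 500.0),
}
ranking = rank_by_gap(example)
```

Summing the shares of the top entries gives the kind of cumulative percentage quoted above.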
Most of the list is from Asian countries, and a majority of these languages use non-Latin scripts (the exceptions include Spanish, Portuguese, Malay, Indonesian, and Tagalog). So support for other scripts is obviously important. On the other hand, it is also possible that many of these languages are "missing" in part because the computer literate among the populations who speak these languages actually prefer to edit in some other language (e.g. English).
Anyway, just a few thoughts.
-Robert Rohde
I assume the answer is "all of this".
Erlend Bjørtvedt Oslo