Dear all,
I have some findings suggesting that a page-views-per-Internet-user measurement may help in comparing different language editions of Wikipedia. Criticism and suggestions are welcome.
http://people.oii.ox.ac.uk/hanteng/2015/03/15/comparing-language-development...
Which language version of Wikipedia exceeds its expected page views per Internet user of that language by the widest margin? It is Finnish. In terms of the absolute gap, English has the widest positive gap, whereas Chinese has the largest negative gap.
......
In particular, it is known that Wikipedia (and Google, which often favours Wikipedia) faces local competition in the People's Republic of China and South Korea. It is therefore understandable that page views may be lower for the Chinese and Korean Wikipedia language projects, simply because some users' need to read user-generated encyclopedias is satisfied by other websites. However, it remains an important question why these particular Latin and Asian languages are under-developed as Wikipedia projects.
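For readers who want to try the "more page views than expected" comparison themselves, here is a minimal Python sketch. It is not the method of the linked post, whose definition of "expected" is not given in this message: it assumes the expectation comes from a simple log-log least-squares fit of page views against Internet users. The function names (`fit_loglog`, `gaps`) and all numbers in `toy` are made-up placeholders, not real statistics.

```python
import math

def fit_loglog(users, views):
    """Least-squares fit of log(views) = a + b * log(users). Returns (a, b)."""
    xs = [math.log(u) for u in users]
    ys = [math.log(v) for v in views]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def gaps(data):
    """data maps language -> (internet_users, page_views).

    Returns language -> (views_per_user, absolute_gap, relative_gap),
    where the gaps compare actual views with the fitted expectation.
    """
    langs = list(data)
    a, b = fit_loglog([data[l][0] for l in langs],
                      [data[l][1] for l in langs])
    out = {}
    for lang in langs:
        users, views = data[lang]
        expected = math.exp(a + b * math.log(users))
        out[lang] = (views / users, views - expected, views / expected)
    return out

# Hypothetical placeholder figures, NOT real Wikipedia or ITU data.
toy = {
    "lang_a": (1_000_000, 50_000_000),
    "lang_b": (10_000_000, 200_000_000),
    "lang_c": (100_000_000, 900_000_000),
    "lang_d": (5_000_000, 400_000_000),
}
result = gaps(toy)

# List languages from most over-performing to most under-performing.
for lang, (per_user, abs_gap, rel_gap) in sorted(
        result.items(), key=lambda kv: kv[1][2], reverse=True):
    print(lang, round(per_user, 1), round(abs_gap), round(rel_gap, 2))
```

The relative gap (actual divided by expected) and the absolute gap (actual minus expected) can rank languages differently, which would be consistent with one language leading on a relative measure while the largest absolute gaps belong to very large languages.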
Awesome work! It's interesting to see Finnish as the outlier here. Do we have any fi-users on the list who can comment on this and might know what's going on? (And, in the absence of Finns: Jan, heard anything from across the border? :p)
The only caution I'd raise is that these numbers don't include spider filtering. Why is this important? Well, a lot of traffic is driven by crawlers and spiders and automata, particularly on smaller projects, and it can lead to weirdness as a result. With the granular pagecount files there's some work that can be done to detect this (for example, using burst detection and a few heuristics around concentration measures to eliminate pages that are clearly driven by automated traffic - see the recent analytics mailing list thread) but only some. I appreciate this is a flaw in the data we are releasing, not in your work, which is an excellent read and highly interesting :). I agree that understanding the lack of development in the PRC and ROK is crucial - we keep talking about the "next billion readers" but only talking :(
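To make the spider-filtering idea a little more concrete, here is a rough Python sketch of combining a burst check with a concentration measure on per-page hourly counts. It is not the approach from the analytics list thread: the choice of a Herfindahl-style index, the median-based burst rule, and the thresholds `hhi_cutoff` and `k` are all assumptions picked for illustration.

```python
from statistics import median

def herfindahl(counts):
    """Concentration of views across hours: 1/len(counts) if even, 1.0 if all in one hour."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((c / total) ** 2 for c in counts)

def has_burst(counts, k=10.0):
    """True if any hour exceeds median + k * MAD (MAD floored at 1 view)."""
    med = median(counts)
    mad = median([abs(c - med) for c in counts])
    threshold = med + k * max(mad, 1.0)
    return any(c > threshold for c in counts)

def looks_automated(counts, hhi_cutoff=0.5, k=10.0):
    """Flag a page only when its traffic is both bursty and highly concentrated."""
    return herfindahl(counts) > hhi_cutoff and has_burst(counts, k)

# Hypothetical hourly view counts for two pages over one day.
steady_page = [10] * 24
spiky_page = [2] * 23 + [5000]

print(looks_automated(steady_page))
print(looks_automated(spiky_page))
```

Requiring both signals is a deliberately conservative choice: a page linked from a news story can also burst, so a heuristic like this will still miss some automated traffic and misflag some human traffic.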
On 16 March 2015 at 02:21, h hanteng@gmail.com wrote:
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hello,
With all admiration for the maths, I think that we can learn less from these figures than we might hope. In these statistics I often see a strangely high proportion of traffic from the US or other countries that is difficult to explain. Why, for example, should there be so many people in the US who are interested in the Frisian Wikipedia?
Even if the numbers and proportions are right: there are too many factors to consider.
Some years ago I did some research on individual Wikipedia language versions, and combining several methods still seems to be the most useful approach. Interviews with the local Wikipedians are very important. It would be great to have more interviews with readers or potential readers (or potential non-readers) in order to find out why a Wikipedia language version grows, or does not.
Kind regards
Ziko
On Monday, 16 March 2015, Oliver Keyes wrote:
-- Oliver Keyes Research Analyst Wikimedia Foundation
On 16.3.2015, at 11.30, Oliver Keyes okeyes@wikimedia.org wrote:
Awesome work! It's interesting to see Finnish as the outlier here. Do we have any fi-users on the list who can comment on this and might know what's going on? (And, in the absence of Finns: Jan, heard anything from across the border? :p)
To find a societal explanation for this, one could have a look at the Swedish, Norwegian, Danish and Icelandic Wikipedias. If they rank high too, I would explain this by the Nordic people’s interest in reading (and high level of general education).
Some earlier studies by UNESCO have shown that people in the Nordic countries read more newspapers, books etc. than people in the rest of the world. If I remember right, the Finns have beaten the other Nordic countries in these comparisons too. So the explanation may be reading habits in general, which result from a high level of basic education, the library network, an equal society etc.
- Teemu
-------------------------------------------------- Teemu Leinonen http://teemuleinonen.fi +358 50 351 6796 Media Lab http://mlab.uiah.fi Aalto University School of Arts, Design and Architecture --------------------------------------------------
A similar phenomenon occurs here in the US, with northern cities (specifically Seattle and Minneapolis) often ranking highly on measures of literacy like library checkouts, newspaper circulation, and education levels. When people spend free time indoors due to cold and damp weather, I speculate that they tend to engage in activities like reading, playing musical instruments and computer games, improving their education, and of course watching Seahawks football games. ;) I hope that we will make good use of this phenomenon in Seattle by engaging more people in Cascadia Wikimedians activities.
Pine
On Mar 16, 2015 1:32 PM, "Leinonen Teemu" teemu.leinonen@aalto.fi wrote:
And very long winter nights? Are there seasonal variations?
-----Original Message----- From: wiki-research-l-bounces@lists.wikimedia.org [mailto:wiki-research-l-bounces@lists.wikimedia.org] On Behalf Of Leinonen Teemu Sent: Tuesday, 17 March 2015 4:32 AM To: Research into Wikimedia content and communities Subject: Re: [Wiki-research-l] (no subject)
If the hypothesis about the climate's impact on literacy, and thus on Wikipedia viewing, holds true, then perhaps Manchuria (Northeast China) and/or Russia should be targets for Wikipedia viewing.
The love for knowledge/information could be strong in East Asian/Confucian countries, where both literacy rates and the ICT Development Index have been high.
By using some comparative methods from comparative politics and the social sciences, we may be able to tell a bigger story about online literacy and the role of Wikipedia in it.
Best, han-teng liao
2015-03-16 22:02 GMT+01:00 Kerry Raymond kerry.raymond@gmail.com:
I've noted Finland (as a country) before when looking at Erik's data. IIRC, there's a vaguely normal-looking distribution of pages-per-internet-user-per-month for the Western European countries, and Finland is at the upper end but not a dramatic outlier; it's in a group with e.g. Sweden, Austria, etc.
http://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryOv...
This pattern has been around since at least 2012:
http://web.archive.org/web/20120922063053/http://stats.wikimedia.org/wikimed...
(not sure why the 2012 per-country numbers are so much higher...)
Andrew.
On 16 March 2015 at 09:30, Oliver Keyes okeyes@wikimedia.org wrote: