Here is a Q&A on all issues raised: Q=question/R=Remark, A=answer
I put the more general questions on top.
Cheers, Erik Zachte
------------------------------------------
Q: Nikola Smolenski Is this the first time these reports are published?
A: Yes. Expect the trend report to grow by accretion over time. Other reports will be built from data for the most recent 6 months only.
------------------------------------------
R: Andrew Gray Andrew explains why distribution of page requests over countries favors Spanish and Portuguese speaking countries: 'Some Wikipedias - the ones which insist on only-free-images - do not use local uploads at all.'
A: Thanks for explaining this unexpected distribution of page views on Commons, I had no idea.
Spain 30.0%, USA 29.2%, Brazil 8.5%, Argentina 4.8%, Mexico 3.9%, Germany 3.3%, France 2.1%, Venezuela 1.9%, Chile 1.4%, Costa Rica 1.4%, Italy 1.4%, Uruguay 1.2%, Colombia 1.2%, Portugal 1.1%
------------------------------------------
R: Mark Williamson Two main factors influencing choice of Wikipedia language:
# Fluency in English of the Internet-using population of a country.
# Quality of the native Wikipedia.
A: Like you say. Many Scandinavians (and Dutch people, I might add) probably switch between English and local content all the time. Personally I tend to look at the English Wikipedia first in many instances, because of its obviously richer content and greater depth.
------------------------------------------
Q: Ziko van Dijk Why are 40% of the visitors of ksh.WP (the dialect of Cologne) from Japan? Why are 25% of the visitors of eu.WP (Basque) from Poland?
R: Andre Engels I think bots are a likely explanation in the eu case (unless Erik is using an algorithm that filters out bots).
A: KSH used to be the code for Kashmir. Still not Japan, but much closer than Cologne. Maybe Japanese mountaineers caused this spike? (only half kidding)
As for eu.wp: would Polish people presume there is also a European Wikipedia? Just a guess.
I do filter bots.
------------------------------------------
R: Teun Spaans For trends, I would expect a bar indicating upward or downward trend, not a percentage bar.
A: We can have both, a notion of importance and of change: I might color code cells as I already do in e.g. [1]. This way large fluctuations really stand out. Let's first collect more history.
[1] http://stats.wikimedia.org/EN/TablesPageViewsMonthly.htm
------------------------------------------
Q: Nikola Smolenski Could we get this for other projects?
A: This question is of course not unexpected. One consideration is that we need a certain sample size to make the numbers significant. For other projects, with far less traffic, few country/language pairs would be backed by sufficient data. See also below on extending the current reports with more table rows.
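To illustrate the sample-size point, here is a minimal sketch of how the uncertainty of a percentage depends on the number of sampled records behind it. It uses the textbook normal approximation for a proportion, which is an assumption of this sketch and not the method used for the reports; the function name and the record counts are hypothetical.

```python
import math

def share_margin(records_for_pair, records_for_country, z=1.96):
    """Return (share, half-width of an approximate 95% interval).

    Uses the normal approximation for a proportion, which is itself
    unreliable for very small counts; that is part of the point.
    """
    if records_for_country <= 0:
        raise ValueError("records_for_country must be positive")
    p = records_for_pair / records_for_country
    half_width = z * math.sqrt(p * (1 - p) / records_for_country)
    return p, half_width

# Hypothetical counts: the same 10% share, backed by very different samples.
large = share_margin(5000, 50000)  # big country, many sampled records
small = share_margin(4, 40)        # small country, a handful of records
print("large: %.1f%% +/- %.2f%%" % (100 * large[0], 100 * large[1]))
print("small: %.1f%% +/- %.2f%%" % (100 * small[0], 100 * small[1]))
```

With only 40 records the margin is almost as large as the share itself, which is why rows for low-traffic projects or countries say very little.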
------------------------------------------
Q: Nikola Smolenski: Please include at Wikipedia Page Views Per Country - Overview [1] number of Internet users from [2], and number of views per Internet user?
[1] http://tinyurl.com/yk43aq6 [2] http://tinyurl.com/yfv5bwn
A: Done
------------------------------------------
R: Nikola Smolenski It is obvious why Slovene Wikipedia is highly visited in Sierra Leone, and Serbian in Suriname; URLs do matter :) Although, I don't understand why so much. I would expect this distribution by visitors, perhaps, but not by visits.
A: Very interesting observation! So people from Sierra Leone try 'sl.wikipedia.org'. Why people from Suriname go to 'sr.wikipedia.org' is only slightly less obvious to me, but apparently it happens.
For countries with just a few hits in the sampled log the distinction between visitors and visits gets blurred.
------------------------------------------
R: Andre Engels Ukrainian is not a small language by any means, yet Wikipedia visitors tend to be drawn to the Russian Wikipedia instead.
A: Yes, but article growth in the Ukrainian Wikipedia has been speeding up in recent months. [1]
[1] http://stats.wikimedia.org/EN/TablesWikipediaUK.htm
------------------------------------------
R: Andre Engels The Q3-Q4 comparison for most countries shows a shift from English to the 'vernacular'.
A: Interesting analysis. Let's see if this is a consistent trend. However, the monthly page views per Wikipedia language, for which we have a 2-year history, do not show a very significant shift from large to smaller Wikipedias. See the table 'Distribution of page views' at the bottom of [1]: smaller languages gain in share of page views, but very slowly.
[1] http://stats.wikimedia.org/EN/TablesPageViewsMonthly.htm
------------------------------------------
Q: Nikola Smolenski / Milos Rancic At Wikipedia Page Views By Country - Breakdown [1] and Wikipedia Page Views By Country - Trends [2], could you include more languages (ideally all languages)? Some of the numbers go below 0.1% of population, but some are not mentioned even though they are larger than 0.5% of population.
[1] http://tinyurl.com/yhp3an7 [2] http://tinyurl.com/yzga2hm
A: Yes, on some reports I do include smaller percentages for the largest Wikipedias, as those represent significant numbers of page views. I used different (and arbitrary) thresholds per report. The arbitrariness could change, but I want to plead for a notoriety threshold:
Here is a much more extended version of the breakdown report [1] (for this discussion only). It shows up to 50 Wikipedias per country. An extra column shows the total number of records for this country/language pair (for the 6-month period) on which the percentage is based. As you can see, for the smallest countries that number is so low that it is no longer significant.
Let us say we cut off not at 1%, but at an (arbitrary) absolute threshold of x logged records per country/language pair (per row). Let us say we cut off at an average of 5 records per month. Everything below that threshold in the test report is in dark red. Personally I think this is still way too much detail for a general report, not because of the size in KB but because of information overload.
[1] http://tinyurl.com/yjwoyre
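The absolute cut-off described above can be sketched as follows. This is only an illustration, not the script behind the reports: it assumes that an average of 5 records per month over the 6-month period means 6 x 5 = 30 records per row, and the rows, names and counts are hypothetical.

```python
MONTHS = 6                 # length of the reporting period, per the text above
MIN_RECORDS_PER_MONTH = 5  # the (arbitrary) average suggested above

def split_by_threshold(rows, months=MONTHS, min_per_month=MIN_RECORDS_PER_MONTH):
    """Split (country, language, records) rows at an absolute record count.

    Rows at or above the cut-off are kept; rows below it are the ones
    that would be shown in dark red in the test report.
    """
    cutoff = months * min_per_month
    kept = [row for row in rows if row[2] >= cutoff]
    flagged = [row for row in rows if row[2] < cutoff]
    return kept, flagged

# Hypothetical sampled-log counts per country/language pair.
rows = [
    ("Serbia", "sr", 1200),
    ("Serbia", "en", 2100),
    ("Suriname", "sr", 12),
    ("Sierra Leone", "sl", 29),
    ("Sierra Leone", "en", 30),
]
kept, flagged = split_by_threshold(rows)
for country, language, records in flagged:
    print("below threshold:", country, language, records)
```

An absolute count treats every row the same way regardless of country size, which a fixed 1% cut-off does not.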
2010/1/15 Erik Zachte erikzachte@infodisiac.com:
Very interesting observation! So people from Sierra Leone try 'sl.wikipedia.org'. Why people from Suriname go to 'sr.wikipedia.org' is only slightly less obvious to me, but apparently it happens.
Well, Suriname’s TLD is .sr, so it is quite obvious, isn’t it? The same frequent mistake is also the reason there is a redirection cz.wikipedia.org → cs.wikipedia.org (Czech language is “cs” according to ISO 639-1, but Czech Republic’s TLD is “.cz” according to ISO 3166-1).
-- [[cs:User:Mormegil | Petr Kadlec]]
I notice in that list both Belarusian Wikipedias are listed just as "Belarusian Wikipedia". It would be very informative to know which is which and to have visitor statistics on both :-)
skype: node.ue
On Fri, Jan 15, 2010 at 3:39 PM, Erik Zachte erikzachte@infodisiac.com wrote:
Here is a much more extended version of the breakdown report [1] (for this discussion only).
[1] http://tinyurl.com/yjwoyre
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
Dear Erik,
Maybe there is a dirty Polish word looked up by many Polish pupils, and when they Google it they come to eu.WP because a Basque word accidentally is alike? :-)
I am looking now for the interest in the native / the English Wikipedia in specific countries. It might be important how localized the software in general is. If you live in, say, Kenya, and your computer has Windows in English, the Internet Explorer and everything is oriented to English, and you google your home town in an English language Google, it is probable that you will get the Wikipedia article in English and not in Swahili.
Kind regards Ziko
Sociolinguistic situations around the world are very complex I think. In especially former European colonies, of which Kenya is but one example, the language of the former colonial power often has a unique position in society.
It is not surprising to me that the English Wikipedia is so popular compared to any other in Kenya, but it is quite a bit more surprising that Korean, Romanian, Bulgarian, Lithuanian, Iranian, etc. users prefer the English Wikipedia.
Mark
On Saturday 16 January 2010 10:40:06 Mark Williamson wrote:
It is not surprising to me that the English Wikipedia is so popular compared to any other in Kenya, but it is quite a bit more surprising that Korean, Romanian, Bulgarian, Lithuanian, Iranian, etc. users prefer the English Wikipedia.
I don't think that they prefer it; it's just that it covers many more topics, and generally covers them in much more depth.
I believe that I am fairly fluent in English, and yet I prefer to read Serbian Wikipedia, if I know that the topic is covered there and the article is better than the English one.
Next thing to do: Wikipedia Page Views By Country - Breakdown Adjusted by Wikipedia Size. Erik, are you planning to do this one as well? :D
On Saturday 16 January 2010 12:25:58 Nikola Smolenski wrote:
Next thing to do: Wikipedia Page Views By Country - Breakdown Adjusted by Wikipedia Size. Erik, are you planning to do this one as well? :D
Did it: http://smolenski.rs/blog/2010/01/wikipedia-page-views-by-country-breakdown-w...
On Fri, Jan 15, 2010 at 11:39 PM, Erik Zachte erikzachte@infodisiac.com wrote:
Let us say we cut off not at 1%, but at an (arbitrary) absolute threshold of x logged records per country/language pair (per row). Let us say we cut off at average 5 records per month. Everything below that threshold in the test report is in dark red. Personally I think this is still way too much detail for a general report. Not because of Kb's but information overload.
Detailed statistics have two very important values:
* The first one is chapter-related. I want to know more details about tendencies in Serbia, so that I would be able (1) to analyze what is going on and what WM RS did; (2) to stage a media event based on the statistics.
* The other is of general sociolinguistic value. I may trace, to some extent, where speakers of a language live and what the percentage of Internet adoption (actually, Wikipedia adoption) is; all of that in comparison with, let's say, GDP, number of inhabitants and so on.
It would be great if you set up a periodic job which would create such statistics at the end of every month. For example, I would really like to know about the trends in the past 6 months.
I noticed in your quarterly report that the share of the Serbian language in Serbia is rising. That is very important because it shows one (or both) of two things: Serbian Wikipedia quality is rising and/or Internet adoption among those who don't know English well enough is rising. If the number of visits to the English Wikipedia is stable, it is the second; if it is lower than before, it is the first; and so on.
Also, I would like to know whether it is seasonal: which numbers are about tourists, and which are about general population behavior.
So, while such statistics are truly an information overload for a general report, they are very valuable for particular reports.
On Friday 15 January 2010 23:39:38 Erik Zachte wrote:
R: Nikola Smolenski It is obvious why Slovene Wikipedia is highly visited in Sierra Leone, and Serbian in Suriname; URLs do matter :) Although, I don't understand why so much. I would expect this distribution by visitors, perhaps, but not by visits.
A: Very interesting observation! So people from Sierra Leone try 'sl.wikipedia.org'. Why people from Suriname go to 'sr.wikipedia.org' is only slightly less obvious to me, but apparently it happens.
ISO 3166-1 code for Surinam is 'sr'.
On Friday 15 January 2010 23:39:38 Erik Zachte wrote:
Here is a much more extended version of the breakdown report [1] (for this discussion only) It shows per country up to 50 Wikipedia's An extra column shows the total number of records for this country/language (for the 6 month period) on which the percentage is based.
What exactly is this number of records? Thousands of visits?
I have one more question: how many log entries from Italian IPs were discarded? Were there more than in other countries? In Italy, 13% of broadband users are Fastweb customers. Fastweb uses NAT, so each Fastweb IP is normally shared by hundreds of users. Moreover, Fastweb customers are often heavy Internet users. Hence, if you exclude all IPs logged twice, you could exclude all Fastweb users (i.e. at least 13% of Italian broadband traffic)...
Nemo