On Fri, Jan 15, 2010 at 11:39 PM, Erik Zachte erikzachte@infodisiac.com wrote:
Q: Nikola Smolenski / Milos Rancic: In Wikipedia Page Views By Country - Breakdown [1] and Wikipedia Page Views By Country - Trends [2], could you include more languages (ideally all languages)? Some of the listed numbers go below 0.1% of the population, but others are not mentioned even though they are larger than 0.5% of the population.
[1] http://tinyurl.com/yhp3an7 [2] http://tinyurl.com/yzga2hm
A: Yes, on some reports I do include smaller percentages for the largest Wikipedias, as those represent significant numbers of page views. I used different (and arbitrary) thresholds per report. The arbitrary values could change, but I want to plead for a notoriety threshold:
Here is a much more extended version of the breakdown report [1] (for this discussion only). It shows, per country, up to 50 Wikipedias. An extra column shows the total number of records for this country/language pair (for the 6 month period) on which the percentage is based. As you can see, for the smallest countries that number is so low that it is no longer significant.
Let us say we cut off not at 1%, but at an (arbitrary) absolute threshold of x logged records per country/language pair (per row). Let us say we cut off at an average of 5 records per month. Everything below that threshold in the test report is in dark red. Personally I think this is still way too much detail for a general report. Not because of the KBs, but because of information overload.
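The absolute cutoff described above could be sketched roughly as follows. This is only an illustration, not the script behind the actual reports: it assumes each logged record is available as a (country, language) pair, and the function name, constants, and sample data are all hypothetical.

```python
from collections import Counter

MONTHS = 6              # length of the reporting period, as in the test report
MIN_AVG_PER_MONTH = 5   # arbitrary cutoff: average logged records per month


def breakdown(records, months=MONTHS, min_avg_per_month=MIN_AVG_PER_MONTH):
    """Return (country, language, records, percent) rows at or above the cutoff.

    `records` is an iterable of (country, language) pairs, one per logged
    record. This input format is an assumption made for the sketch.
    """
    pair_counts = Counter(records)

    # Total records per country, used as the base for the percentage.
    country_totals = Counter()
    for (country, _language), count in pair_counts.items():
        country_totals[country] += count

    # An average of N records per month over the period is an absolute
    # threshold of months * N records per country/language pair.
    threshold = months * min_avg_per_month

    rows = []
    for (country, language), count in sorted(pair_counts.items()):
        if count < threshold:
            continue  # below the cutoff: too few records to be significant
        percent = round(100.0 * count / country_totals[country], 1)
        rows.append((country, language, count, percent))
    return rows


# Made-up sample: 100 records for one country over the 6 month period.
sample = [("RS", "sr")] * 60 + [("RS", "en")] * 36 + [("RS", "hu")] * 4
rows = breakdown(sample)
```

With the made-up sample, the "hu" row has only 4 records against a cutoff of 30 and is dropped, while the percentages of the remaining rows are still computed against the full country total.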
Detailed statistics have two very important values:
* The first one is chapter-related. I want to know more details about tendencies in Serbia, so that I would be able: (1) to analyze what is going on and what WM RS did; (2) to make a media event based on statistics.
* The other is of general sociolinguistic value. I can trace, to some extent, where speakers of a given language live and what the percentage of Internet adoption (actually, Wikipedia adoption) is; all of that in comparison with, let's say, GDP, number of inhabitants and so on.
It would be great if you set up a periodic job that would generate such statistics at the end of every month. For example, I would really like to know about the trends in the past 6 months.
I noticed in your quarterly report that the share of the Serbian language in Serbia is rising. It is very important because it shows one (or both) of two things: the quality of Serbian Wikipedia is rising and/or Internet adoption among those who don't know English well enough is rising. If the number of visits to English Wikipedia is stable enough, it points to the second; if the number of visits is lower than before, it points to the first; and so on.
Also, I would like to know whether it is seasonal: which numbers are due to tourists, and which reflect the behavior of the general population.
So, while such statistics are truly an information overload for a general report, they are very valuable for particular reports.