Hi Oliver,
Interesting dataset! I am curious about why the Danish Wikipedia is so
highly accessed from Sweden. Could it be an error, e.g., with Telia
IP addresses?
In Python:
>>> import pandas as pd
>>> df = pd.read_csv(
...     'http://files.figshare.com/1923822/language_pageviews_per_c…tsv',
...     sep='\t')
>>> # .loc replaces the removed .ix indexer: select the rows for the Danish Wikipedia
>>> df.loc[df.project == 'da.wikipedia.org',
...        ['country', 'pageviews_percentage']].set_index('country')
                pageviews_percentage
country
Austria                            1
China                              1
Denmark                           61
Estonia                            1
France                             1
Germany                            2
Netherlands                        2
Norway                             1
Sweden                            18
United Kingdom                     3
United States                      3
Other                              5
MaxMind has some numbers on their own accuracy:
https://www.maxmind.com/en/geoip2-city-database-accuracy
For Denmark, 85% is "Correctly Resolved"; for Sweden, only 68%. I wonder
whether this really could bias the result so much.
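As a rough sanity check on the geolocation explanation, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every pageview attributed to Sweden actually came from Denmark (an upper bound, not a measurement). The function name is made up for this example.

```python
def implied_misattribution_rate(home_pct, suspect_pct):
    """Fraction of home-country traffic that would have to be mislabelled
    as the suspect country, ASSUMING all suspect-country traffic really
    originates in the home country (upper-bound assumption)."""
    true_home_pct = home_pct + suspect_pct
    return suspect_pct / true_home_pct

# Figures from the da.wikipedia.org table: Denmark 61, Sweden 18.
rate = implied_misattribution_rate(61, 18)
print(round(rate * 100, 1))
```

Under that assumption roughly 23% of genuinely Danish traffic would have to be placed in Sweden, which seems like a lot for geolocation error alone.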
If the numbers are correct, why would Swedes read the Danish
Wikipedia so much? Bots? It does not apply the other way around: only 2%
of the traffic to Swedish Wikipedia comes from Denmark.
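The comparison in both directions can be sketched with a small helper. The helper name is hypothetical, and the stand-in frame below holds only the three figures quoted in this mail; with the real dataset one would pass the df loaded earlier instead.

```python
import pandas as pd

def country_share(df, project, country):
    """Return pageviews_percentage for one project/country pair, or None
    if the pair is not listed (hypothetical helper for illustration)."""
    mask = (df["project"] == project) & (df["country"] == country)
    rows = df.loc[mask, "pageviews_percentage"]
    return None if rows.empty else int(rows.iloc[0])

# Stand-in data: only the figures quoted in this mail, not the full dataset.
sample = pd.DataFrame({
    "project": ["da.wikipedia.org", "da.wikipedia.org", "sv.wikipedia.org"],
    "country": ["Denmark", "Sweden", "Denmark"],
    "pageviews_percentage": [61, 18, 2],
})

print(country_share(sample, "da.wikipedia.org", "Sweden"))
print(country_share(sample, "sv.wikipedia.org", "Denmark"))
```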
best regards
Finn
On 02/25/2015 10:06 PM, Oliver Keyes wrote:
Hey all!
We've released a highly-aggregated dataset of readership data -
specifically, data about where, geographically, traffic to each of our
projects (and all of our projects) comes from. The data can be found
at
http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
put together an exploration tool for it at
https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
Hope it's useful to people!
--
Finn Årup Nielsen
http://people.compute.dtu.dk/faan/