'Lots, but that's not currently anyone's job'

On Wednesday, 4 March 2015, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
yay, shiny! The map is a pretty compelling way to show how dominant traffic from the US is, even for very minor languages (say bi.wikipedia.org), I wonder how many requests from US-based bots/automata we’re still failing to detect.

> On Mar 3, 2015, at 9:29 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>
> Update: the original Shiny instance went down due to server load soon
> after release. It's now up again at http://datavis.wmflabs.org/where/
> on a dedicated Labs machine, where we hope to put...many more
> visualisations. It also now has mapping, largely thanks to Sarah
> Laplante (http://sarahlaplante.com/), and soon it will hopefully be
> /non-hideous/ mapping (the current mass of blue and grey is because my
> aesthetic tastes are...I don't actually have any aesthetic tastes)
>
> On 2 March 2015 at 22:36, Oliver Keyes <okeyes@wikimedia.org> wrote:
>> Indeed! Orienting it that way (pivoting on language rather than
>> project) is something several people have asked for; I plan to spend a
>> chunk of my spare time (that is, recreational time) trying to make it
>> work. Should be fairly trivial.
>>
>> On 2 March 2015 at 09:55, h <hanteng@gmail.com> wrote:
>>> Hello Finn,
>>>   I do not have a specific answer to your question. However, it might be
>>> worthwhile to add Finnish to the comparison, since according to the CLDR 26
>>> territory-language (T-L) information
>>> http://www.unicode.org/cldr/charts/26/supplemental/territory_language_information.html
>>>
>>>   Sweden has a sizable population of Finnish speakers:
>>>
>>> Swedish {O} sv 95.0% 99.0%
>>> Finnish {OR} fi 2.2%
>>>
>>>    So if a similar query is executed for the Finnish language, and the results
>>> also show an "undue" proportion of visits from Sweden, then what you
>>> observed as an anomaly is not that unique. We probably need many iterations of
>>> comparative outcomes and normalization of the data (Sweden does have a larger
>>> population). Also, it might be handy to have some statistics on immigration
>>> or residence, since this is the EU. I would not be surprised if, for example,
>>> visits from Oxford to Wikipedia included a sizable share of German-language
>>> requests.
>>>
>>>    I am still a bit bothered by the number "1" in the current dataset. It
>>> does not feel right, since 1.4% and 0.6% (both of which round to 1) are
>>> notably different in this regard. Perhaps we need a high-precision "universal
>>> percentage" number for each territory-language pair. It would also be great
>>> to do another aggregation: given a territory, which language
>>> versions of Wikipedia are accessed....
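
A minimal sketch of the per-territory aggregation suggested above, assuming the TSV keeps the `project`, `country`, and `pageviews_percentage` columns that appear in Finn's snippet further down the thread. It uses only the standard library. The helper name `projects_by_country` is hypothetical. The sample rows are illustrative: the 61, 18, and 2 figures are quoted in this thread, and the remaining rows are made up.

```python
import csv
import io
from collections import defaultdict

# Illustrative sample in the same shape as the released TSV.
SAMPLE_TSV = (
    "project\tcountry\tpageviews_percentage\n"
    "da.wikipedia.org\tDenmark\t61\n"   # quoted in the thread
    "da.wikipedia.org\tSweden\t18\n"    # quoted in the thread
    "sv.wikipedia.org\tDenmark\t2\n"    # quoted in the thread
    "sv.wikipedia.org\tSweden\t80\n"    # made up for illustration
    "fi.wikipedia.org\tSweden\t4\n"     # made up for illustration
)


def projects_by_country(tsv_text):
    """Map each country to {project: share of that project's pageviews}."""
    by_country = defaultdict(dict)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        share = float(row["pageviews_percentage"])
        by_country[row["country"]][row["project"]] = share
    return by_country


by_country = projects_by_country(SAMPLE_TSV)

# List the projects that receive traffic from Sweden, largest share first.
for project, share in sorted(by_country["Sweden"].items(), key=lambda kv: -kv[1]):
    print(f"{project}\t{share:g}%")
```

Note that each percentage is a share of one project's traffic, so values for different projects within a country are not directly comparable without absolute pageview counts, which is the gap the "universal percentage" idea is meant to close.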
>>>
>>> Best,
>>> han-teng liao
>>>
>>> 2015-03-02 13:54 GMT+01:00 Finn Årup Nielsen <fn@imm.dtu.dk>:
>>>>
>>>> Hi Oliver,
>>>>
>>>>
>>>> Interesting dataset! I am curious about why the Danish Wikipedia is so
>>>> highly accessed from Sweden. Could it be an error, e.g., with Telia
>>>> IP-numbers?
>>>>
>>>> In Python:
>>>>
>>>> >>> import pandas as pd
>>>> >>> df = pd.read_csv(
>>>> ...     'http://files.figshare.com/1923822/language_pageviews_per_country.tsv',
>>>> ...     sep='\t')
>>>> >>> df.loc[df.project == 'da.wikipedia.org',
>>>> ...        ['country', 'pageviews_percentage']].set_index('country')
>>>>                 pageviews_percentage
>>>> country
>>>> Austria                            1
>>>> China                              1
>>>> Denmark                           61
>>>> Estonia                            1
>>>> France                             1
>>>> Germany                            2
>>>> Netherlands                        2
>>>> Norway                             1
>>>> Sweden                            18
>>>> United Kingdom                     3
>>>> United States                      3
>>>> Other                              5
>>>>
>>>>
>>>> MaxMind has some numbers on their own accuracy:
>>>>
>>>> https://www.maxmind.com/en/geoip2-city-database-accuracy
>>>>
>>>> For Denmark 85% is "Correctly Resolved", for Sweden only 68%. I wonder if
>>>> this really could bias the result so much.
>>>>
>>>> If the numbers are correct, why would Swedes read the Danish Wikipedia
>>>> so much? Bots? It does not apply the other way around: only 2% of the
>>>> traffic to Swedish Wikipedia comes from Denmark.
>>>>
>>>>
>>>>
>>>> best regards
>>>> Finn
>>>>
>>>>
>>>>
>>>> On 02/25/2015 10:06 PM, Oliver Keyes wrote:
>>>>>
>>>>> Hey all!
>>>>>
>>>>> We've released a highly-aggregated dataset of readership data -
>>>>> specifically, data about where, geographically, traffic to each of our
>>>>> projects (and all of our projects) comes from. The data can be found
>>>>> at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
>>>>> put together an exploration tool for it at
>>>>> https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
>>>>>
>>>>> Hope it's useful to people!
>>>>>
>>>>
>>>>
>>>> --
>>>> Finn Årup Nielsen
>>>> http://people.compute.dtu.dk/faan/
>>>>
>>>>
>>>> _______________________________________________
>>>> Wiki-research-l mailing list
>>>> Wiki-research-l@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation
>
>
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
>




--
Sent from my mobile computing device of Lovecraftian complexity and horror.