'Lots, but that's not currently anyone's job'
On Wednesday, 4 March 2015, Dario Taraborelli <dtaraborelli(a)wikimedia.org>
wrote:
yay, shiny! The map is a pretty compelling way to show how dominant
traffic from the US is, even for very minor languages (say
bi.wikipedia.org), I wonder how many requests from US-based bots/automata
we’re still failing to detect.
On Mar 3, 2015, at 9:29 PM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
Update: the original Shiny instance went down due to server load soon
after release. It's now up again at http://datavis.wmflabs.org/where/
on a dedicated Labs machine, where we hope to put...many more
visualisations. It also now has mapping, largely thanks to Sarah
Laplante (http://sarahlaplante.com/), and soon it will hopefully be
/non-hideous/ mapping (the current mass of blue and grey is because my
aesthetic tastes are...I don't actually have any aesthetic tastes)
On 2 March 2015 at 22:36, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
> Indeed! Orienting it that way (pivoting on language rather than
> project) is something several people have asked for; I plan to spend a
> chunk of my spare time (that is, recreational time) trying to make it
> work. Should be fairly trivial.
>
> On 2 March 2015 at 09:55, h <hanteng(a)gmail.com> wrote:
>> Hello Finn,
>> I do not have a specific answer to your question. However, it might be
>> worthwhile to add Finnish to the comparison, as according to the
>> CLDR 26 T-L information
>>
>> http://www.unicode.org/cldr/charts/26/supplemental/territory_language_infor…
>>
>> You have a sizable number of Finnish-language speakers in Sweden:
>>
>> Swedish {O} sv 95.0% 99.0%
>> Finnish {OR} fi 2.2%
>>
>> So if a similar query is run for the Finnish-language Wikipedia and the
>> results also show some "undue" proportion of visits from Sweden, then
>> what you observed as an anomaly is not that unique. We probably need many
>> iterations of comparative outcomes and normalization of the data (Sweden
>> does have a higher population). Also, it might be handy to have some
>> statistics on immigration or residence, since this is the EU. I would not
>> be surprised if, for example, visits from Oxford to the Wikipedia website
>> included a sizable number of German-language requests.
>>
>> I am still a bit bothered by the number "1" in the current dataset. It
>> does not feel right, since the difference between 1.4% and 0.6% is
>> notable in this regard. Perhaps we need a high-precision "universal
>> percentage" number for each territory-language pair. It would also be
>> great to do another set of aggregations: i.e., given a territory, which
>> language versions of Wikipedia are accessed....
>>
>> Best,
>> han-teng liao
>>
>> 2015-03-02 13:54 GMT+01:00 Finn Årup Nielsen <fn(a)imm.dtu.dk>:
>>>
>>> Hi Oliver,
>>>
>>>
>>> Interesting dataset! I am curious about why the Danish Wikipedia is so
>>> highly accessed from Sweden. Could it be an error, e.g., with Telia
>>> IP-numbers?
>>>
>>> In Python:
>>>
>>> >>> import pandas as pd
>>> >>> df = pd.read_csv(
>>> ...     'http://files.figshare.com/1923822/language_pageviews_per_country.tsv',
>>> ...     sep='\t')
>>> >>> # .loc is the label-based indexer (.ix was removed in pandas 1.0)
>>> >>> df.loc[df.project == 'da.wikipedia.org',
>>> ...        ['country', 'pageviews_percentage']].set_index('country')
>>>                 pageviews_percentage
>>> country
>>> Austria 1
>>> China 1
>>> Denmark 61
>>> Estonia 1
>>> France 1
>>> Germany 2
>>> Netherlands 2
>>> Norway 1
>>> Sweden 18
>>> United Kingdom 3
>>> United States 3
>>> Other 5
>>>
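A minimal sketch of the rounding concern raised in this thread, using only the Danish Wikipedia rows quoted above as inline data, so nothing is downloaded. The helper name `shares_for_project` is hypothetical and not part of the dataset or of pandas; the column names follow the quoted session.

```python
import pandas as pd

# Rows copied from the da.wikipedia.org output quoted above (whole-number percentages).
rows = [
    ("Austria", 1), ("China", 1), ("Denmark", 61), ("Estonia", 1),
    ("France", 1), ("Germany", 2), ("Netherlands", 2), ("Norway", 1),
    ("Sweden", 18), ("United Kingdom", 3), ("United States", 3), ("Other", 5),
]
df = pd.DataFrame(rows, columns=["country", "pageviews_percentage"])
df["project"] = "da.wikipedia.org"


def shares_for_project(frame, project):
    """Per-country percentages for one project, largest first (hypothetical helper)."""
    subset = frame.loc[frame["project"] == project,
                       ["country", "pageviews_percentage"]]
    return (subset.set_index("country")["pageviews_percentage"]
            .sort_values(ascending=False))


shares = shares_for_project(df, "da.wikipedia.org")
total = int(shares.sum())  # sum of the rounded values, not of the true shares

print(shares.head(3))
print("sum of rounded percentages:", total)
```

The quoted values add up to 99 rather than 100, which is the kind of information loss that whole-number percentages cause: as noted elsewhere in the thread, 1.4% and 0.6% would both appear as 1.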
>>>
>>> MaxMind has some numbers on their own accuracy:
>>>
>>>
>>> https://www.maxmind.com/en/geoip2-city-database-accuracy
>>>
>>> For Denmark 85% is "Correctly Resolved", for Sweden only 68%. I wonder if
>>> this really could bias the result so much.
>>>
>>> If the numbers are correct why would the Swedish read the Danish Wikipedia
>>> so much? Bots? It does not apply the other way around: Only 2% of the
>>> traffic to Swedish Wikipedia comes from Denmark.
>>>
>>>
>>>
>>> best regards
>>> Finn
>>>
>>>
>>>
>>> On 02/25/2015 10:06 PM, Oliver Keyes wrote:
>>>>
>>>> Hey all!
>>>>
>>>> We've released a highly-aggregated dataset of readership data -
>>>> specifically, data about where, geographically, traffic to each of our
>>>> projects (and all of our projects) comes from. The data can be found
>>>> at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
>>>> put together an exploration tool for it at
>>>> https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
>>>>
>>>> Hope it's useful to people!
>>
>
>
> --
> Finn Årup Nielsen
>
> http://people.compute.dtu.dk/faan/
>
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
--
Oliver Keyes
Research Analyst
Wikimedia Foundation
--
Sent from my mobile computing device of Lovecraftian complexity and horror.