Hey all!
We've released a highly-aggregated dataset of readership data - specifically, data about where, geographically, traffic to each of our projects (and all of our projects) comes from. The data can be found at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've put together an exploration tool for it at https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
Hope it's useful to people!
-- Oliver Keyes Research Analyst Wikimedia Foundation
Very nice. Do you think that you could pick out a few of your favorite graphs and add them to this week's Recent Research report in a gallery?
Thanks! Pine
_______________________________________________ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Totally! I'm also going to get together with some NEU hackers tomorrow and work on actually visualising the data on *drumroll* maps, which'd probably be more interesting eye candy than infinite bar plots :)
On 25 February 2015 at 16:19, Pine W wiki.pine@gmail.com wrote:
Very nice. Do you think that you could pick out a few of your favorite graphs and add them to this week's Recent Research report in a gallery?
Thanks! Pine
Excellent!
Pine
On Feb 25, 2015 1:26 PM, "Oliver Keyes" okeyes@wikimedia.org wrote:
Totally! I'm also going to get together with some NEU hackers tomorrow and work on actually visualising the data on *drumroll* maps, which'd probably be more interesting eye candy than infinite bar plots :)
Great job.
Who knew Esperanto was big in Japan and China at #2 and #3?
On Wed, Feb 25, 2015 at 4:06 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Hey all!
We've released a highly-aggregated dataset of readership data - specifically, data about where, geographically, traffic to each of our projects (and all of our projects) comes from. The data can be found at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've put together an exploration tool for it at https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
Hope it's useful to people!
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
The one major caveat, I think, is that proportionate data makes small projects very vulnerable to artificial traffic spikes. I'd go out on a limb and say that some of the massive bumps in popularity we see in particular combinations are likely due to either undetected automata or simply a project having so little traffic that a small number of people can sway the results outlandishly.
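To make the caveat concrete, here is a minimal sketch of how the same burst of automated requests moves a country's share in a small project versus a large one. All traffic figures and both function names are hypothetical, chosen only for illustration; none of them come from the released dataset.

```python
def country_share(country_views, total_views):
    """Return a country's share of a project's pageviews as a whole-number percentage."""
    return round(100 * country_views / total_views)


def share_after_spike(country_views, total_views, extra_views):
    """Share after `extra_views` additional requests arrive from the same country."""
    return country_share(country_views + extra_views, total_views + extra_views)


# Hypothetical figures: both projects start with 5% of traffic from one country.
small_project = {"country_views": 50, "total_views": 1_000}
large_project = {"country_views": 500_000, "total_views": 10_000_000}
spike = 500  # assumed size of one undetected crawler's traffic

small_before = country_share(**small_project)
small_after = share_after_spike(**small_project, extra_views=spike)
large_before = country_share(**large_project)
large_after = share_after_spike(**large_project, extra_views=spike)

print(f"small project: {small_before}% -> {small_after}%")
print(f"large project: {large_before}% -> {large_after}%")
```

Under these assumed numbers the small project's share jumps from 5% to 37%, while the large project's share stays at 5%.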
On 25 February 2015 at 16:32, Andrew Lih andrew.lih@gmail.com wrote:
Great job.
Who knew Esperanto was big in Japan and China at #2 and #3?
This is really, really cool, great job guys!
G
Giovanni Luca Ciampaglia
✎ 919 E 10th ∙ Bloomington 47408 IN ∙ USA ☞ http://www.glciampaglia.com/ ✆ +1 812 855-7261 ✉ gciampag@indiana.edu
Hi Oliver,
Interesting dataset! I am curious about why the Danish Wikipedia is so highly accessed from Sweden. Could it be an error, e.g., with Telia IP numbers?
In Python:
    import pandas as pd
    df = pd.read_csv('http://files.figshare.com/1923822/language_pageviews_per_country.tsv', sep='\t')
    # Select the country and percentage columns for the Danish Wikipedia rows
    df.loc[df.project == 'da.wikipedia.org', ['country', 'pageviews_percentage']].set_index('country')

                    pageviews_percentage
    country
    Austria                            1
    China                              1
    Denmark                           61
    Estonia                            1
    France                             1
    Germany                            2
    Netherlands                        2
    Norway                             1
    Sweden                            18
    United Kingdom                     3
    United States                      3
    Other                              5
MaxMind has some numbers on their own accuracy:
https://www.maxmind.com/en/geoip2-city-database-accuracy
For Denmark 85% is "Correctly Resolved", for Sweden only 68%. I wonder if this really could bias the result so much.
If the numbers are correct, why would Swedes read the Danish Wikipedia so much? Bots? It does not apply the other way around: only 2% of the traffic to the Swedish Wikipedia comes from Denmark.
best regards Finn
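The asymmetry Finn describes can be checked with a small lookup. This is only a sketch: the sample rows are copied from the figures quoted in the message, the column names are taken from the snippet above, and the helper names `load_shares` and `share` are hypothetical. The real file has many more rows.

```python
import csv
import io

# Sample rows: four Danish Wikipedia figures from the quoted output, plus the
# single Swedish Wikipedia figure mentioned in the message (2% from Denmark).
SAMPLE_TSV = (
    "project\tcountry\tpageviews_percentage\n"
    "da.wikipedia.org\tDenmark\t61\n"
    "da.wikipedia.org\tSweden\t18\n"
    "da.wikipedia.org\tGermany\t2\n"
    "da.wikipedia.org\tNorway\t1\n"
    "sv.wikipedia.org\tDenmark\t2\n"
)


def load_shares(tsv_text):
    """Map (project, country) to the whole-number pageview percentage."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        (row["project"], row["country"]): int(row["pageviews_percentage"])
        for row in reader
    }


def share(shares, project, country):
    """Return the percentage for a pair, or None if the pair is not listed."""
    return shares.get((project, country))


shares = load_shares(SAMPLE_TSV)
danish_from_sweden = share(shares, "da.wikipedia.org", "Sweden")
swedish_from_denmark = share(shares, "sv.wikipedia.org", "Denmark")

print(f"da.wikipedia.org from Sweden:  {danish_from_sweden}%")
print(f"sv.wikipedia.org from Denmark: {swedish_from_denmark}%")
```

A pair that is absent from the sample returns `None` rather than 0, since a missing row does not show that there was no traffic.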
Hello Finn, I do not have a specific answer to your question. However, it might be worthwhile to add Finnish to the comparison, going by the CLDR 26 T-L information http://www.unicode.org/cldr/charts/26/supplemental/territory_language_inform...
There is a sizable number of Finnish-language speakers in Sweden:
Swedish {O} sv 95.0% 99.0%
Finnish {OR} fi 2.2%
So if a similar query is executed for Finnish, and the results also show some "undue" proportion of visits from Sweden, then what you observed as an anomaly is not that unique. We probably need many iterations of comparative outcomes and normalization of the data (Sweden does have a larger population). Also, it might be handy to have some statistics on immigration or residence; this is the EU, after all. I would not be surprised if, for example, visits from Oxford to Wikipedia included a sizable share of German-language requests.
I am still a bit bothered by the number "1" in the current dataset. It does not feel right, since the difference between 1.4% and 0.6% is notable, yet both would be shown as 1. Perhaps we need a higher-precision "universal percentage" number for each territory-language pair. It would also be great to have another aggregation: given a territory, which language versions of Wikipedia are accessed.
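A rough sketch of both requests, under a strong assumption: that raw pageview counts per project-country pair are available. The released dataset contains only rounded percentages, so every count below is invented, and `shares_by` is a hypothetical helper.

```python
from collections import defaultdict

# Invented raw pageview counts per (project, country), for illustration only.
RAW_COUNTS = {
    ("da.wikipedia.org", "Denmark"): 800,
    ("da.wikipedia.org", "Sweden"): 180,
    ("da.wikipedia.org", "Norway"): 14,
    ("da.wikipedia.org", "Germany"): 6,
    ("sv.wikipedia.org", "Sweden"): 900,
    ("sv.wikipedia.org", "Norway"): 80,
    ("sv.wikipedia.org", "Denmark"): 20,
    ("fi.wikipedia.org", "Finland"): 980,
    ("fi.wikipedia.org", "Sweden"): 20,
}


def shares_by(counts, key_index, digits=1):
    """Percentage of each pair within its group.

    key_index=0 groups by project (where does a project's traffic come from);
    key_index=1 groups by country (which projects does a territory read).
    """
    totals = defaultdict(int)
    for pair, views in counts.items():
        totals[pair[key_index]] += views
    return {
        pair: round(100 * views / totals[pair[key_index]], digits)
        for pair, views in counts.items()
    }


whole = shares_by(RAW_COUNTS, key_index=0, digits=0)       # as in the current dataset
precise = shares_by(RAW_COUNTS, key_index=0, digits=1)     # one decimal place
by_country = shares_by(RAW_COUNTS, key_index=1, digits=1)  # per-territory view

for country in ("Norway", "Germany"):
    pair = ("da.wikipedia.org", country)
    print(country, whole[pair], precise[pair])

for pair, pct in sorted(by_country.items()):
    if pair[1] == "Sweden":
        print(pair[0], pct)
```

With these invented counts, Norway (1.4%) and Germany (0.6%) both collapse to 1 at whole-number precision, which is the loss of information described above. The per-territory view cannot be derived from per-project percentages alone, which is why the sketch starts from counts.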
Best, han-teng liao
2015-03-02 13:54 GMT+01:00 Finn Årup Nielsen fn@imm.dtu.dk:
Hi Oliver,
Interesting dataset! I am curious about why the Danish Wikipedia is so highly accessed from Sweden. Could it be an error, e.g., with Telia IP numbers?
-- Finn Årup Nielsen http://people.compute.dtu.dk/faan/
Indeed! Orienting it that way (pivoting on language rather than project) is something several people have asked for; I plan to spend a chunk of my spare time (that is, recreational time) trying to make it work. Should be fairly trivial.
On 2 March 2015 at 09:55, h hanteng@gmail.com wrote:
Hello Finn, I do not have a specific answer to your question. However, it might be worthwhile to add Finnish to the comparison, going by the CLDR 26 T-L information http://www.unicode.org/cldr/charts/26/supplemental/territory_language_inform...
Update: the original Shiny instance went down due to server load soon after release. It's now up again at http://datavis.wmflabs.org/where/ on a dedicated Labs machine, where we hope to put...many more visualisations. It also now has mapping, largely thanks to Sarah Laplante (http://sarahlaplante.com/), and soon it will hopefully be /non-hideous/ mapping (the current mass of blue and grey is because my aesthetic tastes are...I don't actually have any aesthetic tastes)
On 2 March 2015 at 22:36, Oliver Keyes okeyes@wikimedia.org wrote:
Indeed! Orienting it that way (pivoting on language rather than project) is something several people have asked for; I plan to spend a chunk of my spare time (that is, recreational time) trying to make it work. Should be fairly trivial.
yay, shiny! The map is a pretty compelling way to show how dominant traffic from the US is, even for very minor languages (say bi.wikipedia.org), I wonder how many requests from US-based bots/automata we’re still failing to detect.
On Mar 3, 2015, at 9:29 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Update: the original Shiny instance went down due to server load soon after release. It's now up again at http://datavis.wmflabs.org/where/ on a dedicated Labs machine, where we hope to put...many more visualisations. It also now has mapping, largely thanks to Sarah Laplante (http://sarahlaplante.com/), and soon it will hopefully be /non-hideous/ mapping (the current mass of blue and grey is because my aesthetic tastes are...I don't actually have any aesthetic tastes)
'Lots, but that's not currently anyone's job'
On Wednesday, 4 March 2015, Dario Taraborelli dtaraborelli@wikimedia.org wrote:
yay, shiny! The map is a pretty compelling way to show how dominant traffic from the US is, even for very minor languages (say bi.wikipedia.org), I wonder how many requests from US-based bots/automata we’re still failing to detect.
2015-03-04 8:44 GMT+01:00 Dario Taraborelli dtaraborelli@wikimedia.org:
yay, shiny! The map is a pretty compelling way to show how dominant traffic from the US is, even for very minor languages (say bi.wikipedia.org), I wonder how many requests from US-based bots/automata we’re still failing to detect.
Still, the question could be: are we fulfilling the mission? (hint: probably not)
Cristian
That is the question, and I agree with your conclusion. I'm hoping to do more research into this; getting buy-in internally has been tough, but I'm confident of making progress on that front over the next few weeks and months.
On 4 March 2015 at 04:13, Cristian Consonni kikkocristian@gmail.com wrote:
Still, the question could be: are we fulfilling the mission? (hint: probably not)
Cristian
I'm not sure how much influence I have, but I would be happy to make whispers in appropriate places to try to get more support, if that's helpful.
Perhaps you could show your work at the next Research and Data showcase? I for one would be interested in seeing a presentation.
Pine
*This is an Encyclopedia* https://www.wikipedia.org/
*One gateway to the wide garden of knowledge, where lies The deep rock of our past, in which we must delve The well of our future, The clear water we must leave untainted for those who come after us, The fertile earth, in which truth may grow in bright places, tended by many hands, And the broad fall of sunshine, warming our first steps toward knowing how much we do not know.*
*—Catherine Munro*
On Wed, Mar 4, 2015 at 1:25 AM, Oliver Keyes okeyes@wikimedia.org wrote:
That is the question, and I agree with your conclusion. I'm hoping to do more research into this; getting buy-in internally has been tough, but I'm confident of making progress on that front over the next few weeks and months.
On 4 March 2015 at 04:28, Pine W wiki.pine@gmail.com wrote:
I'm not sure how much influence I have, but I would be happy to make whispers in appropriate places to try to get more support, if that's helpful.
I think I'm probably good, but thank you.
Perhaps you could show your work at the next Research and Data showcase? I for one would be interested in seeing a presentation.
That's in 3 weeks; I'm not convinced that a piece of substantive, useful research about global reach could be done in that time period even if I could drop everything I currently have (which I can't). This problem is too big and too important to be scheduled around meetings; things should work the other way around.
Scott Hale and I have been working on a paper looking at global reach and how it tracks with internet access growth, in the context of editing, particularly looking at the mobile web. That, we should be done with by then; presenting it could be highly useful (Scott? ;p)
Oliver:
Scott Hale and I have been working on a paper looking at global reach and how it tracks with internet access growth, in the context of editing, particularly looking at the mobile web. That, we should be done with by then; presenting it could be highly useful (Scott? ;p)
I see what you did there, Oliver :J I believe the showcase is in two weeks (3rd Wednesday of the month), which is a bit too tight to make sure everything is really checked and accurate. I'm in Asia in April, but *we* could definitely present in May on the work, which, as Oliver said, correlates Wikipedia editor numbers with mobile and broadband penetration data at the country level.
Dario:
I wonder how many requests from US-based bots/automata we’re still failing to detect.
This reminds me that I would like to engage with the technical development team on the idea of storing the application (i.e., OAuth consumer ID) for each edit made through the API. Not all bots use the API, I guess, but I would venture that many (maybe most) do, and tracking them would then become trivial. Tracking the applications used to make edits via the API would also allow tracking of alternative editing interfaces (e.g., VisualEditor uses the API, and perhaps AutoWikiBrowser or others do as well). I've never proposed any technical enhancement requests for MediaWiki, so I would very much welcome guidance.
Best wishes, Scott