Re: [Analytics] Wikipedia Top 25

11 Jul 2013

Unfortunately, no. See Jörn's mail where he says he has requested this
information along with page views, but hasn't got it yet (probably
because of Wikipedia's privacy policy). If "Domas was able to find the
cause by sampling some of the requests" then that pretty much means
that Domas couldn't get the info any other way, and if Domas can't,
then I don't think anyone else can either. You can always try mailing
Erik Zachte (infodisiac stats website) for his opinion though.
2013/7/11, Noneof MicrosoftsBusiness phonenumberofthebeast@hotmail.com:
...
Is there a way to do this? Forgive me, I'm not exactly computer-illiterate,
but this under-the-hood stuff is not something I'm familiar with.
...
Date: Thu, 11 Jul 2013 12:33:28 +0200
From: jane023@gmail.com
To: analytics@lists.wikimedia.org
Subject: Re: [Analytics] Wikipedia Top 25
You're right, it would be extremely helpful to know "how many
different IP addresses" cause the accesses. The last three things,
though definitely desirable, are less important. For reporting you
could just filter by some ratio of unique IPs to page views (e.g.
only include a page in your top25 report when at least half of its
page views come from unique IPs).
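A rough sketch of that filter in Python, purely as an illustration: it assumes a per-page unique-IP count were available, which the hourly pageview stats do not currently provide. The `PageStats` record, the `top_pages` helper, and the sample numbers are all made up for the example.

```python
from typing import NamedTuple


class PageStats(NamedTuple):
    title: str
    views: int
    unique_ips: int  # hypothetical field; not part of the current pageview stats


def top_pages(stats, n=25, min_unique_ratio=0.5):
    """Return the n most-viewed pages whose unique-IP ratio meets the threshold."""
    kept = [
        s for s in stats
        if s.views > 0 and s.unique_ips / s.views >= min_unique_ratio
    ]
    return sorted(kept, key=lambda s: s.views, reverse=True)[:n]


# Invented numbers, only to show the filter dropping a page with few unique IPs.
sample = [
    PageStats("Yahoo!", 100000, 1200),
    PageStats("Main_Page", 90000, 70000),
    PageStats("Python", 5000, 4000),
]
print([s.title for s in top_pages(sample)])
```

The 0.5 threshold is just the "at least half" figure from above; a real report would probably need to tune it.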
2013/7/11, Jörn Hees wikistats@joernhees.de:
...
Hi,
On 11.07.2013, at 10:37, Federico Leva (Nemo) nemowiki@gmail.com
wrote:
...
Jane Darnell, 11/07/2013 09:15:
...
Hmm, This one really has me stumped:
http://stats.grok.se/en/latest90/Yahoo!
That is not a wikibump, but some sort of structural thing. The only
thing I can think of is that some sort of popular band, manga
character, or porn queen in China has been named Yahoo!
Or someone (e.g. Yahoo!) has linked it from some prominent webpage (but
only in English? Other languages seem not affected), or some stockholder
(e.g. Yahoo!) is running simple "crawlers" to skew pageview stats and
make them appear flat so that nobody can make stock value forecasts
using them.
Yup, I also think this is a weird anomaly…
The causes can be very weird though, as we found out back in 2010-11 when
the views to the "initial" page were suddenly very skewed:
http://infodisiac.com/blog/2010/11/page-views-anomaly-in-october-resolved/#c...
Domas was able to find the cause by sampling some of the requests and found
that all had the same referrer. It turned out to be an online ads page that
had an error in its HTML which tried to load the page as a background image.
It's hard to analyze / clean this stuff while maintaining privacy.
I remember there was a survey sent out to several (linked) open data
researchers a while ago asking how the Wikimedia Foundation could provide
better stats.
My reply was something along these lines:
Provide more stats with every line in the hourly pageview stats:

- how many different IP addresses cause the accesses (better: how many
  accesses per IP address (avg + stddev))
- how many different referrers cause the accesses (better: how many
  accesses per referrer (avg + stddev))
- how many accesses come from Wikimedia IPs (toolserver, some bots)
- IP address count for the top 5 (or 10) originating countries (get some
  geolocation in)
I think the first 3 aren't really computationally expensive but would
really improve our ability to clean the view stats.
The 4th was always on my wishlist, but would require some more work for
the reverse IP->geolocation lookup.
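To make the first three items concrete, here is a minimal Python sketch of how they could be computed for one page in one hour. It assumes access to raw (IP, referrer) pairs per request, which is exactly the privacy-sensitive data not published today; the function names, the `internal_ips` set, and the sample addresses are invented for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def spread(values):
    """Distinct count plus avg/stddev of accesses per distinct value."""
    counts = Counter(values)
    if not counts:
        return {"distinct": 0, "avg": 0.0, "stddev": 0.0}
    per_value = list(counts.values())
    return {
        "distinct": len(counts),
        "avg": mean(per_value),
        "stddev": pstdev(per_value),  # population stddev over the distinct values
    }


def page_line_stats(requests, internal_ips=frozenset()):
    """Summarise a list of (ip, referrer) pairs for one page and one hour."""
    ips = [ip for ip, _ in requests]
    referrers = [ref for _, ref in requests]
    return {
        "views": len(requests),
        "ip": spread(ips),
        "referrer": spread(referrers),
        "internal": sum(1 for ip in ips if ip in internal_ips),
    }


# Invented sample: three of four requests share one referrer.
requests = [
    ("10.0.0.1", "http://ads.example/"),
    ("10.0.0.1", "http://ads.example/"),
    ("10.0.0.2", "http://ads.example/"),
    ("192.0.2.7", ""),
]
stats = page_line_stats(requests, internal_ips={"10.0.0.2"})
print(stats)
```

Only the aggregates would need to be published, not the addresses or referrers themselves. A low distinct-referrer count with a high average per referrer is the kind of pattern seen in the 2010 anomaly.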
Cheers,
Jörn

Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
