Is there a way to do this? Forgive me, I'm not exactly computer-illiterate, but this under-the-hood stuff is not something I'm familiar with.
> Date: Thu, 11 Jul 2013 12:33:28 +0200
> From: jane023@gmail.com
> To: analytics@lists.wikimedia.org
> Subject: Re: [Analytics] Wikipedia Top 25
>
> You're right, it would be extremely helpful to know "how many
> different IP addresses" cause the accesses. The last three things,
> though definitely desirable, are less important. For reporting you
> could just filter by some ratio of unique IPs vs page views (e.g.
> only include in your top25 report when at least half of the page views
> are from unique IPs).
>
> 2013/7/11, Jörn Hees <wikistats@joernhees.de>:
> > Hi,
> >
> > On 11.07.2013, at 10:37, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
> >> Jane Darnell, 11/07/2013 09:15:
> >>> Hmm, this one really has me stumped:
> >>> http://stats.grok.se/en/latest90/Yahoo!
> >>> That is not a wikibump, but some sort of structural thing. The only
> >>> thing I can think of is that some sort of popular band, manga
> >>> character, or porn queen in China has been named Yahoo!
> >>
> >> Or someone (e.g. Yahoo!) has linked it from some prominent webpage (but
> >> only in English? other languages seem not affected) or some stockholder
> >> (e.g. Yahoo!) is running simple "crawlers" to skew pageview stats and
> >> make them appear flat so that nobody can make stock value forecasts using
> >> them.
> >
> > Yup, I also think this is a weird anomaly…
> > the causes can be very weird though, as we found out back in 2010-11 when
> > the views to the "initial" page were suddenly very skewed:
> > http://infodisiac.com/blog/2010/11/page-views-anomaly-in-october-resolved/#comments
> > Domas was able to find the cause by sampling some of the requests and found
> > that all had the same referrer. Turned out it was an online ads page that
> > had an error in its HTML which tried to load the page as a background image.
> >
> > It's hard to analyze / clean this stuff while maintaining privacy.
> > I remember there was a survey sent out to several (linked) open data
> > researchers a while ago asking how the Wikimedia Foundation could provide
> > better stats.
> > My reply was something along these lines:
> > Provide more stats with every line in the hourly pageview stats:
> > - how many different IP addresses cause the accesses (better: how many
> > accesses per IP address (avg + stddev))
> > - how many different referrers cause the accesses (better: how many accesses
> > per referrer (avg + stddev))
> > - how many accesses come from Wikimedia IPs (toolserver, some bots)
> > - IP address count for top 5 (or 10) originating countries (get some
> > geolocation in)
> >
> > I think the top 3 aren't really computationally expensive but would really
> > improve our ability to clean the view stats.
> > The 4th always was on my wishlist, but would require some more work for
> > reverse IP -> geolocation lookup.
> >
> > Cheers,
> > Jörn
> >
> >
> > _______________________________________________
> > Analytics mailing list
> > Analytics@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/analytics
> >
>
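
For illustration only, here is a rough sketch of the two ideas quoted above:
the per-page "accesses per IP address (avg + stddev)" figures and the
"at least half of the page views are from unique IPs" filter for a top 25
report. It assumes access to raw (page, IP) request pairs, which the thread
suggests the hourly pageview stats do not contain, so the input format and
the names per_page_stats, top_pages and min_ratio are all made up here.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def per_page_stats(requests):
    """Aggregate hypothetical (page, ip) request pairs into per-page figures."""
    hits = defaultdict(Counter)  # page -> Counter of views per IP
    for page, ip in requests:
        hits[page][ip] += 1

    stats = {}
    for page, by_ip in hits.items():
        counts = list(by_ip.values())
        stats[page] = {
            "views": sum(counts),             # total page views
            "unique_ips": len(counts),        # number of different IPs
            "avg_per_ip": mean(counts),       # average accesses per IP
            "stddev_per_ip": pstdev(counts),  # population std deviation
        }
    return stats


def top_pages(stats, n=25, min_ratio=0.5):
    """Rank pages by views, keeping only those whose ratio of unique IPs
    to page views is at least min_ratio (an assumed threshold)."""
    kept = [
        (page, s["views"])
        for page, s in stats.items()
        if s["unique_ips"] / s["views"] >= min_ratio
    ]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:n]


if __name__ == "__main__":
    # Made-up sample: page "A" is hit ten times by a single address.
    sample = [("A", "192.0.2.1")] * 10
    sample += [("B", f"198.51.100.{i}") for i in range(4)]
    sample += [("C", "203.0.113.1"), ("C", "203.0.113.1"), ("C", "203.0.113.2")]
    print(top_pages(per_page_stats(sample)))
```

In the made-up sample, page "A" has the most views but all from one
address, so the ratio filter drops it from the report. Because IP addresses
are privacy-sensitive, an aggregation like this would presumably have to run
on the raw logs, with only the resulting counts being published.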