Thank you Erik for this amazing work. I think I understood most of it :)

The overreporting patch leads to a significant difference, as discussed in the first two pages, starting at the end of 2013. Why does the patch have no effect on the curve before the divergence? I.e. why is there a divergence only since mid-2013?

Can the numbers be retroactively recalculated after the patch, or is the original data not available to do that?

I wonder if there was systemic overreporting before mid-2013 as well.

Again, thank you very very much for this great work!






On Tue Jan 14 2014 at 7:02:15 AM, Erik Zachte <ezachte@wikimedia.org> wrote:
The point of yesterday's exercise was to crosscheck webstatscollector, so
the filters in webstatscollector served as inspiration but not as a guiding
principle.

I broke down traffic patterns from the squid logs using ad hoc criteria that
seemed most descriptive. Nearly all HTML traffic is sent in response to GET
requests. This may be obvious to web experts, so it hardly merits
mentioning, but it wasn't obvious to me. (For example, I was curious about
the ratio of GET to HEAD requests.)
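
A breakdown like this could be sketched roughly as follows. This is only
an illustration, not the script that was actually used: the function name
method_shares, the (method, mime type) record layout, and the sample rows
are all made up for the example and are not taken from the real squid logs.

```python
from collections import Counter


def method_shares(records):
    """Share of each HTTP method among HTML responses.

    `records` is an iterable of (http_method, mime_type) pairs.
    This layout is a hypothetical simplification; real log lines
    would have to be parsed into such pairs first.
    """
    counts = Counter(
        method.upper()
        for method, mime in records
        if mime.lower().startswith("text/html")
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {method: n / total for method, n in counts.items()}


# Made-up sample rows, only to show the shape of the input.
sample = [
    ("GET", "text/html; charset=UTF-8"),
    ("GET", "text/html"),
    ("HEAD", "text/html"),
    ("GET", "image/png"),  # not HTML, ignored
    ("POST", "text/html"),
]
shares = method_shares(sample)
```

With real data, shares would show directly how much of the HTML traffic
is answered on GET compared to HEAD or other methods.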

I am also not yet sure whether webstatscollector should use the ranges of IP
addresses it uses right now to filter out internal traffic.
Some are very busy ranges, others show no activity at all (those could be
dropped for better webstatscollector performance).
And is the configuration complete at all, or could we be missing some
ranges? I'll follow up on that off-list.
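
One way to see which configured ranges are busy and which are idle is to
count hits per range. The sketch below is an assumption-laden example,
not webstatscollector code: the function name hits_per_range is invented,
and the ranges are RFC 5737 documentation placeholders, not the actual
internal ranges from the webstatscollector configuration.

```python
import ipaddress


def hits_per_range(client_ips, ranges):
    """Count how many client IPs fall into each configured range.

    `ranges` is a list of CIDR strings; `client_ips` is an iterable
    of IP address strings taken from the logs.
    """
    networks = [ipaddress.ip_network(r) for r in ranges]
    hits = {str(net): 0 for net in networks}
    for ip in client_ips:
        addr = ipaddress.ip_address(ip)
        for net in networks:
            if addr in net:
                hits[str(net)] += 1
    return hits


# Placeholder ranges and addresses (documentation address space only).
configured = ["192.0.2.0/24", "198.51.100.0/24"]
seen = ["192.0.2.10", "192.0.2.200", "203.0.113.9"]

hits = hits_per_range(seen, configured)
idle = [r for r, n in hits.items() if n == 0]
```

Ranges that end up in idle over a long enough log sample would be
candidates for removal from the filter. Finding ranges that are missing
from the configuration needs other information and is not covered here.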

Erik



-----Original Message-----
From: analytics-bounces@lists.wikimedia.org
[mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Christian
Aistleitner
Sent: Tuesday, January 14, 2014 14:06
To: A mailing list for the Analytics Team at WMF and everybody who has an
interest in Wikipedia and analytics.
Subject: Re: [Analytics] Page view data with Wikipedia app?

Hi,

On Tue, Jan 14, 2014 at 03:05:03AM +0100, Erik Zachte wrote:
> [...] webstatscollector data, which are totally based on GET /wiki/
> (as was known).

it should hardly have any impact, but just to avoid misconceptions, as GET was
mentioned several times throughout the email ...
I do not know of a filter to GET in webstatscollector.
Looking for it again, I could not find it.
Where does that GET filter live?

Best regards,
Christian

P.S.: Just to avoid confusion, from my point of view, webstatscollector does
limit to /wiki/, but not to GET /wiki/.

--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
                           Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a        Email:  christian@quelltextlich.at
4040 Linz, Austria           Phone:          +43 732 / 26 95 63
                             Fax:            +43 732 / 26 95 63
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------
OpenPGP key transition from 0xEF78CCDE to 0x13C1072F:
http://quelltextlich.at/openpgp-transition-0xEF78CCDE-to-0x13C1072F.txt


_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics