Thank you, Erik, for this amazing work. I think I understood most of it :)
The overreporting patch leads to a significant difference starting at the end of 2013, as discussed in the first two pages. Why does the patch have no effect on the curve before the divergence? That is, why is there a divergence only since mid-2013?
Can the numbers be recalculated retroactively after the patch, or is the original data no longer available to do that?
I wonder whether there was systematic overreporting before mid-2013 as well.
Again, thank you very very much for this great work!
On Tue Jan 14 2014 at 7:02:15 AM, Erik Zachte ezachte@wikimedia.org wrote:
The point of yesterday's exercise was to cross-check webstatscollector, so the filters in webstatscollector served as inspiration but not as a guiding principle.
I broke down traffic patterns from the squid logs using ad hoc criteria that seemed most descriptive. Nearly all HTML traffic is sent in response to GET requests. This may be obvious to web experts, so it hardly merits mentioning, but it wasn't obvious to me (for example, I was curious about the ratio of GET to HEAD requests).
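A breakdown like the one described above can be sketched as a tally of HTML responses per HTTP method. This is a minimal illustration, not the script actually used: it assumes the squid log lines have already been parsed into a hypothetical `Request` record holding only the method, URL, and response MIME type, and the sample requests are made up.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Request(NamedTuple):
    """One parsed log line; this field set is a hypothetical simplification."""
    method: str        # e.g. "GET", "HEAD", "POST"
    url: str
    content_type: str  # MIME type of the response, e.g. "text/html"


def html_by_method(requests: Iterable[Request]) -> Counter:
    """Count HTML responses per HTTP method."""
    counts = Counter()
    for r in requests:
        # Accept "text/html" as well as variants like "text/html; charset=UTF-8".
        if r.content_type.split(";")[0].strip() == "text/html":
            counts[r.method] += 1
    return counts


def get_head_ratio(counts: Counter) -> float:
    """Ratio of GET to HEAD among the counted responses; inf if no HEAD at all."""
    head = counts["HEAD"]
    return counts["GET"] / head if head else float("inf")


sample = [
    Request("GET", "http://en.wikipedia.org/wiki/Linz", "text/html; charset=UTF-8"),
    Request("GET", "http://en.wikipedia.org/wiki/Vienna", "text/html"),
    Request("HEAD", "http://en.wikipedia.org/wiki/Linz", "text/html"),
    Request("GET", "http://upload.wikimedia.org/x.png", "image/png"),  # not HTML
]
counts = html_by_method(sample)
print(counts)                  # Counter({'GET': 2, 'HEAD': 1})
print(get_head_ratio(counts))  # 2.0
```

On real logs the field positions would have to be taken from the actual squid log format in use.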
I am also not yet sure that webstatscollector should use the IP address ranges it currently uses to filter out internal traffic. Some are very busy ranges; others show no activity at all (those could be dropped for better webstatscollector performance). And is the configuration complete at all, or could we be missing some ranges? I'll follow up on that off-list.
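One way to see which configured ranges are busy and which are idle is to count hits per range. The sketch below uses Python's standard `ipaddress` module. The ranges are hypothetical placeholders (RFC 5737 documentation blocks), not webstatscollector's actual configuration, and `range_activity` is an illustrative helper name.

```python
import ipaddress
from typing import Dict, Iterable, List, Tuple

# Hypothetical "internal" ranges for illustration only (RFC 5737 documentation
# blocks); these are NOT the ranges configured in webstatscollector.
INTERNAL_RANGES: List[ipaddress.IPv4Network] = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def range_activity(
    client_ips: Iterable[str], ranges=INTERNAL_RANGES
) -> Tuple[Dict[ipaddress.IPv4Network, int], int]:
    """Return (hits per configured range, number of requests outside all ranges)."""
    hits = {net: 0 for net in ranges}  # idle ranges stay visible with 0 hits
    external = 0
    for ip_str in client_ips:
        ip = ipaddress.ip_address(ip_str)
        match = next((net for net in ranges if ip in net), None)
        if match is None:
            external += 1  # would be counted as a regular page view
        else:
            hits[match] += 1  # would be filtered out as internal traffic
    return hits, external


hits, external = range_activity(["192.0.2.10", "192.0.2.11", "8.8.8.8", "203.0.113.5"])
for net, n in hits.items():
    print(net, n)
print("external:", external)
```

A range that stays at 0 over a long sample is a candidate for pruning. Note that this check cannot reveal ranges that are missing from the list.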
Erik
-----Original Message-----
From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Christian Aistleitner
Sent: Tuesday, January 14, 2014 14:06
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] Page view data with Wikipedia app?
Hi,
On Tue, Jan 14, 2014 at 03:05:03AM +0100, Erik Zachte wrote:
> [...] webstatscollector data, which are totally based on GET /wiki/ (as was known).
It should hardly have an impact, but just to avoid misconceptions, since GET was mentioned several times in this thread: I do not know of a GET filter in webstatscollector. Looking for it again, I could not find one. Where does that GET filter live?
Best regards, Christian
P.S.: Just to avoid confusion: as far as I can tell, webstatscollector does limit to /wiki/, but not to GET /wiki/.
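The distinction between limiting to /wiki/ and limiting to GET /wiki/ can be sketched as two predicates that differ only in whether the HTTP method is checked. Both functions are hypothetical illustrations, not webstatscollector's code. The counts differ only for non-GET requests to /wiki/ (such as HEAD), which is consistent with the expectation above that the difference should hardly matter if such requests are rare.

```python
from urllib.parse import urlsplit


def pageview_path_only(method: str, url: str) -> bool:
    """Hypothetical filter on the URL path alone; the HTTP method is ignored."""
    return urlsplit(url).path.startswith("/wiki/")


def pageview_get_only(method: str, url: str) -> bool:
    """Hypothetical stricter filter: only GET requests to /wiki/ count."""
    return method == "GET" and pageview_path_only(method, url)


requests = [
    ("GET", "http://en.wikipedia.org/wiki/Linz"),
    ("HEAD", "http://en.wikipedia.org/wiki/Linz"),  # counted only by the path-only filter
    ("POST", "http://en.wikipedia.org/w/index.php?title=Linz&action=submit"),
]
print(sum(pageview_path_only(m, u) for m, u in requests))  # 2
print(sum(pageview_get_only(m, u) for m, u in requests))   # 1
```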
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics