Yes, this has been an issue before. The Squid-log-based reports have filtered these
banners for years, but only after a similar distortion became very apparent and a lot of
data needed to be repaired.
-----Original Message-----
From: analytics-bounces(a)lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org]
On Behalf Of Oliver Keyes
Sent: Friday, February 06, 2015 22:04
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in
Wikipedia and analytics.
Subject: [Analytics] Drop in Commons mobile traffic - a diagnosis
Hey all,
The pageviews stored at stats.wikimedia.org and the Vital Signs dashboards showed a
substantial drop in pageviews to Wikimedia Commons, primarily from mobile, beginning on 1
January 2015. I was tasked with investigating and I'm reporting what I found so that
we have a note of the problems this brings up.
From an investigation of requests to that site at that
time, it appears that this is a perfect storm of known deficiencies in the legacy
pageviews definition, fundraising changes, and mobile changes.
To summarise:
1. The legacy Pageviews definition contains Special pages, including Special:BannerRandom
and Special:HideBanner;
2. The mobile website was historically loading things from Commons in such a way as to
trigger calls to Special:HideBanner, which were picked up by the legacy definition as
"pageviews to commons";
3. The Mobile team deployed changes to their image loading setup at the end of December
that stopped this from happening, and that coincided with the disabling of the Fundraising
primary campaign;
4. The result of this was an apparent massive drop in traffic to Commons from the mobile
site - when the actual inaccuracy was the inclusion of that traffic in the first place.
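
To make the effect described in the summary above concrete, here is a minimal Python sketch that counts requests with and without the two banner-related special pages. It is an illustration only, not the legacy or the new pageview definition: the function names, the two URL shapes handled (/wiki/Title and ?title=Title), and the sample requests are all assumptions made for the example.

```python
from urllib.parse import urlsplit, parse_qs, unquote

# The two special pages named in the summary; a real exclusion list
# would likely be longer (assumption).
BANNER_PAGES = {"special:bannerrandom", "special:hidebanner"}


def page_title(url):
    """Extract the page title from /wiki/Title or ?title=Title style URLs."""
    parts = urlsplit(url)
    if parts.path.startswith("/wiki/"):
        return unquote(parts.path[len("/wiki/"):])
    titles = parse_qs(parts.query).get("title", [])
    return titles[0] if titles else ""


def is_banner_request(url):
    """True if the request targets one of the banner special pages."""
    return page_title(url).lower() in BANNER_PAGES


def count_pageviews(urls, exclude_banners=True):
    """Count requests, optionally skipping banner special pages."""
    return sum(
        1 for url in urls
        if not (exclude_banners and is_banner_request(url))
    )


# Made-up sample requests, for illustration only.
sample_requests = [
    "https://commons.wikimedia.org/wiki/Main_Page",
    "https://commons.wikimedia.org/w/index.php?title=Special:HideBanner",
    "https://commons.wikimedia.org/wiki/Special:BannerRandom",
    "https://commons.wikimedia.org/wiki/File:Example.jpg",
]

print(count_pageviews(sample_requests, exclude_banners=False))  # banner hits counted
print(count_pageviews(sample_requests))                         # banner hits excluded
```

With the banner pages counted, the sample yields 4 "pageviews"; with them excluded it yields 2. A gap of that kind is what shows up as a drop once the client stops triggering Special:HideBanner.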
There are several lessons to be learned from this. First, it is worth reiterating the
deficiencies and inaccuracies inherent in the legacy pageview definition, many (but
certainly not all) of which centre on how it treats the fundraising banners. We are
working as rapidly as we can to completely deprecate this definition, replacing it with a
new one which is not subject to this kind of variation. We are currently in the middle of
performing final QA testing on the new definition: once it is satisfactory, we will
deploy it as soon as humanly possible and deprecate the legacy definition.
Second, let me emphasise how critical it is that the teams building MediaWiki and our
instances of it - Platform, Operations, Mobile, you name it - keep us in the loop about
changes that they make. This was a very dramatic shift in client logic around requests: it
flew under our radar. We should have a process in place for letting Analytics know about
these changes before they happen so that we do not end up with inaccurate data and a
constant game of catch-up.
Thanks,
--
Oliver Keyes
Research Analyst
Wikimedia Foundation
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics