So in the worst case (no data at all for those hours), our monthly PV totals will be down by 1.6% (12/744).
I marked these periods as invalid in my webstatscollector 2.0 client so that totals will
be extrapolated from the remaining 732 hours.
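For anyone who wants to reproduce the arithmetic, here is a minimal sketch of that extrapolation. It is not the actual webstatscollector 2.0 code: the function name `extrapolate_monthly_total` and the sample count are hypothetical, and it assumes plain linear scaling from the valid hours.

```python
def extrapolate_monthly_total(observed_total, hours_in_month, invalid_hours):
    """Scale a total observed over the valid hours up to the full month.

    Assumption: the invalid hours would have carried the same average
    traffic as the valid ones (plain linear scaling).
    """
    valid_hours = hours_in_month - invalid_hours
    if valid_hours <= 0:
        raise ValueError("no valid hours to extrapolate from")
    return observed_total * hours_in_month / valid_hours


HOURS_IN_AUGUST = 31 * 24  # 744 hours
INVALID_HOURS = 12         # hours marked invalid for August 2015

# Worst case: the invalid hours contributed no data at all.
worst_case_loss = INVALID_HOURS / HOURS_IN_AUGUST
print(f"worst-case undercount: {worst_case_loss:.1%}")

# Hypothetical observed count, for illustration only.
observed = 732_000
print(extrapolate_monthly_total(observed, HOURS_IN_AUGUST, INVALID_HOURS))
```

Linear scaling ignores daily and weekly traffic patterns, so the result is only an approximation of what the missing hours would have added.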
From: analytics-bounces(a)lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org]
On Behalf Of Dan Andreescu
Sent: Thursday, August 27, 2015 4:25
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in
Wikipedia and analytics.
Subject: Re: [Analytics] Webrequest loss on 08-03 and 08-10
Tilman - done, but apologies for the not very useful link formatting on that tool tip.
I'll file a phab bug to improve that. By the way, annotations for the pageview data
can be collaboratively edited:
https://meta.wikimedia.org/wiki/Dashiki:PageviewsAnnotations (unlocked for now, we'll
limit access if we start having problems).
On Wed, Aug 26, 2015 at 6:22 PM, Tilman Bayer <tbayer(a)wikimedia.org> wrote:
Thanks for the update! And BTW kudos also for marking these as
annotations in the dashboard at
https://vital-signs.wmflabs.org/
(maybe link the incident reports from there as well?)
On Wed, Aug 26, 2015 at 1:26 PM, Andrew Otto <aotto(a)wikimedia.org> wrote:
Hi all,
Now that we’ve had a little space to analyze the problem, I wanted to call
out a recent webrequest data loss issue that we experienced on two separate
occasions.
We attempted to upgrade to Kafka 0.8.2.1, and it wasn’t until the second
attempt that we actually found the problem. Kafka 0.8.2.1 ships with a
buggy version of Snappy[1] that causes messages to not be compressed
properly. This caused a ~4x increase in network and disk I/O around the
cluster all at once.
We’ve documented the incidents and the occasions of significant data loss
here:
https://wikitech.wikimedia.org/wiki/Incident_documentation/20150803-Kafka
https://wikitech.wikimedia.org/wiki/Incident_documentation/20150810-Kafka#C…
https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest
This loss will affect the output of pagecount* and pageview datasets, as
well as other webrequest generated statistics. Please consider statistics
that are generated from webrequest data using the following UTC hours
unreliable:
2015-08-03T18:00 - 2015-08-03T23:00
2015-08-10T15:00 - 2015-08-10T21:00
2015-08-11T17:00 - 2015-08-11T18:00
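A minimal sketch of how a consumer of webrequest-derived data might flag these hours. The names `UNRELIABLE_WINDOWS`, `is_unreliable` and `unreliable_hours` are hypothetical, and treating each range as half-open (start included, end excluded) is an assumption rather than something stated here.

```python
from datetime import datetime

# Windows copied from the list above (UTC, naive datetimes).
# Assumption: each range is half-open, [start, end).
UNRELIABLE_WINDOWS = [
    (datetime(2015, 8, 3, 18), datetime(2015, 8, 3, 23)),
    (datetime(2015, 8, 10, 15), datetime(2015, 8, 10, 21)),
    (datetime(2015, 8, 11, 17), datetime(2015, 8, 11, 18)),
]


def is_unreliable(ts):
    """Return True if a naive UTC timestamp falls inside a flagged window."""
    return any(start <= ts < end for start, end in UNRELIABLE_WINDOWS)


def unreliable_hours():
    """Total number of flagged hours across all windows."""
    return sum(
        int((end - start).total_seconds() // 3600)
        for start, end in UNRELIABLE_WINDOWS
    )


print(is_unreliable(datetime(2015, 8, 10, 16, 30)))  # inside the second window
print(unreliable_hours())
```

Under the half-open reading the windows add up to 5 + 6 + 1 = 12 hours; an inclusive reading of the end hour would give a larger total.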
Many apologies for any inconvenience this causes. We’ve learned a lot
during this turmoil, and have a lot of ideas on how to hopefully prevent
this from happening in the future, and also how to reduce loss and
complexity if and when it does. The analytics engineering team will be
doing a post mortem on this soon, in which we will document these ideas.
Thanks,
-Andrew Otto
[1]
https://issues.apache.org/jira/browse/KAFKA-2189
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB