Hi,
in the week of 2014-11-24 to 2014-11-30, Andrew and I [1] worked on the
following items around the Analytics Cluster and Analytics-related
Ops:
* Catch-up and meetings around EventLogging issues.
* EventLogging's database writer not properly shutting down
* Wikipedia Zero graph comparability
* Network switch outage in eqiad
(details below)
Have fun,
Christian
* Catch-up and meetings around EventLogging issues.
There were quite a few catch-up discussions and meetings around the
recent EventLogging issues. It seems we're all on the same page now.
* EventLogging's database writer not properly shutting down
When we had to increase EventLogging's database throughput ad hoc, the
hot fix was known to come with not too robust exit synchronization. So
in case of issues with the events, the database writer would not
properly shut down and restart, but could be left hanging. This was
known beforehand, and was accepted in order to bring EventLogging up
again as soon as possible.
The fix for it is not hard, but with the many follow-up meetings, it
did not get deployed before the issue first struck [2]. Now that the
follow-up meetings are done, the fix has been reviewed and deployed,
and it has been working fine so far.
We backfilled the database from plain-file logs for the affected period.
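To illustrate the kind of exit synchronization discussed above, here
is a minimal sketch in Python. It is not EventLogging's actual code;
BatchWriter, run_writer and insert_batch are hypothetical names, and
the sketch assumes a writer thread that batches events from a queue.
The point is only that a failure in the writer gets recorded and makes
the thread exit, so the caller can notice and stop cleanly (and let
whatever supervises the process restart it) instead of hanging.

```python
import queue
import threading

_STOP = object()  # sentinel telling the writer thread to exit


class BatchWriter(threading.Thread):
    """Hypothetical batching writer; insert_batch stands in for the DB call."""

    def __init__(self, insert_batch, batch_size=3):
        super().__init__(daemon=True)
        self.events = queue.Queue()
        self.insert_batch = insert_batch
        self.batch_size = batch_size
        self.error = None

    def run(self):
        batch = []
        try:
            while True:
                event = self.events.get()
                if event is _STOP:
                    break
                batch.append(event)
                if len(batch) >= self.batch_size:
                    self.insert_batch(batch)
                    batch = []
            if batch:
                self.insert_batch(batch)  # flush the rest on clean shutdown
        except Exception as exc:
            # Record the failure and let the thread end, rather than
            # leaving it blocked while the rest of the process waits.
            self.error = exc

    def stop(self, timeout=5.0):
        """Ask the writer to exit; return True if the thread has ended."""
        self.events.put(_STOP)
        self.join(timeout)
        return not self.is_alive()


def run_writer(events, insert_batch, batch_size=3):
    """Feed events to the writer; return True only on a clean run."""
    writer = BatchWriter(insert_batch, batch_size)
    writer.start()
    for event in events:
        if not writer.is_alive():
            break  # writer died; stop feeding it
        writer.events.put(event)
    clean = writer.stop()
    return clean and writer.error is None
```

A caller could exit with a non-zero status when run_writer returns
False, so that the process gets restarted instead of being left
hanging.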
* Wikipedia Zero graph comparability
Wikipedia Zero is moving from the Analytics team's dashboards to
on-wiki graphs on the (private) zerowiki. But the numbers on the new
graphs did not match those on the dashboards. So we helped to identify
which aspects of the different pageview definitions cause the
mismatches in the graphs. It seems that the key differences are now
understood.
* Network switch outage in eqiad
During the weekend, a network switch in eqiad went offline [3] and
took key machines in the analytics infrastructure offline. We started
[4] looking at the affected machines, measuring impact and
backfilling.
This is not done yet and will take more time.
[1] Jeff will refocus on Ops projects outside the realm of
Analytics. Many thanks for your great work on the Analytics cluster
and Analytics-related Ops!
[2]
https://wikitech.wikimedia.org/wiki/Incident_documentation/20141125-EventLo…
[3]
https://wikitech.wikimedia.org/wiki/Incident_documentation/20141130-Eqiad-R…
https://phabricator.wikimedia.org/tag/incident-20141129-network/
[4]
https://lists.wikimedia.org/pipermail/analytics/2014-November/002819.html
https://lists.wikimedia.org/pipermail/analytics/2014-December/002821.html
https://phabricator.wikimedia.org/T76334
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3 Email: christian(a)quelltextlich.at
4293 Gutau, Austria Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage:
http://quelltextlich.at/
---------------------------------------------------------------