Hi,
In the week of 2014-12-01 to 2014-12-07, Andrew and I worked on the following items around the Analytics Cluster and Analytics-related Ops:
* Change in SSL setup causing pagecounts-raw to be off ... temporary
* Preparing for vlan move of stats machines
* Ganglia -> Graphite -> Grafana
* Wikipedia Zero graph comparability
(details below)
Have fun,
Christian
* Change in SSL setup causing pagecounts-raw to be off ... temporary
Ops changed the SSL setup from dedicated SSL terminators to cache-local SSL terminators for eqiad and esams. This change came as a bit of a surprise to us, and (as expected) made webstatscollector's C implementation (pagecounts-raw) overcount HTTPS traffic.
We adjusted webstatscollector's C implementation accordingly.
While some weeks back that would have been the end of the story and we would have been left with a few days of broken data, we now have the data in the cluster and a Hive implementation too. So we could effectively backfill pagecounts-raw for the affected days.
To my knowledge, this is the first time we could mitigate an issue with webstatscollector on the udp2log pipeline through the cluster.
And pagecounts-raw has good data again for the affected period :-)
* Preparing for vlan move of stats machines
To develop infrastructure and research pipelines, devs and researchers need some more basic development tools (e.g. Maven, Virtualenv) on stat100[123] that Ops would prefer us not to use in the machines' current vlan. Hence, preparations started to move stat100[123] into the separate analytics vlan. This will address the concerns of Ops while still allowing the needed tools to be installed.
* Ganglia -> Graphite -> Grafana
Ops is increasingly moving from ganglia to graphite for checks on numbers. So work has started on looking into graphite a bit more and on how to instrument it to perform checks. The cluster got reconfigured so that the key metrics get fed into graphite. For dashboarding, it seems grafana might give a kibana-like interface, and
http://grafana.wikimedia.org/#/dashboard/db/kafka
got set up to provide a high-level, realtime view on kafka.
* Wikipedia Zero graph comparability
Following up from the previous week, the Wikipedia Zero team had further concerns about the differences between their new on-wiki graphs and the Analytics team's dashboards. We identified and explained the differences for them.