Hi,
in the week from 2014-10-06–2014-10-12 Andrew, Jeff, and I worked on
the following items around the Analytics Cluster and Analytics related
Ops:
* ULSFO outage affecting webrequest logs (Bug 71876, Bug 71879)
* Revoked default Push grant for Analytics on gerrit's analytics/* projects
* Wikimetrics showing many requests to internal files
* Counting pageviews for the pages “undefined” / “Undefined” (Bug 66532)
* Counting redirect pageviews for Webstatscollector (Bug 71790)
* Reworking webstatscollector's build system
* Puppetization of MaxMind's Connection Type databases
* Wikihadoop now available on the Analytics Cluster
* Analytics Mini-Hackathon in San Francisco
(details below)
Have fun,
Christian
* ULSFO outage affecting webrequest logs (Bug 71876, Bug 71879)
It seems there have been connection issues from ULSFO, which caused a
minor hiccup in the webrequest logs on both udp2log and kafka [1]. Due
to kafka's buffering, kafka could nicely bridge the shorter dropouts,
and in total only a few minutes of data have been lost on kafka, while
udp2log was shaky for up to 2 hours.
* Revoked default Push grant for Analytics on gerrit's analytics/* projects
By default, all Analytics members had Push permission on all of
gerrit's analytics/* projects. As accidental pushes caused pain again,
we have now revoked the default Push grant, and made sure that our bots
still have the permissions they need to do their duty.
* Wikimetrics showing many requests to internal files
A fix for the mis-redirection of those monitoring requests has been
implemented (but it's not yet deployed).
* Counting pageviews for the pages “undefined” / “Undefined” (Bug 66532)
A short increase in requests for the pages “undefined” and “Undefined”
impacted pageview trend graphs. So after the initial push-back that
bug 66532 received, it was picked up again, and we prepared patches
for both the C and Hive implementations of webstatscollector's pageview
definition so that they no longer count such requests. Deployment of
those patches is likely to happen around 2014-10-15.
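As a rough illustration of the kind of filter involved, here is a minimal
Python sketch. It is not the actual C or Hive patch; the names
EXCLUDED_TITLES, is_countable_title, and count_pageviews are hypothetical
and only serve this example, which assumes the filter is an exact match
on the two titles named in the bug.

```python
# Hypothetical sketch, not the real webstatscollector code.
# Assumption: the filter is an exact match on the two titles from Bug 66532.
EXCLUDED_TITLES = {"undefined", "Undefined"}


def is_countable_title(title):
    """Return False for the bogus page titles, True for everything else."""
    return title not in EXCLUDED_TITLES


def count_pageviews(titles):
    """Count requests per page title, skipping the excluded titles."""
    counts = {}
    for title in titles:
        if is_countable_title(title):
            counts[title] = counts.get(title, 0) + 1
    return counts
```

With such a filter, requests for “undefined” and “Undefined” simply never
reach the per-page counters, so they can no longer distort trend graphs.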
* Counting redirect pageviews for Webstatscollector (Bug 71790)
From the start, the webstatscollector pageview definition has been
counting redirects, and hence has been overcounting.
Since we're about to deploy webstatscollector anyway, we prepared
changes to fix this longstanding miscounting.
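To make the idea concrete, here is a minimal Python sketch of skipping
redirects while counting. It is not the prepared change itself. It
assumes that “redirects” means responses with an HTTP redirect status,
and the set of status codes as well as the names REDIRECT_STATUSES,
is_redirect, and count_non_redirect_pageviews are hypothetical.

```python
# Hypothetical sketch, not the real webstatscollector code.
# Assumption: a "redirect" is a response with one of these HTTP statuses.
REDIRECT_STATUSES = {301, 302, 303}


def is_redirect(http_status):
    """Return True if the response status marks an HTTP redirect."""
    return http_status in REDIRECT_STATUSES


def count_non_redirect_pageviews(requests):
    """Count requests per title, ignoring redirect responses.

    `requests` is an iterable of (title, http_status) pairs.
    """
    counts = {}
    for title, http_status in requests:
        if not is_redirect(http_status):
            counts[title] = counts.get(title, 0) + 1
    return counts
```

A request that is answered with a redirect is typically followed by a
second request for the target page, so counting both would count the same
view twice.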
* Reworking webstatscollector's build system
Fresh compilations of webstatscollector's C implementation gave
executables that segfaulted. So we fixed some NULL dereferences, fixed
the build system, made it capable of compiling with optimization
turned on, and built a rudimentary test suite for the collector
process. As a result, we can again build the collector executable, and
can automatically verify that it's working.
* Puppetization of MaxMind's Connection Type databases
MaxMind's Connection Type (NetSpeed) databases have been
puppetized. They are available, for example on stat1002 and stat1003,
at
  /usr/share/GeoIP/GeoIPNetSpeedCell.dat
  /usr/share/GeoIP/GeoIPNetSpeed.dat
* Wikihadoop now available on the Analytics Cluster
This allows for easier parsing of MediaWiki XML revision dumps.
* Analytics Mini-Hackathon in San Francisco
During this week, the Analytics Mini-Hackathon took place, and
more prototyping around
** Sqoop and Oozification
** Streaming data into HDFS
happened, and some time was spent on hunting down the kafkatee issues.
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3 Email: christian(a)quelltextlich.at
4293 Gutau, Austria Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage:
http://quelltextlich.at/
---------------------------------------------------------------