Hi,
in the week from 2014-10-13 to 2014-10-19, Andrew, Jeff, and I worked on
the following items around the Analytics Cluster and Analytics-related
Ops:
* Webstatscollector deployment (Bug 66352, Bug 71790)
* Testing potential kafkatee fix
* Analytics1021, its partition leader role, and missing data
* gp.wmflabs.org showing empty graphs
* Database lags
* Obtaining HTTPS numbers to assist around POODLE vulnerability
* Redeployment of some Hive scripts
* Preparations for ua_parser Hive UDF
(details below)
Have fun,
Christian
* Webstatscollector deployment (Bug 66352, Bug 71790)
As reported in previous weeks, new webstatscollector builds have been
prepared to stop counting requests to the “Undefined” page (Bug
66352), and to stop counting redirects twice (Bug 71790). Those new
builds have now been deployed to both webstatscollector pipelines.
* Testing potential kafkatee fix
From time to time, kafkatee did not consume from all relevant Kafka
partitions. The kafkatee maintainer provided a potential fix that has
been running on analytics1003 since then. The kafkatee-generated files
look good so far, but since the issue previously took some time to
manifest, the tests need to run a bit longer.
* Analytics1021, its partition leader role, and missing data
Analytics1021 again dropped out of its partition leader role.
This is the first time it has happened since ACK parameters were tuned
on some machines. The tuning proved to be worth it, as the caches with
tuned ACK parameters did not see message loss.
Since the issue happened again later, and once more the machines with
tuned ACK parameters were exactly the ones that did not see message
loss, we can prepare to roll out the tuned ACK parameters more widely.
* gp.wmflabs.org showing empty graphs
In 2013, some graphs of gp.wmflabs.org were taken offline due to
privacy concerns. However, the main dashboard still referenced some of
those graphs, and rendered them as empty graphs. This made the
dashboard /look/ broken, although the public graphs were rendered as
expected. We updated the dashboard to no longer reference offline
graphs, so the dashboard does not look broken any longer.
* Database lags
Due to different, unrelated causes, some databases lagged considerably
during this week. Ops got the databases back to normal again.
* Obtaining HTTPS numbers to assist around POODLE vulnerability
In order to decide how to address the POODLE vulnerability, Ops
needed numbers on HTTPS usage by old browsers. Since this data is
not prepared automatically, we extracted the numbers from the logs.
* Redeployment of some Hive scripts
It seems an unannounced Friday deployment during the SF hackathon
angered the deployment gods, and caused some Oozie/Hive jobs to not
run correctly. So we had to fix the setup, resubmit the jobs, and
backfill the missing data. No data got lost.
* Preparations for ua_parser Hive UDF
There is a push from several sides to have a Hive UDF that can parse
User-Agents. A good part of our time was spent implementing and
reviewing this UDF, but it is not yet merged and will require a bit
more work.
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3 Email: christian(a)quelltextlich.at
4293 Gutau, Austria Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
---------------------------------------------------------------