Hi,
in the week from 2014-10-20 to 2014-10-26, Andrew, Jeff, and I worked on the following items around the Analytics Cluster and Analytics-related Ops:
* Research on columnar storage in the cluster
* Research on how to count accesses to media files
* Rolling out ACK tuning for varnishkafka
* More work towards getting application id into logstash
(details below)
Have fun,
Christian
* Research on columnar storage in the cluster
Columnar storage engines can speed up some of the queries we're running and plan to run. So we did some more research on Parquet and Avro, and on how xmldumps imports could benefit from them.
* Research on how to count accesses to media files
We had many requests to make access counts for media files public. Since the basic infrastructural ingredients are within reach, we started to explore what would be doable towards making such data public.
* Rolling out ACK tuning for varnishkafka
As reported for the previous week, the ACK tuning of varnishkafka was shown to avoid message loss during leader elections. So we're incrementally deploying the new ACK parameter to the caches, and 3 out of 4 clusters are already using it. The deployment for the fourth cluster is still pending.
* More work towards getting application id into logstash
Repackaging jars to inject the log4j configurations allowed us to get more logs into logstash. We're also starting to extract application ids from log messages, which will finally allow one to go to logstash and filter the logs for the applications (like Hive queries) one is running on the cluster.
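To illustrate the idea of pulling application ids out of log messages, here is a minimal Python sketch. It is not the actual logstash setup. It assumes the ids follow the usual YARN form application_<cluster timestamp>_<sequence number>. The function names and the sample log lines are made up for illustration.

```python
import re
from typing import List, Optional

# Assumption: ids look like YARN application ids,
# e.g. application_1413800000000_0042.
APPLICATION_ID = re.compile(r"\bapplication_\d+_\d+\b")


def extract_application_id(message: str) -> Optional[str]:
    """Return the first application id found in a log message, or None."""
    match = APPLICATION_ID.search(message)
    return match.group(0) if match else None


def filter_by_application(messages: List[str], application_id: str) -> List[str]:
    """Keep only the log messages that mention the given application id."""
    return [m for m in messages if extract_application_id(m) == application_id]


# Hypothetical sample log lines, for illustration only.
sample_logs = [
    "INFO Submitted application application_1413800000000_0042",
    "WARN no application id in this line",
    "INFO application_1413800000000_0007 finished",
]

if __name__ == "__main__":
    for line in filter_by_application(sample_logs, "application_1413800000000_0007"):
        print(line)
```

Once the id is available as its own field, filtering the logs of a single Hive query comes down to matching on that one field.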