[Analytics] Cluster issues. Refining suspended. Hence a few datasets start to lag.

26 Feb 2015


Hi,

Just a quick heads-up that the Analytics cluster got stuck today: jobs
deadlocked, each waiting for other jobs to free resources.
For the time being, to let the cluster catch up on the missed hours, I
have suspended the refining jobs.
This gives the cluster enough resources to catch up on importing the
Kafka data that it missed during the day.
But this also means that the datasets:
  pagecounts-all-sites,
  pagecounts-raw,
  legacy_tsvs
will fall behind a bit, and wmf.webrequest will not receive new data
while the cluster is catching up.
Tomorrow, in the European morning, once the cluster has caught up, I'll
re-enable refining, and the datasets should catch up as well.
Sorry for the inconvenience,
Christian

P.S.: Suspending refining may look a bit drastic. But if we only killed
the resource-hungry jobs without stopping refining, refining would
start while Camus is still catching up and would produce faulty
datasets. Hence, we suspended refining for now. Tomorrow, we'll resume
the suspended jobs and let the datasets catch up.

P.P.S.: If you have resource-hungry jobs to run on the Analytics
cluster, please wait until tomorrow to run them, if possible.
-- 
---- quelltextlich e.U. ---- \ ---- Christian Aistleitner ----
                           Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3     Email:  christian@quelltextlich.at
4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
                             Fax:            +43 7946 / 20 5 81
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------
