This is really useful, Christian. Thanks for explaining and documenting it.
Leila
On Sat, Mar 7, 2015 at 6:14 AM, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi,
When it comes to running jobs on the Analytics cluster, I've sometimes seen people say on IRC: “Let's run this heavy job. I'll keep an eye on it.”
But more often than not, this seems to have meant: “Let's just run this heavy job and wait. If QChris joins IRC, let's hope he doesn't ping us about having overloaded the cluster.”
That's not nice^Wscalable ;-)
So just in case someone is vague on how to “keep an eye on it”, I did a short write-up at:
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load
which describes, at a very high level, how to tell how the cluster is doing. In particular, it lets you detect whether the cluster has stalled and, if it has, tells you what to do.
Have fun,
Christian
P.S.: The above URL has diagrams! Click the URL!
--
---- quelltextlich e.U. ---- \ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3
4293 Gutau, Austria
Email: christian@quelltextlich.at
Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics