This is really useful, Christian. Thanks for explaining and documenting it.

Leila

On Sat, Mar 7, 2015 at 6:14 AM, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi,

when it comes to running jobs on the Analytics cluster, I've sometimes
seen people say in IRC: “Let's run this heavy job. I'll keep an eye on it”.

But more often than not, this seems to have meant:
“Let's just run this heavy job and wait. If QChris joins IRC, let's
hope he doesn't ping us about having overloaded the cluster.”

That's not nice^Wscalable ;-)

So just in case someone is vague on how to “keep an eye on it”, I did
a short write-up at:

  https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load

which explains how to check, at a very high level, how the cluster is
doing.
In particular, it shows you how to detect whether the cluster has
stalled and, if it has, what to do about it.

Have fun,
Christian

P.S.: The above URL has diagrams! Click the URL!

--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
                           Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3     Email:  christian@quelltextlich.at
4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
                             Fax:            +43 7946 / 20 5 81
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics