Aside from this, I get daily emails about webrequest partition statuses, and I would at least notice the morning after that something is wrong.
Right, but in the case of Friday that would mean perhaps having to backfill a bunch of data up to Saturday morning, whereas if we have alarms we can detect the issue right away and kill jobs as needed.
On Mon, Mar 9, 2015 at 8:55 AM, Andrew Otto aotto@wikimedia.org wrote:
Should we have Icinga alarms around these types of issues? Seems like that would be the way to go.
Aside from this, I get daily emails about webrequest partition statuses, and I would at least notice the morning after that something is wrong.
On Mar 7, 2015, at 21:20, Nuria Ruiz nuria@wikimedia.org wrote:
Thanks much Christian for the writeup.
Should we have Icinga alarms around these types of issues? Seems like that would be the way to go.
Thanks,
Nuria
On Sat, Mar 7, 2015 at 4:00 PM, Andrew Otto aotto@wikimedia.org wrote:
Thanks Christian!
On Mar 7, 2015, at 09:14, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi,
around running jobs on the Analytics cluster, I've sometimes seen people say in IRC: “Let's run this heavy job. I'll keep an eye on it”.
But more often than not, this seems to have meant: “Let's just run this heavy job and wait. If QChris joins IRC, let's hope he doesn't ping us about having overloaded the cluster.”
That's not nice^Wscalable ;-)
So just in case someone is vague on how to “keep an eye on it”, I did a short write-up at:
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load
which describes how to detect how the cluster is doing at a very high level. In particular, it lets you detect whether the cluster has stalled, and if it has, it tells you what to do.
Have fun, Christian
P.S.: The above URL has diagrams! Click the URL!
--
---- quelltextlich e.U. ---- \ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3
4293 Gutau, Austria
Email: christian@quelltextlich.at
Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics