Aside from this, I get daily emails about webrequest partition statuses,
and I would at least notice the morning after that something is wrong.
Right, but in the case of Friday that would mean perhaps having to backfill
a bunch of data up to Saturday morning, whereas if we have alarms we can
detect the issue right away and kill jobs as needed.
On Mon, Mar 9, 2015 at 8:55 AM, Andrew Otto <aotto(a)wikimedia.org> wrote:
Should we have icinga alarms around these types of issues? Seems like that
would be the way to go.
Aside from this, I get daily emails about webrequest partition statuses,
and I would at least notice the morning after that something is wrong.
On Mar 7, 2015, at 21:20, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
Thanks much Christian for the writeup.
Should we have icinga alarms around these types of issues? Seems like that
would be the way to go.
Thanks,
Nuria
On Sat, Mar 7, 2015 at 4:00 PM, Andrew Otto <aotto(a)wikimedia.org> wrote:
Thanks Christian!
On Mar 7, 2015, at 09:14, Christian Aistleitner <christian(a)quelltextlich.at> wrote:
Hi,
around running jobs on the Analytics cluster, I've sometimes seen
people say in IRC: “Let's run this heavy job. I'll keep an eye on it”.
But more often than not, this seems to have meant:
“Let's just run this heavy job and wait. If QChris joins IRC, let's
hope he doesn't ping us about having overloaded the cluster.”
That's not nice^Wscalable ;-)
So just in case someone is vague on how to “keep an eye on it”, I did
a short write-up at:
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load
which explains how to tell, at a very high level, how the cluster is
doing.
In particular, it allows you to detect whether the cluster has stalled,
and if it has, it tells you what to do.
Have fun,
Christian
P.S.: The above URL has diagrams! Click the URL!
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3 Email: christian(a)quelltextlich.at
4293 Gutau, Austria Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics