> Should we have icinga alarms around these types of
> issues? Seems like that would be the way to go.
Aside from this, I get daily
emails about webrequest partition statuses, and I would at least notice the morning after
that something is wrong.
> On Mar 7, 2015, at 21:20, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
>
> Thanks much Christian for the writeup.
>
> Should we have icinga alarms around these types of
> issues? Seems like that would be the way to go.
>
> Thanks,
>
> Nuria
>
> On Sat, Mar 7, 2015 at 4:00 PM, Andrew Otto <aotto(a)wikimedia.org> wrote:
> Thanks Christian!
>
>
> > On Mar 7, 2015, at 09:14, Christian Aistleitner <christian(a)quelltextlich.at> wrote:
> >
> > Hi,
> >
> > regarding running jobs on the Analytics cluster, I've sometimes seen
> > people say in IRC: “Let's run this heavy job. I'll keep an eye on it”.
> >
> > But more often than not, this seems to have meant:
> > “Let's just run this heavy job and wait. If QChris joins IRC, let's
> > hope he doesn't ping us about having overloaded the cluster.”
> >
> > That's not nice^Wscalable ;-)
> >
> > So just in case someone is vague on how to “keep an eye on it”, I did
> > a short write-up at:
> >
> > https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load
> >
> > which details how to detect, at a very high level, how the cluster
> > is doing.
> > Especially, it allows you to detect if the cluster got stalled, and if
> > it did, it tells you what to do.
> >
> > Have fun,
> > Christian
> >
> > P.S.: The above URL has diagrams! Click the URL!
> >
> > --
> > ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
> > Companies' registry: 360296y in Linz
> > Christian Aistleitner
> > Kefermarkterstrasze 6a/3     Email:    christian(a)quelltextlich.at
> > 4293 Gutau, Austria          Phone:    +43 7946 / 20 5 81
> >                              Fax:      +43 7946 / 20 5 81
> >                              Homepage: http://quelltextlich.at/
> > ---------------------------------------------------------------
> > _______________________________________________
> > Analytics mailing list
> > Analytics(a)lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/analytics