>Aside from this, I get daily emails about webrequest partition statuses, and I would at least notice the morning after that something is wrong.  
Right, but in Friday's case that could mean having to backfill a bunch of data up to Saturday morning, whereas if we have alarms we can detect the issue right away and kill jobs as needed.

On Mon, Mar 9, 2015 at 8:55 AM, Andrew Otto <aotto@wikimedia.org> wrote:
>Should we have Icinga alarms around these types of issues?  Seems like that would be the way to go.
Aside from this, I get daily emails about webrequest partition statuses, and I would at least notice the morning after that something is wrong.  



On Mar 7, 2015, at 21:20, Nuria Ruiz <nuria@wikimedia.org> wrote:

Thanks much Christian for the writeup.

Should we have Icinga alarms around these types of issues?  Seems like that would be the way to go.

Thanks, 

Nuria

On Sat, Mar 7, 2015 at 4:00 PM, Andrew Otto <aotto@wikimedia.org> wrote:
Thanks Christian!


> On Mar 7, 2015, at 09:14, Christian Aistleitner <christian@quelltextlich.at> wrote:
>
> Hi,
>
> around running jobs on the Analytics cluster, I've sometimes seen
> people say in IRC: “Let's run this heavy job. I'll keep an eye on it”.
>
> But more often than not, this seems to have meant:
> “Let's just run this heavy job and wait. If QChris joins IRC, let's
> hope he doesn't ping us about having overloaded the cluster.”
>
> That's not nice^Wscalable ;-)
>
> So just in case someone is vague on how to “keep an eye on it”, I did
> a short write-up at:
>
> https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load
>
> which explains how to check, at a very high level, how the cluster
> is doing.
> In particular, it lets you detect whether the cluster has stalled,
> and if it has, it tells you what to do.
>
> Have fun,
> Christian
>
> P.S.: The above URL has diagrams! Click the URL!
>
> --
> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
>                           Companies' registry: 360296y in Linz
> Christian Aistleitner
> Kefermarkterstrasze 6a/3     Email:  christian@quelltextlich.at
> 4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
>                             Fax:            +43 7946 / 20 5 81
>                             Homepage: http://quelltextlich.at/
> ---------------------------------------------------------------
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
