[QA] integration.wikimedia.org: Monitoring for contint slaves

S Page spage at wikimedia.org
Wed Oct 8 20:35:19 UTC 2014


On Tue, Oct 7, 2014 at 4:43 AM, Krinkle <krinklemail at gmail.com> wrote:

> Also, thanks to Yuvi, there's alerts set up via production Icinga:
> ...
>
>
> https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=labmon1001&nostatusheader
>

This appears to have beta labs monitors such as "BetaLabs: Low disk space",
"Puppet failure events", etc.  Is there an Icinga check "Expect visiting
http://en.wikipedia.beta.wmflabs.org/ to give me an error-free wiki page,
not a 503 error or a site down"?
If there is, is there a way to graph the status of this over time?
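For illustration, a check of the kind asked about above might look roughly like the sketch below. It is hypothetical and is not an existing Icinga check. It follows the usual Nagios/Icinga plugin convention of exit code 0 for OK, 1 for WARNING and 2 for CRITICAL. It treats any 5xx response or network failure as CRITICAL. It also assumes that a healthy MediaWiki page contains the `mw-content-text` marker. The names `classify`, `check_url` and `HEALTHY_MARKER` are made up for this sketch. The stock check_http plugin may already cover much of this.

```python
import urllib.error
import urllib.request
from typing import Tuple

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumption: text expected somewhere on a correctly rendered wiki page.
HEALTHY_MARKER = "mw-content-text"


def classify(status: int, body: str) -> Tuple[int, str]:
    """Map an HTTP status and page body to an Icinga-style (exit code, message)."""
    if status >= 500:
        return CRITICAL, f"CRITICAL - HTTP {status} (site down or overloaded)"
    if status != 200:
        return WARNING, f"WARNING - unexpected HTTP {status}"
    if HEALTHY_MARKER not in body:
        return WARNING, "WARNING - HTTP 200 but page content looks wrong"
    return OK, "OK - wiki page rendered"


def check_url(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Fetch url and classify the result; connection failures count as CRITICAL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return classify(resp.status, body)
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; classify the status code anyway.
        return classify(err.code, "")
    except OSError as err:
        # URLError and socket timeouts are both OSError subclasses.
        return CRITICAL, f"CRITICAL - request failed: {err}"
```

As a plugin, a wrapper would print the message and exit with the code, e.g. `code, msg = check_url("http://en.wikipedia.beta.wmflabs.org/")`, then `print(msg)` and `sys.exit(code)`.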

The pain point I would like eased is: When I review failing browser tests I
can quickly establish "Duh, beta labs was 503 or overloaded at the time of
the test", before I spend time investigating the particular browser failure.

There's more to this than monitoring:
* Can't easily see status of other tests that ran at the same time.
* Jenkins /ci/job/browsertests-Foo/* pages don't consistently show the
ISO 8601 UTC time of a test
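If test times were logged consistently in ISO 8601 UTC, ruling out beta labs could be partly mechanical. Here is a minimal sketch. It assumes outage windows are available as (start, end) UTC pairs from some monitoring history. The function names `parse_utc` and `overlapping_outage`, the `Window` type and the five-minute slack are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

# Hypothetical representation: one beta labs outage as a (start, end) pair in UTC.
Window = Tuple[datetime, datetime]


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2014-10-08T20:35:19Z' into aware UTC."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
    dt = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        # Assumption: timestamps without an offset are already UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def overlapping_outage(test_time: datetime, outages: List[Window],
                       slack: timedelta = timedelta(minutes=5)) -> Optional[Window]:
    """Return the first outage window within `slack` of test_time, else None."""
    for start, end in outages:
        if start - slack <= test_time <= end + slack:
            return (start, end)
    return None
```

Suppose a failing browser test's timestamp falls inside a returned window. That is probably a "Duh, beta labs was down" case and not the first thing to investigate.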

Whining is cheap, implementing is harder :-)  I really appreciate
jenkins/ci and beta labs. <3 and thanks,

--
=S Page  Features engineer