[QA] integration.wikimedia.org: Monitoring for contint slaves

Krinkle krinklemail at gmail.com
Wed Oct 8 05:32:25 UTC 2014


The metrics the monitoring page displayed weren't ideal compared to those reported by e.g. Ganglia.
I've updated it with better metrics and calculated values.

https://integration.wikimedia.org/monitoring/

... and then I decided we needed this for other labs projects (like tool-labs, beta deployment-prep etc.), so I made it into a generic tool:

https://tools.wmflabs.org/nagf/
https://tools.wmflabs.org/nagf/?project=deployment-prep
https://tools.wmflabs.org/nagf/?project=integration

Source code:
https://github.com/wikimedia/nagf

(and the old one, which will probably be removed soon in favour of a redirect)
https://github.com/wikimedia/integration-docroot/tree/master/org/wikimedia/integration/monitoring

Yay for solo half-day sprints!

— Krinkle

On 7 Oct 2014, at 13:43, Krinkle <krinklemail at gmail.com> wrote:

> Hey all,
> 
> As a temporary solution lacking Ganglia or equivalent, I've put up some graphs that allow us to monitor the Jenkins slaves in labs for trends in CPU, memory and disk space.
> 
> https://integration.wikimedia.org/monitoring/
> 
> Also, thanks to Yuvi, there are alerts set up via production Icinga:
> 
> https://github.com/wikimedia/operations-puppet/blob/df0d3298/modules/contint/manifests/monitoring.pp
> 
> https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=labmon1001&nostatusheader
> 
> — Krinkle
> 
