[QA] integration.wikimedia.org: Monitoring for contint slaves
Krinkle
krinklemail at gmail.com
Wed Oct 8 05:32:25 UTC 2014
The metrics the monitoring page displayed weren't ideal compared to those reported by tools such as Ganglia.
I've updated it with better metrics and calculated values.
https://integration.wikimedia.org/monitoring/
I then decided we needed this for other Labs projects as well (such as tool-labs and beta deployment-prep), so I made it into a generic tool:
https://tools.wmflabs.org/nagf/
https://tools.wmflabs.org/nagf/?project=deployment-prep
https://tools.wmflabs.org/nagf/?project=integration
Source code:
https://github.com/wikimedia/nagf
(The old one will probably be removed soon in favour of a redirect.)
https://github.com/wikimedia/integration-docroot/tree/master/org/wikimedia/integration/monitoring
Yay for solo half-day sprints!
— Krinkle
On 7 Oct 2014, at 13:43, Krinkle <krinklemail at gmail.com> wrote:
> Hey all,
>
> As a temporary solution in the absence of Ganglia or an equivalent, I've put up some graphs that let us monitor the Jenkins slaves in Labs for trends in CPU, memory and disk space usage.
>
> https://integration.wikimedia.org/monitoring/
>
> Also, thanks to Yuvi, there are alerts set up via production Icinga:
>
> https://github.com/wikimedia/operations-puppet/blob/df0d3298/modules/contint/manifests/monitoring.pp
>
> https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=labmon1001&nostatusheader
>
> — Krinkle
>