On 19/06/13 21:55, Ori Livneh wrote:
I added a link to http://tinyurl.com/n3twd8k to the channel topic of #wikimedia-tech and #wikimedia-operations. It points to a live graph of MediaWiki's error rate over the last 24 hours. I hope to automate monitoring of this data sometime soon, but in the meantime let's keep an eye on it collectively, especially right after deployments.
If you can get ops to add a Nagios plugin that checks a Ganglia metric, we could get IRC spam whenever something is wrong :)
A fairly recent plugin:
https://github.com/MrMichaelWill/nagios-ganglia-plugin
There is even a Python plugin in Ganglia/contrib (5 years old, though):
https://github.com/ganglia/monitor-core/blob/master/contrib/check_ganglia.py
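For illustration, here is a rough sketch of what such a check could look like. It is not the code of either plugin linked above. It assumes gmond serves its XML dump on the default TCP port 8649 and that the metric appears as a METRIC element with NAME and VAL attributes under a HOST element. The host name, the metric name "error_rate" and the thresholds are made up for the example.

```python
import argparse
import socket
import xml.etree.ElementTree as ET

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def fetch_ganglia_xml(host="localhost", port=8649, timeout=10.0):
    """Read the XML dump gmond sends on connect (port 8649 is assumed)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    # Keep bytes so the XML parser honours the declared encoding.
    return b"".join(chunks)


def find_metric(xml_data, host_name, metric_name):
    """Return the metric value as a float, or None if it is not present."""
    root = ET.fromstring(xml_data)
    for host in root.iter("HOST"):
        if host.get("NAME") != host_name:
            continue
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                try:
                    return float(metric.get("VAL"))
                except (TypeError, ValueError):
                    return None
    return None


def check(value, warn, crit):
    """Map a value to a (Nagios exit code, status line) pair."""
    if value is None:
        return UNKNOWN, "UNKNOWN - metric not found"
    if value >= crit:
        return CRITICAL, "CRITICAL - value is %s" % value
    if value >= warn:
        return WARNING, "WARNING - value is %s" % value
    return OK, "OK - value is %s" % value


def main(argv):
    """Hypothetical command line; option names are illustrative only."""
    parser = argparse.ArgumentParser(description="Check one Ganglia metric")
    parser.add_argument("--gmond", default="localhost")
    parser.add_argument("--host", required=True)
    parser.add_argument("--metric", required=True)
    parser.add_argument("--warn", type=float, required=True)
    parser.add_argument("--crit", type=float, required=True)
    args = parser.parse_args(argv)
    try:
        xml_data = fetch_ganglia_xml(args.gmond)
        value = find_metric(xml_data, args.host, args.metric)
    except (OSError, ET.ParseError) as exc:
        print("UNKNOWN - %s" % exc)
        return UNKNOWN
    code, message = check(value, args.warn, args.crit)
    print(message)
    return code
```

To run it as an actual plugin, the script would end with `sys.exit(main(sys.argv[1:]))`, since Nagios reads the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first line of output. The IRC notification itself would still have to be wired up on the Nagios side.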