Ashar Voultoiz wrote:
Mark Bergsma wrote:
<snip>
Can we start installing snmpd on all servers to at least get some basic data? :o)
That's exactly the same data ganglia is currently monitoring, so I don't really see the point...
So let's write ganglia scripts :o)
If we want to monitor 15 services every minute, we will have to telnet to gmetad every 2 seconds. We could build a caching system though:
Check gmetad, cache the result for one minute, then have the nagios plugins grep the cache instead of telnetting to gmetad.
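As a rough sketch of that caching idea (not an existing tool), something like the following could sit in front of gmetad. It assumes gmetad dumps its XML on connect to the default xml_port 8651 and then closes the connection; the cache path and the names fetch_xml and cached_xml are made up for illustration.

```python
import os
import socket
import time

CACHE_PATH = "/tmp/gmetad-cache.xml"  # hypothetical location
CACHE_TTL = 60                        # seconds, i.e. "cache for one minute"


def fetch_xml(host="localhost", port=8651, timeout=10):
    """Read the whole XML dump from gmetad (assumed to close when done)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def cached_xml(fetch=fetch_xml, path=CACHE_PATH, ttl=CACHE_TTL, now=time.time):
    """Return the cached dump if it is younger than ttl, else refresh it."""
    try:
        if now() - os.path.getmtime(path) < ttl:
            with open(path, "rb") as fh:
                return fh.read()
    except OSError:
        pass  # no cache file yet, or it is unreadable: fall through and refetch
    data = fetch()
    # Write to a temp file and rename, so a plugin never reads a half-written cache.
    tmp = f"{path}.{os.getpid()}.tmp"
    with open(tmp, "wb") as fh:
        fh.write(data)
    os.replace(tmp, path)
    return data
```

Each nagios plugin would then call cached_xml() and search the result, so gmetad is contacted at most about once per minute no matter how many checks run.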
I think I have an idea about how to handle that.
I wrote a Perl script a while back to poll the gmond XML output from one machine and stop or start a process on another machine based on the value of a metric retrieved. I didn't use telnet (ick); I read from a socket and then used an XPath module to find the metric in the XML. It's probably lying around in my home directory somewhere if you want to look at it.
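This is not that script, just a minimal Python sketch of the same approach, assuming gmond serves its XML on the default port 8649 and uses the usual HOST and METRIC elements with NAME and VAL attributes. It walks the tree with ElementTree rather than a full XPath module, and the function names are invented for the example.

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host, port=8649, timeout=10):
    """Connect to gmond and read until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def find_metric(xml_bytes, host_name, metric_name):
    """Return the VAL attribute of one metric on one host, or None if absent."""
    root = ET.fromstring(xml_bytes)
    for host in root.iter("HOST"):
        if host.get("NAME") != host_name:
            continue
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                return metric.get("VAL")
    return None
```

A caller could do something like find_metric(read_gmond_xml("somehost"), "db1", "load_one") and decide from the returned string whether to start or stop the process; the host and metric names here are placeholders.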
If caching is required, then adding metrics to nagios is obviously not the same as adding metrics to ganglia. For ganglia, you run gmetric whenever a metric changes, so you can have a loop that sets 30 metrics in each pass if you like. You don't give it a plugin for it to invoke at its leisure, you make your own daemon.
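To make the "own daemon" idea concrete, here is a hedged Python sketch of such a loop. It assumes the gmetric command-line tool is on the PATH and takes its usual --name, --value, --type and --units options; the collectors mapping and the function names are hypothetical.

```python
import subprocess
import time


def gmetric_command(name, value, metric_type="uint32", units=""):
    """Build the argument list for one gmetric invocation."""
    cmd = ["gmetric", "--name", name, "--value", str(value), "--type", metric_type]
    if units:
        cmd += ["--units", units]
    return cmd


def publish_all(collectors, run=subprocess.run):
    """Collect every metric once and push each value to ganglia via gmetric.

    collectors maps a metric name to (callable, gmetric type, units).
    """
    for name, (collect, metric_type, units) in collectors.items():
        run(gmetric_command(name, collect(), metric_type, units), check=True)


def main_loop(collectors, interval=60):
    """The daemon part: publish everything, sleep, repeat forever."""
    while True:
        publish_all(collectors)
        time.sleep(interval)
```

The script decides when each value is measured and pushed, which is the opposite of nagios calling a plugin on its own schedule.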
-- Tim Starling