Ashar Voultoiz wrote:
Nagios is still on larousse although it is not running at the moment. I
could easily upgrade it to the latest version (2.0b4) and tweak the config
files to add the new servers (something like 60+ new friends).
We will have to choose a server to run nagios on. Larousse seems to be a
good choice as it is mostly idling: it serves pages for
http://noc.wikimedia.org/ and was used for servmon. Larousse could
become THE monitoring device (and we could eventually move ganglia from
zwinger to larousse).
Yes, although larousse is getting old, and so is the install running on
it. We might want to do a reinstall before that.
Reusing gmetad data is probably a better idea, since the data in nagios
and ganglia would then be the same. One of the problems is that we will
have to code a nagios plugin that caches the gmetad data to avoid multiple
queries (we probably don't want to query gmetad for cpu, then for memory,
then for NFS calls, then for each disk's space usage).
I don't know ganglia too well, but this seems like the best option to
investigate. If ganglia is flexible and uncomplicated enough to add new
metrics easily, then this could certainly work.
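To make the caching idea concrete, here is a rough sketch in Python, not a finished plugin. It assumes gmetad dumps its XML on TCP port 8651 (the usual xml_port default) and that hosts and metrics appear as HOST and METRIC elements with NAME and VAL attributes. The cache path, the TTL and all function names are made up for illustration. Every check reads the same cached dump, so gmetad is queried at most once per TTL however many metrics nagios asks about.

```python
import json
import os
import socket
import time
import xml.etree.ElementTree as ET

CACHE_PATH = "/tmp/gmetad_cache.json"  # hypothetical cache location
CACHE_TTL = 60                         # seconds; an assumed refresh interval


def fetch_gmetad_xml(host="localhost", port=8651):
    """Read the whole XML dump gmetad sends when a client connects."""
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_metrics(xml_data):
    """Turn the XML dump into {host name: {metric name: value string}}."""
    root = ET.fromstring(xml_data)
    return {
        host.get("NAME"): {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        for host in root.iter("HOST")
    }


def cached_metrics(fetch=fetch_gmetad_xml, path=CACHE_PATH, ttl=CACHE_TTL):
    """Return metrics from the cache file, refetching only when it is stale."""
    try:
        if time.time() - os.path.getmtime(path) < ttl:
            with open(path) as f:
                return json.load(f)
    except (OSError, ValueError):
        pass  # missing or corrupt cache: fall through and refetch
    metrics = parse_metrics(fetch())
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(metrics, f)
    os.replace(tmp, path)  # atomic rename, so readers never see a partial file
    return metrics


def check(value, warn, crit):
    """Map a value to a nagios exit code: 0 OK, 1 WARNING, 2 CRITICAL."""
    if value >= crit:
        return 2, "CRITICAL"
    if value >= warn:
        return 1, "WARNING"
    return 0, "OK"
```

A real plugin would look up something like cached_metrics()["srv1"]["load_one"], convert it to a float, pass it to check(), print a one-line status and exit with the returned code (3 for UNKNOWN if the host or metric is missing).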
SNMP is a great tool for grabbing device status. Again, it's probably
redundant with gmetad but will let us monitor network equipment such as
the switches, our ISP router and probably the console switch.
Can we use SNMP for devices that support it, and use ganglia for the rest?
In my experience, SNMP is nice and easy for things that the standard
net-snmpd supports, but it gets nasty beyond that, i.e. if you want to
add things yourself...
--
Mark
mark(a)nedworks.org