Ashar Voultoiz wrote:
Nagios is still on larousse, although it is not running at the moment. I could easily upgrade it to the latest version (2.0b4) and tweak the config files to add the new servers (something like 60+ new friends).
We will have to choose a server to run nagios on. Larousse seems to be a good choice: it is mostly idle, serves pages for http://noc.wikimedia.org/ and was used for servmon. Larousse could become THE monitoring host (and we could eventually move ganglia from zwinger to larousse).
Yes, although larousse is getting old, and so is the install running on it. We might want to reinstall it before that.
Reusing gmetad data is probably a better idea, since the data in nagios and ganglia would then be the same. One problem is that we will have to write a nagios plugin that caches the gmetad data to avoid multiple queries (we probably don't want to query gmetad for CPU, then for memory, then for NFS calls, then for each disk's space usage).
I don't know ganglia too well, but this seems like the best option to investigate. If ganglia is flexible and simple enough that new metrics can be added easily, this could certainly work.
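A rough sketch of the caching plugin idea, in Python. Assumptions (none of this is settled in the thread): gmetad dumps its full XML to anyone who connects to its XML port (8651 by default), every nagios check reads one shared cache file that is refetched only when older than `max_age` seconds, and the cache path, 60 s lifetime and all function names are hypothetical. Exit codes follow the usual nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import os
import socket
import time
import xml.etree.ElementTree as ET

# Standard nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def fetch_gmetad_xml(host="localhost", port=8651, timeout=10.0):
    """Read the whole XML dump that gmetad sends on connect (port 8651 assumed)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmetad closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def get_cached_xml(cache_path, max_age, fetch=fetch_gmetad_xml):
    """Return the cached dump if younger than max_age seconds, else refetch it once."""
    try:
        if time.time() - os.path.getmtime(cache_path) < max_age:
            with open(cache_path, encoding="utf-8") as f:
                return f.read()
    except OSError:
        pass  # no cache file yet (or unreadable): fall through and fetch
    xml_text = fetch()
    tmp_path = f"{cache_path}.{os.getpid()}.tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(xml_text)
    os.replace(tmp_path, cache_path)  # atomic swap: other checks never see half a file
    return xml_text


def parse_metrics(xml_text):
    """Map (host name, metric name) -> VAL string for every METRIC in the dump."""
    root = ET.fromstring(xml_text)
    metrics = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            metrics[(host.get("NAME"), metric.get("NAME"))] = metric.get("VAL")
    return metrics


def check_metric(metrics, host, metric, warn, crit):
    """Compare one numeric metric with thresholds; return (exit code, status line)."""
    raw = metrics.get((host, metric))
    if raw is None:
        return UNKNOWN, f"UNKNOWN - no metric {metric} for {host}"
    try:
        value = float(raw)
    except ValueError:
        return UNKNOWN, f"UNKNOWN - {metric}={raw!r} is not numeric"
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"{LABELS[code]} - {metric}={value:g} on {host}"


def main(argv, cache_path="/tmp/gmetad-cache.xml", max_age=60):
    """argv = [HOST, METRIC, WARN, CRIT]; prints one status line, returns the exit code."""
    if len(argv) != 4:
        print("UNKNOWN - usage: HOST METRIC WARN CRIT")
        return UNKNOWN
    host, metric = argv[0], argv[1]
    try:
        warn, crit = float(argv[2]), float(argv[3])
        xml_text = get_cached_xml(cache_path, max_age)
        code, line = check_metric(parse_metrics(xml_text), host, metric, warn, crit)
    except (ValueError, OSError, ET.ParseError) as exc:
        code, line = UNKNOWN, f"UNKNOWN - {exc}"
    print(line)
    return code
```

A script entry point would just call `sys.exit(main(sys.argv[1:]))`. With, say, twenty checks per host (CPU, memory, NFS, each disk...), only the first check in each 60 s window actually talks to gmetad; the rest read the cached file.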
SNMP is a great tool for grabbing device status. Again, it's probably redundant with gmetad, but it will let us monitor network equipment such as the switches, our ISP's router and probably the console switch.
Can we use SNMP for devices that support it, and use ganglia for the rest?
In my experience, SNMP is nice and easy for things that the standard net-snmpd supports, but it gets nasty beyond that, i.e. if you want to add things yourself...
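One way to picture the split proposed above, as a small Python sketch: devices with a working SNMP agent (switches, router, console switch) get the stock nagios-plugins `check_snmp` against a standard MIB-II object that any plain agent answers (sysUpTime.0, OID 1.3.6.1.2.1.1.3.0), so nothing custom has to be added to the agent; everything else goes through a ganglia-backed check. The `Device` record, the `check_ganglia_metric` plugin name and the default thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# MIB-II sysUpTime.0: answered by any standard agent, no agent extensions needed.
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"


@dataclass
class Device:
    name: str
    address: str
    # Set only for devices with a usable SNMP agent (hypothetical field).
    snmp_community: Optional[str] = None


def build_check_command(device: Device, metric: str = "load_one",
                        warn: float = 4.0, crit: float = 8.0) -> List[str]:
    """Pick SNMP for devices that support it, ganglia data for everything else."""
    if device.snmp_community is not None:
        # Stock nagios-plugins check_snmp: -H host, -C community, -o OID.
        return ["check_snmp", "-H", device.address,
                "-C", device.snmp_community, "-o", SYS_UPTIME_OID]
    # Hypothetical plugin that reads cached gmetad data for this host.
    return ["check_ganglia_metric", device.name, metric, str(warn), str(crit)]
```

Keeping the SNMP side to standard MIB objects matches the experience that net-snmpd is easy until you start extending it yourself.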