[Wikitech-l] Re: cluster monitoring

14 Nov 2005

Ashar Voultoiz wrote:
> ...
> Mark Bergsma wrote:
>> <snip>
>>> Can we start installing snmpd on all servers to at least get some
>>> basic data? :o)
>>
>> That's exactly the same data ganglia is currently monitoring, so I
>> don't really see the point...
>
> So let's write ganglia scripts :o)
>
> If we want to monitor 15 services every minute, we will have to telnet
> the gmetad every 2 seconds. We could build a caching system, though:
>
> Check gmetad, cache the result for one minute, then have the nagios
> plugins grep the cache instead of telnetting gmetad.
>
> I think I have an idea about how to handle that.
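The caching scheme Ashar describes (fetch from gmetad once, let every nagios check read the cached copy) could look roughly like the sketch below. It is a minimal illustration, not code from the thread. The assumptions are that gmetad serves its XML dump on TCP port 8651 (the usual default; check your gmetad.conf) and that the cache path, the 60-second TTL and the `cached_xml` helper name are all hypothetical choices.

```python
import os
import socket
import tempfile
import time

# Hypothetical settings: gmetad's XML port is assumed to be the default 8651,
# and the cache file location is an arbitrary example.
GMETAD_HOST = "localhost"
GMETAD_PORT = 8651
CACHE_PATH = os.path.join(tempfile.gettempdir(), "gmetad-cache.xml")
CACHE_TTL = 60  # seconds: refresh the snapshot at most once a minute


def fetch_gmetad_xml(host=GMETAD_HOST, port=GMETAD_PORT, timeout=10.0):
    """Read the full XML dump; gmetad closes the connection when it is done."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def cached_xml(path=CACHE_PATH, ttl=CACHE_TTL, fetch=fetch_gmetad_xml, now=None):
    """Return the cached dump, calling fetch() only if the cache is stale."""
    now = time.time() if now is None else now
    try:
        if now - os.path.getmtime(path) < ttl:
            with open(path, "rb") as f:
                return f.read()
    except OSError:
        pass  # no cache file yet: fall through and fetch
    data = fetch()
    # Write to a temporary file and rename it, so a plugin reading the cache
    # never sees a half-written dump.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)
    return data
```

Each plugin would call `cached_xml()` and search the returned bytes, so 15 checks per minute cost one gmetad connection instead of 15. Two plugins that find a stale cache at the same moment may both refetch; for a sketch that is harmless, since the rename keeps the file consistent.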

I wrote a perl script a while back to poll the gmond XML output from one
machine and stop or start a process on another machine based on the value of
a retrieved metric. I didn't use telnet (ick); I read from a socket and then
used an XPath module to find the metric in the XML. It's probably lying
around in my home directory somewhere if you want to look at it.
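Tim's script itself isn't shown here, but the same idea (read gmond's XML over a plain socket, then locate a metric with an XPath expression) can be sketched in Python using only the standard library. Assumptions: gmond listens on its usual default XML port 8649, the metric is numeric, and the `should_run` threshold policy is a hypothetical stand-in for whatever start/stop rule the original script used.

```python
import socket
import xml.etree.ElementTree as ET

# Assumption: gmond's XML port is the usual default, 8649.
GMOND_PORT = 8649


def read_gmond_xml(host, port=GMOND_PORT, timeout=10.0):
    """Connect, read until gmond closes the socket, return the raw XML bytes."""
    buf = bytearray()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while chunk := sock.recv(65536):
            buf += chunk
    return bytes(buf)


def metric_value(xml_bytes, host_name, metric_name):
    """Return METRIC/@VAL for one host as a float, or None if it is absent.

    ElementTree supports a subset of XPath, including [@attr='value']
    predicates, which is enough to pick out a single metric.
    """
    root = ET.fromstring(xml_bytes)
    node = root.find(
        f".//HOST[@NAME='{host_name}']/METRIC[@NAME='{metric_name}']"
    )
    return None if node is None else float(node.get("VAL"))


def should_run(load, max_load=4.0):
    """Hypothetical policy: keep the dependent process running only while
    the watched metric is below a threshold."""
    return load is not None and load < max_load
```

A caller would do something like `metric_value(read_gmond_xml("db1"), "db1", "load_one")` and then start or stop the process on the other machine according to `should_run`; how that remote start/stop happens (ssh, an init script, etc.) is left open here.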

If caching is required, then adding metrics to nagios is obviously not the
same as adding metrics to ganglia. For ganglia, you run gmetric whenever a
metric changes, so you can have a loop that sets 30 metrics in each pass if
you like. You don't give it a plugin for it to invoke at its leisure; you
make your own daemon.
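The push model Tim describes (your own daemon loops and calls gmetric once per metric on each pass) might be sketched as follows. This assumes a Ganglia 3.x `gmetric` on the PATH that accepts the long options `--name`, `--value`, `--type` and `--units`; the metric name, its value and the 30-second interval are hypothetical placeholders, and `collect_metrics` stands in for whatever actually gathers the data.

```python
import subprocess
import time


def gmetric_args(name, value, mtype="float", units=""):
    """Build one gmetric command line (flags assumed from Ganglia 3.x)."""
    args = ["gmetric", "--name", name, "--value", str(value), "--type", mtype]
    if units:
        args += ["--units", units]
    return args


def collect_metrics():
    """Hypothetical collector: return {name: (value, type, units)}.

    A real daemon would measure something here; this placeholder returns
    a single fixed example metric.
    """
    return {"example_queue_length": (0, "uint32", "items")}


def run(collect=collect_metrics, interval=30, once=False, sender=subprocess.run):
    """Push every collected metric to gmetric, then sleep and repeat."""
    while True:
        for name, (value, mtype, units) in collect().items():
            # check=False: one failed gmetric call shouldn't kill the daemon.
            sender(gmetric_args(name, value, mtype, units), check=False)
        if once:
            break
        time.sleep(interval)
```

Passing `sender` as a parameter keeps the loop testable without a Ganglia installation; in production the default `subprocess.run` invokes gmetric directly. Setting 30 metrics per pass is just a longer dictionary from `collect`.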

-- Tim Starling
