[Labs-l] Custom nagios checks

Ryan Lane rlane32 at gmail.com
Tue Feb 4 08:30:08 UTC 2014

On Tue, Feb 4, 2014 at 12:14 AM, Petr Bena <benapetr at gmail.com> wrote:

> I think that Ryan said something like he would most happily get rid of
> puppet or replace it with a better solution :P but if you really want
> to keep stuff managed by puppet, I still see an issue with other
> projects which aren't using puppet, or which use a different
> puppetmaster.

I didn't say that. I said if you're starting from scratch you should
consider something other than puppet. That wasn't about Labs or Wikimedia
at all.

> To be honest, from my point of view, puppet as it is now on labs is
> almost unusable for non-ops users. Getting any simple change merged,
> unless it's a top-priority thing, requires someone from ops and usually
> takes at least a few hours, if not days. I can't imagine any sysadmin who
> can work like this: some changes need to be applied immediately, and you
> can't wait days for them to happen. So I expect that the vast
> majority of projects that exist now will not use puppet anyway (you
> just can't force people to use it under these circumstances), so they
> wouldn't benefit from this.

You shouldn't be making changes to systems without code review. Wikimedia
Ops generally has a bad practice in this regard (self-merging). It's mostly
historical. Other places I've worked at or consulted with *require* code
review to merge.

Just so you know, I work like this (and I'm reasonably productive, from
most people's perspective).

> That is why I think that even if we are to use this puppet nrpe
> management, there still should be a way to make manual adjustments, not
> just because of these projects but also to fix other icinga issues.
> For example, right now it receives some nonsense (broken) data from ldap
> about instances that don't even exist anymore. If there weren't that
> nasty workaround consisting of an instance ignore list, which prevents
> these hosts from being monitored, icinga would be full of hosts that
> are down. How would you apply an i_dont_exist puppet class to a
> nonexistent node? :P

Did you file a bug about the broken data?

> I have nothing against "labs cloning production" besides that IMHO it
> should be the other way around (production should actually clone labs,
> which is the testing env where changes should happen first before they
> get deployed to production). Still, labs != production, so I think we
> could have some extra thing here that would make it easier to manage
> icinga for regular, non-ops people, which would exist on labs only and
> not on production.

The biggest reason we can't do the same thing in labs and production for
nagios is that in production the nagios configuration is generated via
exported resources, which are disabled in labs.

As far as I know, that and ssh host keys are the only things in Wikimedia's
puppet that require exported resources.
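
For anyone unfamiliar with exported resources, here is a rough sketch of the
pattern. It is not Wikimedia's actual manifests: the class names and the
'generic-host' template are made up for illustration. The nagios_host type,
the @@ export prefix and the <<| |>> collector are standard Puppet, and they
only work when storeconfigs (e.g. PuppetDB) is enabled on the puppetmaster.

```puppet
# Hypothetical class applied to every monitored node: it exports a
# nagios_host resource describing the node instead of applying it locally.
class monitoring::client {
  @@nagios_host { $::fqdn:
    ensure  => present,
    address => $::ipaddress,
    use     => 'generic-host',  # assumed host template name
  }
}

# Hypothetical class applied to the nagios server: it collects every
# nagios_host that other nodes have exported and realizes them locally.
class monitoring::server {
  Nagios_host <<| |>>
}
```

With exported resources disabled, the collector on the server has nothing to
collect, so host definitions have to be generated some other way.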

- Ryan