I opened https://phabricator.wikimedia.org/T192422 and depooled labvirt1015 for now. I don't know that this is actually cause for alarm, but 97 VMs seems like a lot of eggs to have in one basket.
-A
Subject: | ** PROBLEM alert - labvirt1015/ensure kvm processes are running is CRITICAL ** |
---|---|
Date: | Wed, 18 Apr 2018 01:17:17 +0000 |
From: | icinga@einsteinium.wikimedia.org |
To: | abogott@wikimedia.org |

Notification Type: PROBLEM

Service: ensure kvm processes are running
Host: labvirt1015
Address: 10.64.20.31
State: CRITICAL

Date/Time: Wed Apr 18 01:17:17 UTC 2018

Notes URLs:

Additional Info:

PROCS CRITICAL: 97 processes with regex args /usr/bin/kvm