I opened https://phabricator.wikimedia.org/T192422 and depooled labvirt1015 for now. I don't know that this is actually cause for alarm, but 97 VMs seems like a lot of eggs to have in one basket.
-A
-------- Forwarded Message --------
Subject: ** PROBLEM alert - labvirt1015/ensure kvm processes are running is CRITICAL **
Date: Wed, 18 Apr 2018 01:17:17 +0000
From: icinga@einsteinium.wikimedia.org
To: abogott@wikimedia.org
Notification Type: PROBLEM
Service: ensure kvm processes are running
Host: labvirt1015
Address: 10.64.20.31
State: CRITICAL
Date/Time: Wed Apr 18 01:17:17 UTC 2018
Notes URLs:
Additional Info:
PROCS CRITICAL: 97 processes with regex args /usr/bin/kvm