Re: [Cloud-admin] [Cloud-announce] additional monitoring on cloudvirts -- don't run them empty!

7 Dec 2018


      On 12/6/18 9:16 PM, Andrew Bogott wrote:
...
I recently noticed that some of our standard kvm/nova monitoring never 
got copied over from the labvirt puppet code to the cloudvirt puppet 
code.  Tomorrow I will merge 
https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/478113/ to fix that.
Once that patch is merged, icinga will be a bit touchier on the 
cloudvirts.  In particular, it will alert for any cloudvirt that has 0 
VMs running on it.  (This turns out to be a useful thing to watch for 
because we've had cases where every single kvm process died at once.)
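
As a rough illustration of the "0 VMs" alert described above, here is a minimal sketch of a Nagios/icinga-style check. It is not the check from the puppet patch; the function name check_vm_count and the qemu-system process pattern in the usage comment are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical sketch of a "no VMs running" check, following the usual
# Nagios/icinga plugin convention: exit status 0 = OK, 2 = CRITICAL.
# check_vm_count is an illustrative name, not part of the real puppet code.
check_vm_count() {
    count="$1"
    if [ "$count" -eq 0 ]; then
        echo "CRITICAL: 0 VMs running"
        return 2
    fi
    echo "OK: $count VMs running"
    return 0
}

# Possible usage on a virt host (the process pattern is an assumption):
#   check_vm_count "$(pgrep -c -f qemu-system)"
```

Taking the count as an argument keeps the alerting logic separate from however the VMs are actually counted on the host.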
So, all 'idle' cloudvirts should nonetheless have a canary instance. For 
example, on the new analytics cloudvirts I created canaries like this:
$ OS_PROJECT_ID=testlabs openstack server create \
    --image 7c6371d1-8411-48c7-bf73-2ef6d6ff2a15 \
    --flavor m1.small \
    --nic net-id=7425e328-560c-4f00-8e99-706f3fb90bb4 \
    --availability-zone host:cloudvirtan1004 \
    canary-an1004-01
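
For more than one idle host, the command above could be generated per host. The following is a dry-run sketch only: it prints the commands rather than running them. The function name canary_cmd, the naming rule (strip the "cloudvirt" prefix from the host name), and the second host in the loop are assumptions; the image, flavor, and network values are copied from the command above.

```shell
#!/bin/sh
# Print (do not run) the canary-creation command for one cloudvirt host.
# canary_cmd is a hypothetical helper; the canary name is derived by
# stripping the "cloudvirt" prefix, e.g. cloudvirtan1004 -> canary-an1004-01.
canary_cmd() {
    host="$1"
    suffix="${host#cloudvirt}"
    printf 'OS_PROJECT_ID=testlabs openstack server create --image %s --flavor m1.small --nic net-id=%s --availability-zone host:%s canary-%s-01\n' \
        "7c6371d1-8411-48c7-bf73-2ef6d6ff2a15" \
        "7425e328-560c-4f00-8e99-706f3fb90bb4" \
        "$host" "$suffix"
}

# Dry run over an illustrative host list (cloudvirtan1005 is hypothetical).
for h in cloudvirtan1004 cloudvirtan1005; do
    canary_cmd "$h"
done
```

Printing first makes it easy to review each command before pasting it into a shell with the right credentials loaded.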
Once a virt host is in full service we can leave the canaries in place or 
delete them -- there hasn't been any consistent policy on that.
Thanks for the heads up and the example command.
I think it makes sense to have a canary per cloudvirt. It does mean more 
OS instances that need to be kept updated and perhaps excluded from 
metrics collection, but the annoyance should be minimal. It would be good 
to have a barebones OS image for them, but I'd consider that a very low 
priority.
-- 
Giovanni Tirloni
Operations Engineer
Wikimedia Cloud Services