On 12/6/18 9:16 PM, Andrew Bogott wrote:
I recently noticed that some of our standard kvm/nova monitoring never got copied over from the labvirt puppet code to the cloudvirt puppet code. Tomorrow I will merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/478113/ to fix that.
Once that patch is merged, icinga will be a bit touchier on the cloudvirts. In particular, it will alert for any cloudvirt that has 0 VMs running on it. (This turns out to be a useful thing to watch for because we've had cases where every single kvm process died at once.)
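For illustration only, the "0 VMs running" condition can be sketched as a small Nagios/Icinga-style plugin. This is not the check from the puppet patch. It assumes each VM appears as a process whose name starts with qemu-system, and check_vm_count is a hypothetical helper name. The exit codes follow the usual plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Sketch of an Icinga-style check: CRITICAL when no VM processes are running.
# Assumption: each VM is one process whose name starts with "qemu-system".

# Map a VM count to a plugin status line and exit code
# (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).
check_vm_count() {
    count="$1"
    case "$count" in
        ''|*[!0-9]*)
            # Empty or non-numeric input: we cannot say anything useful.
            echo "UNKNOWN: could not determine VM count"
            return 3
            ;;
    esac
    if [ "$count" -eq 0 ]; then
        echo "CRITICAL: 0 VMs running"
        return 2
    fi
    echo "OK: $count VMs running"
    return 0
}

# Example use on a virt host (not executed here):
#   check_vm_count "$(pgrep -c qemu-system)"; exit $?
```

Keeping the threshold logic in a function that takes the count as an argument makes it easy to try out without a hypervisor at hand.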
So, all 'idle' cloudvirts should nonetheless have a canary instance. For example, on the new analytics cloudvirts I created canaries like this:
$ OS_PROJECT_ID=testlabs openstack server create --image 7c6371d1-8411-48c7-bf73-2ef6d6ff2a15 --flavor m1.small --nic net-id=7425e328-560c-4f00-8e99-706f3fb90bb4 --availability-zone host:cloudvirtan1004 canary-an1004-01
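If several idle hosts need canaries, the command above could be wrapped in a loop. The following is a rough sketch, not an established procedure. canary_cmd is a hypothetical helper, cloudvirtan1005 is a made-up second host for the example, and the image and network IDs are simply copied from the command above. It only prints the commands so they can be reviewed before anything is created.

```shell
#!/bin/sh
# Image and network IDs copied from the example command above.
IMAGE=7c6371d1-8411-48c7-bf73-2ef6d6ff2a15
NET=7425e328-560c-4f00-8e99-706f3fb90bb4

# Print the canary-creation command for one cloudvirt host.
# (canary_cmd is a hypothetical helper, not an existing tool.)
canary_cmd() {
    host="$1"
    # Derive the instance name: cloudvirtan1004 -> canary-an1004-01
    name="canary-${host#cloudvirt}-01"
    echo "OS_PROJECT_ID=testlabs openstack server create" \
         "--image $IMAGE --flavor m1.small --nic net-id=$NET" \
         "--availability-zone host:$host $name"
}

# Dry run: print one command per host; nothing is created.
# cloudvirtan1005 is a placeholder host name for illustration.
for host in cloudvirtan1004 cloudvirtan1005; do
    canary_cmd "$host"
done
```

Once the printed commands look right, they can be pasted into a shell one at a time.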
Once a virt host is in full service we can leave the canaries there or delete them; we haven't had a consistent policy on that.
Thanks for the heads up and the example command.
I think it makes sense to have a canary per cloudvirt. It does mean more OS installs that need to be kept updated and possibly excluded from metrics collection, but the annoyance should be minimal. It would be good to have a barebones OS image for them, but I'd consider that a very low priority.
cloud-admin@lists.wikimedia.org