Thought I'd mention something I worked on recently: I have a cron job on deployment-cumin that runs Puppet on every instance OpenStack reports as running, and emails me a list of hosts with any problems. It has a small config that lets me associate a host with a Phabricator task. deployment-prep is looking better than I expected. (deploy-01 was broken by the security updates, and I'm planning to look into cache-text04 later; I think that was a timeout of some sort, likely related to my certificate work there.)
---------- Forwarded message ----------
From: krenair@beta.wmflabs.org
Date: 15 June 2018 at 19:05
Subject: Deployment-prep Puppet error hosts report
To: krenair@gmail.com
Hostname                                                Task?
deployment-cache-text04.deployment-prep.eqiad.wmflabs   None
deployment-deploy-01.deployment-prep.eqiad.wmflabs      T192561 https://phabricator.wikimedia.org/T192561

Hosts configured with tasks but are not listing as broken anymore:
Hostname                                                Task
deployment-mx.deployment-prep.eqiad.wmflabs             T184244
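For illustration, the report step of the cron can be sketched in Python. This is a hypothetical reconstruction, not the actual script: it assumes the Puppet run has already produced the set of failing hostnames, and `build_report`, `host_tasks` and `PHAB_URL` are made-up names standing in for the script and its small host-to-task config.

```python
# Hypothetical sketch of the report step; not the actual cron script.
# Assumes the Puppet run has already produced the failing hostnames, and
# that host_tasks stands in for the small host -> task config.

PHAB_URL = "https://phabricator.wikimedia.org/"  # task links, as in the report


def build_report(broken_hosts, host_tasks):
    """Build the plain-text report body.

    broken_hosts: iterable of hostnames whose Puppet run reported problems.
    host_tasks:   dict mapping hostname -> Phabricator task ID, e.g. "T192561".
    """
    broken = set(broken_hosts)

    # Section 1: every failing host, with its task (or None if unassigned).
    lines = ["Hostname Task?"]
    for host in sorted(broken):
        task = host_tasks.get(host)
        if task is None:
            lines.append(f"{host} None")
        else:
            lines.append(f"{host} {task} {PHAB_URL}{task}")

    # Section 2: hosts that still have a task configured but no longer fail,
    # i.e. candidates for closing the task or dropping the config entry.
    fixed = sorted(host for host in host_tasks if host not in broken)
    if fixed:
        lines.append("")
        lines.append("Hosts configured with tasks but are not listing as broken anymore:")
        lines.append("Hostname Task")
        for host in fixed:
            lines.append(f"{host} {host_tasks[host]}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Data taken from the 15 June 2018 report above.
    print(build_report(
        ["deployment-cache-text04.deployment-prep.eqiad.wmflabs",
         "deployment-deploy-01.deployment-prep.eqiad.wmflabs"],
        {"deployment-deploy-01.deployment-prep.eqiad.wmflabs": "T192561",
         "deployment-mx.deployment-prep.eqiad.wmflabs": "T184244"},
    ))
```

The second section is what makes the host-to-task config self-maintaining: once a host stops failing, its entry shows up there as a reminder to close the task or remove the mapping.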
I'm very glad you're keeping an eye on those! Shinken reports many more breakages; I guess that's mostly an issue with purging VMs that are down or no longer exist.
On 6/15/18 1:33 PM, Alex Monk wrote:
Cloud-admin mailing list
Cloud-admin@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/cloud-admin
Some of those will be hosts that are deliberately shut down (I have at least 4 hosts like this in deployment-prep pending deletion). Actually, we should probably make shinkengen check the instance status that Nova reports. Will look into it.
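A minimal sketch of the proposed check, assuming shinkengen has access to instance records carrying Nova's server status. This is not shinkengen's actual code; `hosts_to_monitor` and the dict layout are hypothetical, and only the status values `ACTIVE` and `SHUTOFF` come from Nova itself.

```python
# Hypothetical sketch, not shinkengen's actual code: generate Shinken checks
# only for instances whose Nova status is ACTIVE, so deliberately shut-down
# VMs (status SHUTOFF) are skipped and deleted VMs, which no longer appear
# in the instance listing at all, drop out naturally.


def hosts_to_monitor(instances):
    """Return sorted names of instances that Nova reports as ACTIVE.

    instances: list of dicts with at least "name" and "status" keys,
    standing in for the records a Nova server listing would provide.
    """
    return sorted(
        inst["name"] for inst in instances if inst.get("status") == "ACTIVE"
    )


if __name__ == "__main__":
    instances = [
        {"name": "deployment-deploy-01", "status": "ACTIVE"},
        # Hypothetical host shut down while pending deletion.
        {"name": "deployment-example-old", "status": "SHUTOFF"},
    ]
    print(hosts_to_monitor(instances))  # ['deployment-deploy-01']
```

Filtering on an allow-list of `ACTIVE` rather than a deny-list of `SHUTOFF` also keeps transitional states (building, rebooting) out of monitoring, which may or may not be what's wanted.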
On Fri, 15 Jun 2018, 20:00 Andrew Bogott, abogott@wikimedia.org wrote: