[Labs-l] Possible mild puppet and wikitech breakages tomorrow

Andrew Bogott abogott at wikimedia.org
Wed May 27 18:35:46 UTC 2015

As part of our quarterly emphasis on Labs stability and resilience, I've 
been setting up backup or hot spare systems for many labs services.

Of course, a backup system is only useful if you can switch to it.
Tomorrow I will be attempting a switchover from our primary OpenStack
controller, virt1000, to a new system, labcontrol1001. Most likely it
will go poorly and I will switch back and forth several times. So,
during my workday tomorrow (Thursday, May 28, beginning at approximately
14:00 UTC), expect occasional interruptions in some labs services. I'll
keep each of these as short as possible.

What might break:

- Instance creation/deletion [1]
- Various wikitech queries [1]
- Wikitech logins [2]
- Puppet runs on labs instances [3]

What definitely won't break:

- Anything that a toollabs user would notice or care about
- Anything internal to a labs instance (apart from noisy puppet runs)
- Existing wikitech sessions
- Instance network connectivity

I am operating under the assumption that the items in the 'might break' 
list are pretty much never time-critical for anyone.  If that's mistaken 
then please correct me.


[1] Due to nova services, e.g. the nova-scheduler or nova-conductor
[2] Due to the OpenStack identity service, Keystone
[3] The puppetmaster runs on virt1000, /and/ I'm also adding a new
service name for puppet, 'labs-puppetmaster-eqiad', so that we're no
longer coupled to a specific host.