[Labs-announce] IMPORTANT: Security reboots everywhere tomorrow (DONE)
abogott at wikimedia.org
Wed Jun 21 20:19:46 UTC 2017
Reboots are done for today. Please let me know if you find things that
are broken as a result.
Thank you for your patience!
On 6/21/17 12:48 PM, Andrew Bogott wrote:
> Good news:
> I've now finished rebooting all the labvirt hosts, and the last few
> VMs should be in the process of spinning back up right now.
> Bad news:
> - We have a lot more (non-VM-hosting) servers to restart! Expect
> brief outages in storage, horizon, wikitech, etc. during the next few
> - Most VMs are still running insecure kernels. I didn't upgrade them
> yet because it's not entirely clear that the current generation of
> security patches won't break random Java apps here and there. Once
> there's a clear resolution to that issue there will be another round
> of reboots, which I will announce in advance.
> - The restart process hit k8s nodes a bit harder than planned, so if
> your tools are running on Kubernetes you might want to verify that
> things are up and running properly. If they aren't, a simple restart
> should resolve things.
> On 6/21/17 8:56 AM, Andrew Bogott wrote:
>> Reminder: These reboots will start in about 5 minutes.
>> On 6/20/17 11:38 AM, Andrew Bogott wrote:
>>> Good morning!
>>> In order to plug a newly-revealed security hole, we need to upgrade
>>> kernels everywhere and reboot everything. I'll be doing this for
>>> all instances and VM hosts tomorrow, 2017-06-21, beginning at
>>> 14:00 UTC (aka 7 AM in San Francisco).
>>> Project admins should expect instances to be shut down for 10-15
>>> minutes and then restarted at some point tomorrow. I don't expect
>>> the process to take absolutely all day, but it might. If you have
>>> an instance that is a special case and you need it to be restarted
>>> during a more specific window, please respond to me directly with
>>> details about what you need and we'll try to make it happen.
>>> These reboots should have minimal impact on things running in Tools,
>>> although it's always good to keep an eye out -- some tools are
>>> unable to recover from a shift between nodes and need a manual restart.
>>> Sorry for the inconvenience!