[Labs-announce] [Labs-l] IMPORTANT: Reboots (again!) on Wednesday

Andrew Bogott abogott at wikimedia.org
Mon Apr 4 04:05:52 UTC 2016


Security updates are not something you can opt out of.  Most concretely: 
we have to keep our physical hardware secure and up-to-date, and 
rebooting a physical virtualization node has the side-effect of cycling 
power on the resident VMs.

Even if it weren't a natural consequence of our own maintenance, though, 
I would still be rolling out these updates.  Foundation staff can't 
engage in total supervision of all Labs use or activity but we are, 
nonetheless, effectively responsible for any malicious activities that 
might originate from within our cluster.  Naturally we need to take 
whatever general measures we can to prevent Labs from turning into a 
spam farm.  With a staff of three managing more than 700 VMs, we are 
always going to have to rely on general, one-size-fits-all maintenance 
solutions.

In almost all cases it is trivial or near-trivial to set up a service so 
that it recovers gracefully from a reboot.  If the few minutes of 
downtime that result are unacceptable, we can provide resources for a 
fail-over instance and ensure that the two instances are spread out on 
different virt hosts so that they don't both go down at once.  Let me 
know if you need help with either of the above!
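
A minimal sketch of one way a long-running job could recover from a 
reboot: it records its progress after each item and picks up from that 
point the next time it starts.  The file path, the state layout, and the 
function names (load_checkpoint, save_checkpoint, run) are hypothetical 
and not part of any Labs tooling.  This covers only resuming work; getting 
the process started again at boot is left to the instance's init system 
and is not shown.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_item": 0}


def save_checkpoint(path, state):
    """Write the state atomically so a reboot mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename over
    # the old checkpoint; os.replace is atomic on the same filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(path, items, handle):
    """Process items in order, resuming after the last completed one."""
    state = load_checkpoint(path)
    for i in range(state["next_item"], len(items)):
        handle(items[i])
        # Only mark the item done after handle() has returned.
        state["next_item"] = i + 1
        save_checkpoint(path, state)
```

If the instance goes down while handle() is running, that item is 
processed again on the next start, so handle() should be safe to repeat.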

-A


On 4/3/16 5:23 PM, Petr Bena wrote:
> Is it possible to somehow opt-out of these auto updates and reboots?
> The majority of kernel exploits (like 99.9999% of them) can be exploited
> only if the hacker already has at least some access to the remote system
> (e.g. over ssh), so that they can execute something that eventually gets
> them the rights of user 0.
>
> This makes sense for tool labs, but I see no reason why, for example,
> wm-bot's instance needs to get this patch, when all the people who can
> access it over ssh already have root. I am absolutely sure there is no
> security risk in running an obsolete kernel on that one.
> The reboots, on the other hand, cause much more harm.
>
> Thanks
>
> On Wed, Mar 23, 2016 at 11:42 PM, Andrew Bogott <abogott at wikimedia.org> wrote:
>> This is now done; all labs services should be back to normal.  As always,
>> it's a good idea to poke at your tools and make sure there aren't jobs that
>> need restarting.
>>
>> -Andrew
>>
>>
>> On 3/23/16 8:23 AM, Andrew Bogott wrote:
>>> Reminder: This is happening today, starting in about 40 minutes.
>>>
>>> On 3/18/16 1:58 PM, Andrew Bogott wrote:
>>>> Yet another kernel exploit turned up this week, which means another round
>>>> of kernel updates and reboots. All labs instances will be rebooted (at
>>>> various and unpredictable times) this Wednesday, 2016-03-23, beginning
>>>> around 14:00 UTC.
>>>>
>>>>     We're getting pretty good at this :(  We'll pool and de-pool exec
>>>> nodes as needed to minimize surprise endings for ToolLabs jobs, but as usual
>>>> there will be some stragglers that get cut short or don't get restarted
>>>> properly.  Keep an eye out on Wednesday, and let us know on IRC if you run
>>>> into trouble.
>>>>
>>>> -Andrew
>>>>
>>
>> _______________________________________________
>> Labs-announce mailing list
>> Labs-announce at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/labs-announce
>> _______________________________________________
>> Labs-l mailing list
>> Labs-l at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/labs-l