[Labs-l] Reboot of virt11 Friday Sept 6 at 20:00 UTC

Fri Sep 6 23:16:51 UTC 2013

On Fri, Sep 6, 2013 at 6:01 AM, Ryan Lane <rlane at wikimedia.org> wrote:

> On Fri, Sep 6, 2013 at 5:46 AM, Maarten Dammers <maarten at mdammers.nl> wrote:
>
>> Hi Ryan,
>>
>> On 4-9-2013 23:38, Ryan Lane wrote:
>>
>>> During Wikimania I was cleaning up some base images that were eating up
>>> a large amount of disk space and caused an issue on virt11 that requires a
>>> reboot. This will cause a reboot of about 45 instances. Here's a list of
>>> the instances that will be affected:
>>>
>>> <https://wikitech.wikimedia.org/w/index.php?title=Special:Ask&q=[[Resource+Type%3A%3Ainstance]][[Instance+Host%3A%3Avirt11]]&p=format%3Dbroadtable%2Flink%3Dall%2Fheaders%3Dshow%2Fsearchlabel%3Dinstances%2Fclass%3Dsortable-20wikitable-20smwtable&po=%3FInstance+Name%0A%3FInstance+Type%0A%3FProject%0A%3FImage+Id%0A%3FFQDN%0A%3FLaunch+Time%0A%3FPuppet+Class%0A%3FModification+date%0A%3FInstance+Host%0A%3FNumber+of+CPUs%0A%3FRAM+Size%0A%3FAmount+of+Storage%0A&limit=100&eq=no>
>>>
>> How long will the downtime be, and can you please announce it earlier? A
>> week is a normal notice period.
>> The Wiki Loves Monuments tools and applications (like the mobile app)
>> rely on this so please keep it as short as possible.
>>
>>
> The reboot will take about 10 minutes.
>
> That said, relying on labs for something like this is legitimately insane.
> Have you talked with Wikimedia Foundation about getting production level
> support for WLM? That's what you actually need.
>
> What will you do if the node hosting your instance completely dies? Is
> your work puppetized? Can you just bring up a new instance to replace it?
> Are you doing backups?
>
> Outside of tools (and deployment-prep, which is rather ephemeral) we don't
> consider any project "semi-production" and the failure model is meant to be
> handled at the instance level. The underlying infrastructure will just fail
> and will not recover for you. You have to assume that your instances can
> simply disappear at any moment (this is the traditional cloud computing
> model, btw).
>
>
This was completed a couple hours ago and all instances were rebooted. If
you have any issues with your instance, please let me know.

- Ryan