<div dir="ltr">On Fri, Sep 6, 2013 at 6:01 AM, Ryan Lane <span dir="ltr"><<a href="mailto:rlane@wikimedia.org" target="_blank">rlane@wikimedia.org</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="im">On Fri, Sep 6, 2013 at 5:46 AM, Maarten Dammers <span dir="ltr"><<a href="mailto:maarten@mdammers.nl" target="_blank">maarten@mdammers.nl</a>></span> wrote:<br>
</div><div class="gmail_extra"><div class="gmail_quote"><div class="im">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hi Ryan,<br>
<br>
On 4-9-2013 23:38, Ryan Lane wrote:<div><br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
During Wikimania I was cleaning up some base images that were eating up a large amount of disk space and caused an issue on virt11 that requires a reboot. This will cause a reboot of about 45 instances. Here's a list of the instances that will be affected:<br>
<br>
<<a href="https://wikitech.wikimedia.org/w/index.php?title=Special:Ask&q=[[Resource+Type%3A%3Ainstance]][[Instance+Host%3A%3Avirt11]]&p=format%3Dbroadtable%2Flink%3Dall%2Fheaders%3Dshow%2Fsearchlabel%3Dinstances%2Fclass%3Dsortable-20wikitable-20smwtable&po=%3FInstance+Name%0A%3FInstance+Type%0A%3FProject%0A%3FImage+Id%0A%3FFQDN%0A%3FLaunch+Time%0A%3FPuppet+Class%0A%3FModification+date%0A%3FInstance+Host%0A%3FNumber+of+CPUs%0A%3FRAM+Size%0A%3FAmount+of+Storage%0A&limit=100&eq=no" target="_blank">https://wikitech.wikimedia.org/w/index.php?title=Special:Ask&q=[[Resource+Type%3A%3Ainstance]][[Instance+Host%3A%3Avirt11]]&p=format%3Dbroadtable%2Flink%3Dall%2Fheaders%3Dshow%2Fsearchlabel%3Dinstances%2Fclass%3Dsortable-20wikitable-20smwtable&po=%3FInstance+Name%0A%3FInstance+Type%0A%3FProject%0A%3FImage+Id%0A%3FFQDN%0A%3FLaunch+Time%0A%3FPuppet+Class%0A%3FModification+date%0A%3FInstance+Host%0A%3FNumber+of+CPUs%0A%3FRAM+Size%0A%3FAmount+of+Storage%0A&limit=100&eq=no</a>><br>
</blockquote></div>
How long will the downtime be, and can you please announce it earlier? A week is the normal notice period.<br>
The Wiki Loves Monuments tools and applications (like the mobile app) rely on this so please keep it as short as possible.<span><font color="#888888"><br>
<br></font></span></blockquote><div><br></div></div><div>The reboot will take about 10 minutes.</div><div><br></div><div>That said, relying on labs for something like this is legitimately insane. Have you talked with the Wikimedia Foundation about getting production-level support for WLM? That's what you actually need.</div>
<div><br></div><div>What will you do if the node hosting your instance completely dies? Is your work puppetized? Can you just bring up a new instance to replace it? Are you doing backups?</div><div><br></div><div>Outside of tools (and deployment-prep, which is rather ephemeral) we don't consider any project "semi-production" and the failure model is meant to be handled at the instance level. The underlying infrastructure will just fail and will not recover for you. You have to assume that your instances can simply disappear at any moment (this is the traditional cloud computing model, btw).</div>
<span class="HOEnZb"><font color="#888888">
<div><br></div></font></span></div></div></div></blockquote><div><br></div><div>This was completed a couple of hours ago and all instances were rebooted. If you have any issues with your instance, please let me know.</div><div>
<br></div><div>- Ryan</div></div></div></div>