On Fri, Sep 6, 2013 at 5:46 AM, Maarten Dammers <maarten@mdammers.nl> wrote:
Hi Ryan,

On 4-9-2013 23:38, Ryan Lane wrote:

How long will the downtime be, and can you please announce it earlier? A week is a normal notice period.
The Wiki Loves Monuments tools and applications (like the mobile app) rely on this, so please keep it as short as possible.


The reboot will take about 10 minutes.

That said, relying on labs for something like this is legitimately insane. Have you talked with the Wikimedia Foundation about getting production-level support for WLM? That's what you actually need.

What will you do if the node hosting your instance completely dies? Is your work puppetized? Can you just bring up a new instance to replace it? Are you doing backups?

Outside of tools (and deployment-prep, which is rather ephemeral), we don't consider any project "semi-production", and the failure model is meant to be handled at the instance level. The underlying infrastructure will just fail and will not recover for you. You have to assume that your instances can simply disappear at any moment (this is the traditional cloud computing model, btw).

- Ryan