[Labs-l] Yet another partial labs outage

Sun May 17 05:46:53 UTC 2015

On Sat, May 16, 2015 at 4:02 AM, Maarten Dammers <maarten at mdammers.nl> wrote:

> With that you basically break the edit flow of most users on Wikidata, see
> https://www.wikidata.org/wiki/Wikidata:Project_chat#wdq.wmflabs.org.2Fapi
> . This is one of those tools that have silently become production.
It may not be correct to say "you" here :). If something is really
important, it should be run in a way that can survive failure of the
underlying hardware. Labs infrastructure was purposely not designed for
high uptime of individual instances.

If it's production-ish, it should likely either be moved to production or
you should put a bit of effort into making it work across multiple
instances. The ideal is for services to be stateless, with their state
living in databases that are themselves spread across instances. It's also
best to have the service configuration managed (ideally puppetized, since
Puppet is what Wikimedia uses) so that losing an instance is only a brief
inconvenience.

- Ryan