[Labs-l] Wikimedia labs-tools

Marc-André Pelletier mpelletier at wikimedia.org
Wed Sep 11 17:23:54 UTC 2013


On 2013-09-11, at 7:50 AM, John <phoenixoverride at gmail.com> wrote:

> My question is why has the wmf decided to degrade the environment where tool developers design and host tools (quite a few of them are long term stable projects)? and what can we do to remedy this?

You know, that terminology thing is really really going to bite us in the end.

Labs (and Tool Labs) are, indeed, not "production" systems but are considered "semi-production". But what does that actually mean in practice?

It means that we are aiming for two nines of uptime, hoping to reach or approach three, and that when something fails off-hours only /some/ of the operations team gets rushed to help as opposed to /all/ of it in the case of a production failure.
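For a sense of scale, the arithmetic behind those "nines" can be sketched as below. This is only an illustration: the function name `allowed_downtime_hours` and the 365-day year are assumptions made for the example, not anything operations actually uses to measure uptime.

```python
# Illustrative only: converts "N nines" of availability into a yearly
# downtime budget. The names and the 365-day year are assumptions.
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(nines: int, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime budget, in hours, for `nines` nines of availability."""
    unavailability = 10 ** -nines  # 2 nines -> 0.01 (99%), 3 nines -> 0.001 (99.9%)
    return period_hours * unavailability


for n in (2, 3):
    print(f"{n} nines: {allowed_downtime_hours(n):.2f} hours/year")
# prints:
# 2 nines: 87.60 hours/year
# 3 nines: 8.76 hours/year
```

Under those assumptions, two nines leaves room for roughly three and a half days of downtime a year, and three nines for under nine hours.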

It doesn't mean that we don't consider the Labs important, just that our uptime objectives (and, accordingly, our resource allocation) are not quite the same as for production.

When Ryan or I tell a project/tool that "you should try to get this in production", it isn't because we wouldn't care about it otherwise, but because the project itself has stated or implied that two nines of uptime isn't adequate for its needs.  WLM is a good example: they are going to be very, very publicly visible (for a short period of time) during a project that has press coverage.  An unplanned outage of an hour or two that would have been a minor inconvenience for a maintenance bot could be a disaster for them.

That doesn't mean we wouldn't *try* to help them if something failed, or that we'd dismiss downtime as "it's just labs"; it just means that when operations allocates resources for Labs, it does so with an eye towards more than two nines but not more than three.

-- Marc

