[Labs-l] Disruptive Tools NFS maintenance on 11/2/2016

Madhumitha Viswanathan mviswanathan at wikimedia.org
Fri Oct 21 18:59:42 UTC 2016


On Fri, Oct 21, 2016 at 11:49 AM, Martin Urbanec <
martin.urbanec at wikimedia.cz> wrote:

> If 48 hours are required for this, maybe we can offer a backup solution
> that works as before. Then there would be no outage: everything would be set
> up on other virtual machines and then simply switched over. If there is a
> problem, switch back, solve it and switch again.
>
> Is this possible?
>

This is not possible. At some point we have to stop new data from being
written, sync the latest changes, and then switch everyone over to the new
servers. Not doing this would cause data loss and inconsistency, which is
not something we'd do.

> Martin
>
> PS: I can live without toollabs for 48 hours, but a lot of tools depend on
> its availability, so I strongly prefer as short a window as possible.
>
> On Fri, 21 Oct 2016 at 20:29, Martin Domdey <animalia at gmx.net>
> wrote:
>
>> Why do you need 48 hours for that?
>>
>> I submit a great many cron jobs every day to deliver content and
>> services to a lot of users on dewiki and other wikis. An outage window of
>> 48 hours (!) is simply not acceptable.
>> Please suggest how I can keep working during the outage window, or
>> at least a crontab that can handle the data and files in tools.taxonbot.
>> Perhaps you could set up NFS redundancy for at least that period.
>>
>> Thank you
>> Martin ...
>>
>>
>>
>> *Sent:* Friday, 21 October 2016 at 20:00
>> *From:* "Madhumitha Viswanathan" <mviswanathan at wikimedia.org>
>> *To:* "Wikimedia Labs" <labs-l at lists.wikimedia.org>,
>> labs-announce at lists.wikimedia.org
>> *Subject:* [Labs-l] Disruptive Tools NFS maintenance on 11/2/2016
>> As the next step in our storage redundancy and reliability efforts for
>> Labs, we have a significant migration coming up on 11/2, starting at 08:00
>> PDT (15:00 UTC), involving the tools NFS share. The maintenance window can
>> be up to 48h long and will affect most running tools. At the end of the
>> migration, everything (except transient jobs) should ideally work the
>> same way as it did before the migration, but better.
>>
>> Here's what to expect during the maintenance window:
>>
>> * The tools NFS share (/data/project and /home) will be read-only for the
>> duration of the maintenance, so no new data or logs will get written to it.
>> * New jobs cannot be submitted for the whole maintenance window; this
>> means submitting jobs through cron or tools-mail will not work,
>> although tools-mail can continue to send emails.
>> * Current jobs might keep running, but won't get rescheduled if they die.
>> If they do not die and aren't writing to NFS, they should be fine.
>> * All exec nodes will be depooled, rebooted and repooled; jobs that are
>> not rescheduled automatically will have died and will need manual restarts.
>>
>> Do let us know if you have any questions or concerns on the lists or on
>> #wikimedia-labs.
>>
>> --
>> Madhumitha Viswanathan
>> Operations Engineer, Wikimedia Labs
>> _______________________________________________
>> Labs-l mailing list
>> Labs-l at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>


-- 
--Madhu :)