[Labs-l] [Maintenance] Labs NFS storage

Tue Mar 24 19:58:55 UTC 2015

What software do you use to sync the current storage with the new one?

On Tue, Mar 24, 2015 at 8:29 PM, Marc A. Pelletier <marc at uberbox.org> wrote:
> TL;DR: NFS will be slow for a few days then briefly unavailable on March
> 26, 2015 at 22:00 UTC (less than five minutes).
>
> Tracked at: https://phabricator.wikimedia.org/T93792
>
> == The good news ==
>
> Backups are coming (back) to the Labs storage, with snapshots into the
> past.  In addition, we will replicate data to another datacenter, so
> that there will be an available backup in case of disaster.
>
> == The bad news ==
>
> In order to finish moving project storage to the new filesystem that has
> snapshots enabled, a copy needs to be performed to synchronize the (new,
> not live) filesystem with the currently active one.
>
> This means that for the next two days (estimated) starting at 22:00 UTC,
> the performance of file I/O on the NFS server (for /data/project and
> /home) will be noticeably lower.  I will keep a close eye on the process
> and try to balance the available resources so that the copy does not
> take more than about half the disk bandwidth, but there will be a
> noticeable increase in latency for all file operations on that filesystem.
>
> == The switchover ==
>
> Labs instances are tentatively scheduled to be moved to the new
> filesystem on March 26, 2015 at 22:00 UTC.  At that point, there will be
> a brief (<2 minutes) interruption during which file operations will be
> moved from one filesystem to the other.  This will be confirmed at least
> 24h in advance.
>
> File operations in effect during that brief outage will be unavoidably
> interrupted and currently open files will be forcibly closed.  They can
> be reopened immediately afterwards, but running processes may error out
> because of this.
>
> To avoid possible issues with running jobs (including webservices) in
> tool labs, all running jobs will be rescheduled and restarted at that
> time.  Jobs that run at intervals through crontabs should not be affected
> unless they were scheduled to run exactly at the time of the outage.
>
> The older copy of the data will be kept around for several weeks, so if
> anything goes wrong in the copying process, the original files will be
> preserved and can be restored.
>
> == What you can do ==
>
> If you have directories the contents of which are not worthwhile to back
> up (caches, easily regenerated data, backups) you may add a file at
> their root to control whether they are copied (and what is copied).  The
> file needs to be named '.nobackup' and follow the rsync filter rules.
> (You can get a detailed explanation of the rules in the rsync manpage
> under the 'FILTER RULES' section).
>
> tl;dr: If all you need to do is exclude a directory entirely, then you
> only need to put "- *" in the file at the top of that directory (a dash,
> a space, and an asterisk).
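
The exclusion step described above can be scripted. What follows is a minimal
sketch, not part of the original announcement: the helper name
exclude_from_backup is hypothetical, and it assumes you only want the single
rule "- *" quoted above, which excludes the whole directory.

```python
from pathlib import Path


def exclude_from_backup(directory):
    """Create a '.nobackup' file at the top of `directory`.

    The file holds the single rsync filter rule quoted in the
    announcement ("- *": a dash, a space, and an asterisk), which
    excludes everything in that directory from the copy.
    """
    marker = Path(directory) / ".nobackup"
    # Filter files are read line by line, so the rule ends with a newline.
    marker.write_text("- *\n")
    return marker
```

For example, exclude_from_backup("cache") would write cache/.nobackup; the
directory name "cache" is only an illustration. For anything finer than
excluding a whole directory, the rules in the rsync manpage apply.
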
>
> Doing so will improve the speed at which backups of your data are taken,
> and noticeably reduce the performance impact.  This only affects backups
> intended for data recovery; the snapshot process is not affected, so
> local time-based backups of your files remain available.
>
> Please take a moment to do so, especially if the directories contain
> many (over 10000) files.
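
To find candidate directories, something along these lines may help. It is
only a sketch: the function name large_directories is hypothetical, and it
assumes "many files" means files sitting directly in a directory rather than
a total over the whole subtree.

```python
import os


def large_directories(root, threshold=10000):
    """Yield (path, file_count) for each directory under `root` that
    directly contains more than `threshold` files.

    Assumption: only files directly inside a directory are counted,
    not the total across its subdirectories.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        if len(filenames) > threshold:
            yield dirpath, len(filenames)
```

Each directory it reports is worth a look: if its contents are caches or
easily regenerated data, it is a candidate for a '.nobackup' file.
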
>
> -- Marc
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l