[Labs-l] Filesystem downtime to schedule
Marc A. Pelletier
marc at uberbox.org
Wed Dec 31 17:11:38 UTC 2014
Many of you may recall that until some point in late 2013, one of the
features of the labs file server was that it provided time travel
snapshots (you could see a consistent view of the filesystem as it
existed 1h, 2h, 3h, 1d, 2d, 3d and 1 week ago).
This was disabled at that time - despite being generally considered
valuable - because it was suspected to be (part of) the cause of the
stability problems the NFS server suffered at the time. This turns out
not to have been the case, and we could turn it back on now.
Indeed, doing so is a prerequisite to the planned replication of the
filesystem in the new datacenter where a redundant Labs installation is
slated to be deployed.
The issue is that turning that feature back on requires changing the way
the disk space is currently allocated at a low level and necessitates
a fairly long period of partial downtime during which data is being
copied from one part of the disk subsystem to the other. In practice,
this would require the primary partitions (/home and /data/project) to
be set readonly for a period on the order of a day (24-30 hours).
That downtime is pretty much unavoidable eventually as it is a
requirement of expanding labs and improving data resilience and
reliability, but the /timing/ of that is flexible. I wanted to "poll"
labs users as to when the possibility of disruption is minimized, and
give everyone plenty of time to make contingency plans and/or notify
their end users of the expected period of reduced availability.
Provided there is a good consensus that the week is a better time than
the weekend (I am guessing here that volunteer coders and users are more
active during the weekend) then I would suggest starting the operation
on Tuesday, January 13 at 18:00 UTC. The period of downtime is expected
to last until January 14, 18:00 UTC but may extend a few hours beyond that.
The expected impacts are:
* Starting at the beginning of the window, /home and /data/project will
switch to readonly mode; any attempt to write to files in those trees
will result in EROFS errors. Reading from those filesystems will still
work as expected, as will writing to other filesystems;
* Read performance may degrade noticeably as the disk subsystem will be
loaded to capacity;
* It will not be possible to manipulate the gridengine queue -
specifically, starting or stopping jobs will not work; and
* At the end of the window, when the operation is complete, the "old"
file system will go away and be replaced by the new one - this will
cause any access to files or directories that were previously opened
(including working directories) on the affected filesystems to error out
with ESTALE. Reopening files by name will access the new copy, which is
identical to the file as it was when the filesystems became readonly.
In practice, that last impact means that most running
programs will be unable to continue unless they have special handling
for this situation, and most gridengine jobs will no longer be able to
log output. It may be a good idea to restart any continuous tool at
that time. All webservices that were running at the start of the
maintenance window will be restarted at that time.
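
As a rough illustration of the "special handling" mentioned above, here
is a minimal Python sketch of a log writer that reopens its file by name
when a write fails with ESTALE. The function name append_line and its
parameters are made up for this example; it is one possible approach
under those assumptions, not something the maintenance itself provides.

```python
import errno


def append_line(path, line, handle=None, retries=1):
    """Append one line to `path`, reopening by name if the handle is stale.

    Returns the handle that was finally used so the caller can keep it.
    (Hypothetical helper, for illustration only.)
    """
    for attempt in range(retries + 1):
        try:
            if handle is None:
                # Opening by name resolves to the new copy of the file.
                handle = open(path, "a")
            handle.write(line + "\n")
            handle.flush()
            return handle
        except OSError as exc:
            # Anything other than ESTALE, or running out of retries,
            # is passed on to the caller unchanged.
            if exc.errno != errno.ESTALE or attempt == retries:
                raise
            if handle is not None:
                try:
                    handle.close()
                except OSError:
                    pass  # closing a stale handle may itself fail
            handle = None
```

A long-running tool would keep the returned handle and pass it back in
on the next call, for example: handle = append_line(path, msg, handle).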
If you have tools or other processes running that do not rely on being
able to write to /data/project, they may be able to continue running
during the downtime without interruption. Jobs that only access the
network (for instance, the MediaWiki API) or the databases are unlikely
to be affected. Because of this, no automatic or forcible restart
of running (non-webservice) jobs will be made.
In particular, if you have a tool whose continued operation is
important, temporarily modifying it so that it works from /data/scratch
may be a good workaround.
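
One way to sketch that workaround in Python is to probe the usual
directory and fall back to a scratch location when the probe fails with
EROFS. The helper name writable_dir and the per-tool subdirectory shown
in the usage note are assumptions made for this example.

```python
import errno
import os
import tempfile


def writable_dir(preferred, fallback):
    """Return `preferred` if a file can be created there, else `fallback`.

    Only a read-only filesystem (EROFS) triggers the fallback; any other
    error is re-raised. (Hypothetical helper, for illustration only.)
    """
    try:
        # Creating and discarding a temporary file is a cheap write probe.
        with tempfile.TemporaryFile(dir=preferred):
            pass
        return preferred
    except OSError as exc:
        if exc.errno != errno.EROFS:
            raise
        os.makedirs(fallback, exist_ok=True)
        return fallback
```

A tool could then call, say, writable_dir("/data/project/mytool",
"/data/scratch/mytool") at startup, where "mytool" is a placeholder
name, and write its state under whichever directory is returned.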
Finally, in order to avoid risks of the filesystem move taking longer
than expected and increasing downtime significantly, LOG FILES OVER 1G
WILL NOT BE COPIED. If you have critical files that are not simple
log files but whose names end in .log, .err or .out then you MUST
compress those files if you absolutely require them to survive the
transition. Alternatively, truncating them to some size comfortably
smaller than 1G will work if the file must remain uncompressed.
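
To check whether any of your files are at risk, something along these
lines may help. It is only a sketch: it assumes "1G" means 2**30 bytes
(the exact cutoff used by the copy is not stated here), and the function
names are invented for the example.

```python
import gzip
import os
import shutil
import stat

SUFFIXES = (".log", ".err", ".out")
LIMIT = 1024 ** 3  # assumption: "1G" read as 1 GiB


def oversized_logs(root, limit=LIMIT, suffixes=SUFFIXES):
    """Yield (path, size) for regular files under `root` whose name ends
    in one of `suffixes` and whose size exceeds `limit` bytes."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode) and info.st_size > limit:
                yield path, info.st_size


def compress(path):
    """gzip `path` to `path` + '.gz', remove the original, return new name."""
    target = path + ".gz"
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return target


def truncate_head(path, size):
    """Cut `path` down to its first `size` bytes (the tail is discarded)."""
    os.truncate(path, size)
```

Note that both helpers write to the filesystem, so they need to be run
before the start of the window, and compress() needs enough free space
for the compressed copy.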
The speed and reliability of the maintenance process depend on the
total amount of data to copy. If you can clean up both your home and project
directories of extraneous files, you'll help the process greatly. :-)
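
If you are not sure where the space is going, a short Python sketch like
the following lists the biggest files under a directory. The function
name largest_files is made up for this example.

```python
import heapq
import os


def largest_files(root, count=20):
    """Return up to `count` (size, path) pairs, biggest first, for the
    non-symlink files found under `root`."""
    def sizes():
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    if os.path.islink(path):
                        continue
                    size = os.path.getsize(path)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                yield size, path
    return heapq.nlargest(count, sizes())
```

For instance, largest_files(os.path.expanduser("~"), 10) would return
the ten largest files in your home directory.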