[Labs-l] *COMPLETE* Planned maintenance

Marc-André Pelletier mpelletier at wikimedia.org
Fri Oct 11 07:21:45 UTC 2013


Hello all,

The switch to the new NFS server hardware is now complete, with only 12
hours of delay.  :-)

Seriously though, the filesystem copy took so long because - despite
having prepared with a copy a day early so that only a quick rsync would
suffice - there ended up being a bit over 4T worth of files that were
touched in that 24h period.

Log files, by their nature, are constantly updated.  Normally, that
shouldn't have been a major issue since the rsync would just copy them
again...  except that many of those log files were *hundreds of gigs* in
size.  This caused the filesystem resync to take a bit over 11 hours.

While Cyberbot was the overall winner, with very nearly 2T of logs to
its name, our winner for most impressive single log is yifeibot with
990G in a single log file!

I would much rather not have to turn on quotas on our filesystem: it is
very useful to be able to handle huge datasets occasionally.  However,
if users abuse that freedom I will have no choice but to do so in order
to protect reliability and QoS.
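
For anyone who wants to check their own tools for runaway logs, something
along these lines might help.  This is only a rough sketch, not a tool we
run on Labs: the function name find_large_files and the 1 GiB threshold in
the usage note are assumptions chosen for illustration.

```python
import os
import stat


def find_large_files(root, min_bytes):
    """Return (size, path) pairs for regular files under root that are
    at least min_bytes in size, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are not followed
                info = os.lstat(path)
            except OSError:
                # The file vanished or is unreadable; skip it
                continue
            if stat.S_ISREG(info.st_mode) and info.st_size >= min_bytes:
                found.append((info.st_size, path))
    found.sort(reverse=True)
    return found
```

Calling find_large_files(".", 1 << 30) from a tool's home directory would
list every file of 1 GiB or more, which is usually a good hint that a log
needs rotating or truncating.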

Because the delay was that long, some things may have broken without my
noticing (I've been at this for 14h straight now, and need sleep).  I'll
be on hand tomorrow to help work out any kinks that may have slipped in.

On the positive side, however, we are now on new hardware for the NFS
server and it seems to be working fine.  Yay!

-- Marc



