[Labs-l] Filesystem maintenance aborted

Marc A. Pelletier marc at uberbox.org
Thu Jan 15 20:09:10 UTC 2015


Hello Labs,

The maintenance scheduled for today was started, then aborted after two
hours because only roughly 2% of the necessary copy had completed in that
time - at that rate, the partial outage would have lasted well over four
days.
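
For anyone curious how the four-day figure falls out, here is a rough
sketch of the extrapolation. It assumes the copy rate would have stayed
constant, which is an idealization; the function name is purely
illustrative and is not part of any Labs tooling.

```python
def estimated_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linearly extrapolate the total copy time from progress so far.

    Assumes a constant copy rate (hypothetical simplification).
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


# Roughly 2% done after two hours, per the figures above.
total = estimated_total_hours(elapsed_hours=2, fraction_done=0.02)
print(f"{total:.0f} hours, about {total / 24:.1f} days")
```

With exactly 2% done this comes to about 100 hours; since the real figure
was only "roughly" 2%, the true estimate may well have been longer.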

The unexpectedly poor performance was caused by the fact that Labs
storage does not currently have sufficient elbow room to make a
duplicate of the data over a contiguous area of the disk array, which
made the copy much slower than what was observed during testing.

We have a new storage shelf on order that should be put in production 
fairly soon (weeks); rather than add the storage this provides 
immediately, I'll be able to use it to make an offline copy of the Labs 
storage /prior/ to the next attempt at switching the filesystems over to 
the new scheme - which I will schedule some time in the future.

The existing filesystem behaved as expected and was properly read-only
during the two hours of partial outage, and has now been restored to
full read-write.

In the meantime, there should be no lasting effect from the partial
outage - in particular, the notes about existing open files becoming
stale are not applicable since the filesystem was not switched.  No tool
or service needs to be restarted unless it was itself affected by the
read-only filesystem.

-- Marc


