Date: Mon, 17 Dec 2007 00:19:28 +0000
From: River Tarnell <river(a)wikimedia.org>
Subject: Re: [Toolserver-l] downtime
To: toolserver-l(a)lists.wikimedia.org
Message-ID: <4765C090.1000808(a)wikimedia.org>
Content-Type: text/plain; charset="iso-8859-1"

the maintenance is finished now. the problem was caused by
filesystem corruption on clematis:/aux0, the QFS filesystem
from which hemlock's /home is currently mounted, when the
connection to the iSCSI array was broken. i have replaced
this filesystem with a VxFS filesystem, which should be more
resilient against problems like this.

the iSCSI problem was my fault, so sorry for that. to
prevent it happening in the future, i've asked for the array
to be connected directly to clematis's NIC, which should be
more reliable. in the longer term the plan is to move
hemlock's /home back to a local array; this should happen
either this month or next, when a new array is installed at knams.

a very small number of files were unrecoverable from the
damaged /home. if any of your files are missing, mail ts-admins
or file a bug and they can be restored from a backup.

- river.

river - thanks for your efforts to get things back on the air. Much
appreciated.

Do you have a list of what files were unrecoverable?

I suppose we all, as good developer hygiene, should maintain a list of
everything we have (and an offsite backup) but .. :)

Larry Pieniazek
Hobby mail: Lar at Miltontrainworks dot com
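
The "list of everything we have" suggested above could be kept with a small script. The following is only a sketch under assumptions: it assumes Python is available, and the names build_manifest and missing_or_changed are hypothetical, not part of any Toolserver tooling. It records a SHA-256 digest per file so that a manifest taken before an incident can be compared with one taken afterwards.

```python
import hashlib
import os


def build_manifest(root):
    """Map each regular file under root (path relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in chunks so large files do not have to fit in memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest


def missing_or_changed(old, new):
    """Compare two manifests; return (missing, changed) as sorted lists of paths."""
    missing = sorted(p for p in old if p not in new)
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    return missing, changed
```

Saving the result of build_manifest somewhere other than /home (for example alongside an offsite backup) is what makes the later comparison useful; any path listed as missing would be a candidate for a restore request.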