Date: Mon, 17 Dec 2007 00:19:28 +0000
From: River Tarnell river@wikimedia.org
Subject: Re: [Toolserver-l] downtime
To: toolserver-l@lists.wikimedia.org
Message-ID: 4765C090.1000808@wikimedia.org
Content-Type: text/plain; charset="iso-8859-1"
the maintenance is finished now. the problem was caused by filesystem corruption on clematis:/aux0, the QFS filesystem where hemlock's /home is currently mounted from, when the connection to the iSCSI array was broken. i have replaced this filesystem with a VxFS filesystem, which should be more resilient against problems like this.
the iSCSI problem was my fault, so sorry for that. to prevent it happening in the future, i've asked for the array to be connected directly to clematis's NIC, which should be more reliable. in the longer term the plan is to move hemlock's /home back to a local array; this should happen either this month or next, when a new array is installed at knams.
a very small number of files were unrecoverable from the damaged /home. if any of your files are missing, mail ts-admins or file a bug and they can be restored from a backup.
- river.
river - thanks for your efforts to get things back on the air. Much appreciated.
Do you have a list of which files were unrecoverable?
I suppose we all, as good developer hygiene, should maintain a list of everything we have (and an offsite backup) but .. :)
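For what it's worth, the "list of everything we have" idea could look roughly like the sketch below. It assumes Python is available and that you keep the resulting listing somewhere other than /home; build_manifest and missing_files are names made up for this illustration, not an existing toolserver tool.

```python
import os

def build_manifest(root):
    """Return a dict mapping each file's path (relative to root) to its size in bytes."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                size = os.lstat(full).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            manifest[os.path.relpath(full, root)] = size
    return manifest

def missing_files(old, new):
    """Return the sorted paths present in the old manifest but absent from the new one."""
    return sorted(set(old) - set(new))
```

Comparing a manifest taken before an outage with one taken afterwards, e.g. missing_files(before, after), would give the list of paths to ask the admins to restore. It only compares names, so it would not notice a file whose contents were damaged but which is still present.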
Larry Pieniazek Hobby mail: Lar at Miltontrainworks dot com