[Labs-announce] Ongoing Labs Outage Update

Yuvi Panda yuvipanda at gmail.com
Thu Jun 18 15:56:58 UTC 2015


Yesterday, the filesystem used by many Labs tools suffered a
catastrophic failure, causing most tools to break. This was noticed
quickly but recovery is taking a long time because of the size of the
filesystem.

There has been corruption on the filesystem backing the
NFS setup that all of Labs uses, causing a prolonged outage. The
Operations team is currently attempting to restore a backup made on
June 9 at 16:00 UTC. Recovery of modifications made after that date is
potentially possible, but our first priority is getting the backup
restored. We will update the incident report page
https://wikitech.wikimedia.org/wiki/Incident_documentation/20150617-LabsNFSOutage
with notes on our progress. E-mails will also be sent to the
labs-announce (https://lists.wikimedia.org/mailman/listinfo/labs-announce)
and labs-l (https://lists.wikimedia.org/mailman/listinfo/labs-l) lists
when there are significant changes. We are not yet able to estimate
when things will be back up fully.

This also means that tools hosted on tools.wmflabs.org will not be
accessible until the restore is finished, and even then they might need
some more fiddling to work properly. We will also update
https://wikitech.wikimedia.org/wiki/Incident_documentation/20150617-LabsNFSOutage
as soon as we have more information.

If you have a non-tools project on Labs that does not depend on NFS
and is currently down, you can recover it by disabling NFS on its
instances. (We can help you with that.) For instructions, see
https://wikitech.wikimedia.org/wiki/Recover_instance_from_NFS . Join
us on #wikimedia-labs and we will assist you.


-- 
Yuvi Panda T
http://yuvi.in/blog


