---------- Forwarded message ----------
From: Yuvi Panda yuvipanda@gmail.com
Date: Thu, Jun 18, 2015 at 4:56 PM
Subject: Ongoing Labs Outage Update
To: labs-announce@lists.wikimedia.org
Yesterday, the filesystem used by many Labs tools suffered a catastrophic failure, causing most tools to break. This was noticed quickly but recovery is taking a long time because of the size of the filesystem.
There has been filesystem corruption on the filesystem backing the NFS setup that all of Labs uses, causing a prolonged outage. The Operations team is currently attempting to restore a backup made on June 9 at 16:00 UTC. It may be possible to recover modifications made after that date, but our first priority is getting the backup restored. We will update the incident report page https://wikitech.wikimedia.org/wiki/Incident_documentation/20150617-LabsNFSO... with notes on our progress. E-mails will also be sent to the labs-announce (https://lists.wikimedia.org/mailman/listinfo/labs-announce) and labs-l (https://lists.wikimedia.org/mailman/listinfo/labs-l) lists whenever there are significant changes. We are not yet able to estimate when things will be fully back up.
This also means that tools hosted on tools.wmflabs.org will not be accessible until the restore is finished, and even then they might need some more fiddling to work properly. We will also update https://wikitech.wikimedia.org/wiki/Incident_documentation/20150617-LabsNFSO... as soon as we have more information.
If you have a non-tools project on Labs that does not depend on NFS and is currently down, you can recover it by getting rid of NFS. (We can help you with that.) For instructions, see https://wikitech.wikimedia.org/wiki/Recover_instance_from_NFS . Join us on #wikimedia-labs and we will assist you.
--
Yuvi Panda T
http://yuvi.in/blog