On 08/06/2016 at 15:02, Antoine Musso wrote:
The operations team has worked hard this European morning to back up
files, investigate the RAID issue, and set up a new host.
We are in the process of reinstalling everything on the new host and
bringing Jenkins and Zuul back on it.
No ETA yet, since a five-year-old box must have hidden issues, which
makes it hard to estimate how long a full recovery will take.
A status update:
Ops (Jaime, Faidon, Mark, Chris) had a disk replaced, and the RAID array
is rebuilding right now; this should take roughly an hour. If the disk
and the RAID array are confirmed to be fine, we will bring Jenkins and
Zuul back.
A new server, contint1001, has been installed, and Jenkins data are being
copied there. We would need to adjust a few network rules and update the
IP addresses in configuration files before attempting to switch to the new setup.
The main task is:
https://phabricator.wikimedia.org/T137265
The CI service has been back since 19:00 UTC, after a disk was replaced and
the RAID array rebuilt successfully.