Postmortem: Partial Toolserver-outage - Toolserver-l

11 Feb 2013


Hello all,
large parts of the Toolserver cluster were down or very slow during the last 
few hours. As far as I can see, the problem was with the user-store or with 
rosemary (the host the user-store is physically attached to). I rebooted 
rosemary, but the reboot revealed problems with its IPv6 address. My attempts 
to fix that caused several further reboots. Rosemary is now up and running, 
but the user-store is not available (it looks like Nosy mounted it by hand 
without updating the fstab file, so it did not come back after the reboot). 
I was therefore forced to remove the user-store everywhere (except on willow, 
because that needs a reboot, and a reboot is already scheduled for later 
today).
I will try to find the partition for the user-store and mount it, but I do 
not have much hope (there are way too many devices to try). Just to be clear: 
no data has been lost. Munin will also be unavailable, because its data is 
mounted on that host as well. I fear we will have to wait for Nosy to recover 
before we get the user-store back.
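To avoid this kind of surprise after a future reboot, something like the 
following could be run on a host beforehand. It is only a rough sketch, not 
what was actually used on rosemary: it assumes Linux-style mount and fstab 
text, where the mount point is the second whitespace-separated field, and the 
function name and example paths are made up for illustration.

```python
def mounts_missing_from_fstab(mounts_text, fstab_text):
    """Return mount points that are mounted now but have no fstab entry.

    Both arguments are plain text in the usual
    "device mountpoint fstype options ..." layout (an assumption; other
    systems may order the fields differently).
    """
    def mount_points(text):
        points = set()
        for line in text.splitlines():
            # Drop comments and surrounding whitespace.
            line = line.split("#", 1)[0].strip()
            fields = line.split()
            if len(fields) >= 2:
                points.add(fields[1])
        return points

    # Anything mounted but absent from fstab will not survive a reboot.
    return sorted(mount_points(mounts_text) - mount_points(fstab_text))


if __name__ == "__main__":
    # Hypothetical example data, not taken from the real hosts.
    mounts = "/dev/sda1 / ext4 rw 0 0\n/dev/sdb1 /mnt/user-store ext4 rw 0 0\n"
    fstab = "# static filesystems\n/dev/sda1 / ext4 defaults 0 1\n"
    print(mounts_missing_from_fstab(mounts, fstab))
```

On a real host the two inputs would typically come from the list of current 
mounts and from the fstab file; any mount point the function reports was 
mounted by hand and would be missing after the next reboot.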
tl;dr: The Toolserver had problems; the user-store is unavailable for now.
Sincerely,
DaB.
-- 
Userpage: [[:w:de:User:DaB.]] — PGP: 0x2d3ee2d42b255885