Hello all,
just a little story of what happened today: As you know, I planned to dump the user databases of rosemary today to import them on thyme later. Around 12 o'clock CET I looked at the replag of thyme during a break and everything was fine. After my dinner I checked my mail and found an email from the OSM guys complaining that their title directory was gone.

Some background: thyme carries the NFS server for the user-store, title and munin. These normally run on hemlock, but because hemlock's SAN card is broken we had to move them to another server.

A short time later I spoke with Nosy on IRC about thyme. She told me that thyme was inaccessible via SSH. A few days ago we had discovered that thyme's serial console was not working (we have put that on the datacenter to-do list); but without SSH and without a serial console you can neither access a server nor reboot it. Nosy had started to move the NFS server from thyme to rosemary, and we completed that together.

Because of the missing user-store, the script that checks your quota at login failed, and logging in to the Linux servers was hardly possible. I deleted the script on those boxes and added a quick-and-dirty fix to puppet. That fix failed later, making login to the Linux boxes impossible for some time (even for roots). Switching the user-store from thyme to rosemary caused some problems on the userland servers (because the user-store was busy), but I think we fixed those. Maybe we will have to reboot some boxes in the next days; I will send a mail if needed.

Thyme also carried my wikidata replication program, which failed too (so the replag of wikidata increased everywhere). I have moved it to another server now. A strange thing is that the mysql process on thyme is still running; even replication is working, so the replag will not increase there.

The next step is to reach Mark or someone from the datacenter to reboot thyme and then look into what the problem was. Munin shows nothing abnormal.
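For anyone curious what "looking at the replag" means in practice: it boils down to reading the Seconds_Behind_Master field that MySQL reports in the output of SHOW SLAVE STATUS. The following is only an illustrative sketch of that check (the parsing helper and the sample output are made up for this example, not the actual Toolserver monitoring code):

```python
# Illustrative sketch: extract the replication lag ("replag") from the
# text output of MySQL's `SHOW SLAVE STATUS\G`. Not the real monitoring
# script -- just a demonstration of the idea.

def parse_slave_status(output):
    """Parse `SHOW SLAVE STATUS\\G`-style `Key: Value` lines into a dict."""
    fields = {}
    for line in output.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

def replag_seconds(output):
    """Return the replication lag in seconds, or None if it is unknown
    (MySQL reports NULL when replication is not running)."""
    value = parse_slave_status(output).get("Seconds_Behind_Master")
    if value is None or value == "NULL":
        return None
    return int(value)

if __name__ == "__main__":
    sample = """\
         Slave_IO_Running: Yes
        Slave_SQL_Running: Yes
    Seconds_Behind_Master: 42
"""
    print(replag_seconds(sample))  # -> 42
```

A lag of a few seconds is normal; a steadily growing value (as happened when the wikidata replication program died) is what shows up as increasing replag.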
Just to let you know. Good night.
Sincerely, DaB.
On 13/11/12 01:46, DaB. wrote:
> Thyme also carried my wikidata replication program, which failed too (so the replag of wikidata increased everywhere). I have moved it to another server now. A strange thing is that the mysql process on thyme is still running; even replication is working, so the replag will not increase there.
What about puppetd? If it's still running, it could provide a way to restart the server.
It looks as if sshd had simply died. I would have blamed the OOM killer, but thyme is Solaris, and it affected both sshd and the nfsd at the same time (but not mysqld). Maybe an error with the filesystem unmounting itself, but in that case sshd should still work.
Hello,

On Tuesday, 13 November 2012, 14:04:40, DaB. wrote:
> What about puppetd? If it's still running, it could provide a way to restart the server.
It stopped working together with the other services. We had that idea too :-).
Neither munin nor nagios showed problems with the memory on thyme before the crash, and normally the Solaris SVC system would restart SSH even if it was killed. The only strange thing I see in nagios is that thyme lost both SAN connections during the first phase of the crash, regained them, recovered for 2 minutes, crashed again, but kept the SAN connections this time.
Because thyme will be hard-restarted (power down/up), we will lose the "state" of the server, but maybe the syslog can tell us a little bit.
Sincerely, DaB.
toolserver-l@lists.wikimedia.org