Hello all,
just a little story of what happened today. As you know, I planned to dump the user databases of rosemary today in order to import them on thyme later. Around 12 o'clock CET I looked at the replag of thyme during a break and everything was fine. After my dinner I checked my mail and found an email from the OSM guys complaining that their title-dir was gone.

Some background: thyme carries the NFS server for the user-store, title and munin. These are normally on hemlock, but because hemlock's SAN card is broken we had to move them to another server.

A short time later I spoke with Nosy on IRC about thyme. She told me that thyme was inaccessible by SSH. A few days ago we had discovered that thyme's serial console was not working (we have put that on the datacenter to-do list). But without SSH and a serial console you can neither access a server nor even reboot it. Nosy had started to move the NFS server from thyme to rosemary, and we completed that together.

Because of the missing user-store, the script that checks your quota at login failed, and login to the Linux servers was hardly possible. I deleted the script on these boxes and added a quick-and-dirty fix to puppet. This fix failed later, making login on the Linux boxes impossible for some time (even for roots). Switching the user-store from thyme to rosemary caused some problems on the userland servers (because the user-store was busy), but I think we fixed this. Maybe we will have to reboot some boxes in the next few days; I will send a mail if needed.

Thyme also carried my wikidata replication program, which failed too (so the replag of wikidata increased everywhere). I have now moved it to another server. A strange thing is that the mysql process on thyme is still running; even replication is working, so the replag will not increase there.

The next step is to reach Mark or someone from the datacenter to reboot thyme and then find out what the problem was. Munin shows nothing abnormal.
Just to let you know. Good night.
Sincerely, DaB.