Jimmy Wales wrote:
I changed the bash prompt for root on zwinger to something distinctive. It's all very well to say "be careful", but unfortunately our simple mammalian brains aren't designed to detect when one familiar bit of English text is replaced by another. Colours, flashing lights, pretty pictures -- these work better. They allow faster recognition and require less concentration.
Of course I fully agree. No human can really be blamed for this sort of error; our simple mammalian brains are simply not suited to this type of repetitive work. (Jeronim typed reboot about 100 times that day, on purpose, on other servers.)
The one thing that came to my mind is: why does anyone log into zwinger in the first place? Since it's this horribly frightening SPOF, ought we not to avoid even _looking_ at it funny?
--Jimbo
Am I correct in thinking that the problem seems to be centred on Zwinger's being a central NFS server for a number of crucial read-only configuration files used by a large number of servers, and apps behaving in a peculiar (and usually disastrous) way when NFS dies?
How about just keeping local copies on each server, running rsync, rather than NFS, on the master, and using rsync to keep all the local copies on the slaves in sync? The files can even be kept "in the same place" as currently, using symbolic links. If the master falls over, all of its clients continue to work, and they will continue updating when the master is either brought back up or replaced. A CNAME would probably be a good way of designating the master.
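To make that concrete, here is a rough sketch of what each slave might run from cron. It is only an illustration under assumptions: the CNAME `config-master`, the rsync module `config` and the directory `/var/local/config/` are made-up names, not anything that exists on our servers.

```python
import subprocess

# Hypothetical names, not taken from the real setup:
MASTER = "config-master"           # CNAME pointing at whichever host is master
MODULE = "config"                  # rsync daemon module exported by the master
LOCAL_DIR = "/var/local/config/"   # local copy; the old NFS path can symlink here


def build_rsync_command(master=MASTER, module=MODULE, local_dir=LOCAL_DIR):
    """Return the rsync argv that pulls the master's files into local_dir."""
    return [
        "rsync",
        "--archive",      # preserve permissions, times and symlinks
        "--delete",       # drop local files that were removed on the master
        "--timeout=30",   # give up on a dead master instead of hanging
        f"rsync://{master}/{module}/",
        local_dir,
    ]


def sync_once(run=subprocess.run):
    """Try one sync; on failure the existing local copy is left in place."""
    try:
        run(build_rsync_command(), check=True)
        return True
    except (subprocess.CalledProcessError, OSError):
        # Master down or unreachable: keep serving the last good copy.
        return False
```

The point of the design is that a failed sync is harmless: `sync_once()` just returns False and the apps keep reading the local files, which is exactly what NFS does not give us.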
No new technology needed, which is rather better than my original idea of implementing reliable NFS failover at the client, which I won't go into further, other than to say that it seemed a good idea until I considered (a) the wrongness of re-inventing the wheel in a very complex way, and (b) the fact that the best way to keep the multiple redundant NFS servers in sync would be rsync...
-- Neil