SPOF notes - Wikitech-l

19 Sep 2005


Another note: we really need to be able to survive a Zwinger downtime
much better than we do currently. Sometimes it's an avoidable accident,
but it might be a hardware failure... and we'd like to get Zwinger
upgraded some day so we don't have to special-case installations on it
for Red Hat 9. Surviving that upgrade would be nice. ;)

Most of the files the web servers need to run (e.g., the PHP scripts
themselves) are stored on each machine's local disk, and we push out
updates. That's good!

Uploaded files are on another server; downtime there can also be
bad, but at least a Zwinger outage shouldn't take those down too.

However, we are reading a few bits off Zwinger's NFS (some block lists,
some lock files, etc.) and sometimes writing to it (logs). Insofar as
those are still in use, they should either be migrated somewhere more
survivable or be able to fail gracefully. If NFS isn't already set up to
fail cleanly after a short timeout, it should be.
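For illustration only, here is a minimal Python sketch of what "fail gracefully" could mean for those NFS reads and log writes. It assumes the caller simply stops waiting after a short timeout, keeps a local copy of the last good version of each shared file, and spools log lines to a local file when NFS is unavailable. The function names and paths are hypothetical, not anything that exists on our servers. This doesn't replace a mount-level timeout (see the `soft`, `timeo` and `retrans` options in nfs(5)): a read stuck on a hard mount can stay blocked in its thread, and this only keeps the caller from hanging along with it.

```python
import os
import threading


def run_with_timeout(func, timeout):
    """Run func() in a daemon thread and return (ok, value).

    ok is False if func raised OSError or had not finished after `timeout`
    seconds. A call stuck on a hung NFS mount may stay blocked in the
    background; as a daemon thread it won't keep the process alive at exit.
    """
    box = {}

    def worker():
        try:
            box["value"] = func()
        except OSError:
            pass  # missing file, stale NFS handle, I/O error: treat as unavailable

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if "value" in box:
        return True, box["value"]
    return False, None


def _read(path):
    with open(path, "rb") as f:
        return f.read()


def load_shared_file(nfs_path, cache_path, timeout=2.0):
    """Read a shared file (e.g. a block list) from NFS, else the local cached copy."""
    ok, data = run_with_timeout(lambda: _read(nfs_path), timeout)
    if ok:
        try:
            # Refresh the local copy so a later NFS outage still has something.
            tmp = cache_path + ".tmp"
            with open(tmp, "wb") as f:
                f.write(data)
            os.replace(tmp, cache_path)
        except OSError:
            pass  # caching is best-effort
        return data
    try:
        return _read(cache_path)
    except OSError:
        return None  # nothing available; caller decides to fail open or closed


def append_log(line, nfs_log, local_log, timeout=2.0):
    """Append a line to the NFS log, or to a local spool file if that fails.

    Returns True if the NFS write finished in time. If it times out but
    completes later, the line can end up in both files.
    """
    def _append(path):
        with open(path, "a", encoding="utf-8") as f:
            f.write(line + "\n")

    ok, _ = run_with_timeout(lambda: _append(nfs_log), timeout)
    if not ok:
        _append(local_log)  # spooled locally; shipping it later is not shown
    return ok


# Example (hypothetical paths):
# blocks = load_shared_file("/mnt/nfs/blocklist.txt", "/var/cache/wiki/blocklist.txt")
```

Whether a missing block list should fail open or closed is a policy call; the sketch returns None and leaves that decision to the caller.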

Some other configuration files, such as php.ini, and various programs
and utilities (one of the perlbals?) are also currently pulled off NFS.
These generally need to be fixed up so that things can keep running
while the home dirs are down; pushing the files out on update, as we do
with the PHP scripts, is probably in order.
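On the receiving side, "pushing the files out on update" might look roughly like the following Python sketch: a hypothetical install step run on each web server after a push, copying files from a local staging area into place. The manifest format, paths and function names are assumptions, and the transport that actually gets files to each machine (whatever already distributes the PHP scripts) isn't shown. Each file is written to a temp file in the destination directory and then renamed over the old one, so a process reading php.ini mid-update sees either the old version or the new one, never a half-written file.

```python
import os
import shutil
import tempfile


def _same_contents(a, b):
    """True if files a and b have identical bytes (config files are small)."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        return fa.read() == fb.read()


def install_file(src, dest):
    """Copy src over dest atomically: temp file in dest's directory, then rename.

    os.replace() is atomic on POSIX when source and target are on the same
    filesystem, which is why the temp file lives next to dest.
    """
    dest_dir = os.path.dirname(dest) or "."
    os.makedirs(dest_dir, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=dest_dir, prefix=".push-")
    os.close(fd)
    try:
        shutil.copy2(src, tmp)  # contents plus mode and timestamps
        os.replace(tmp, dest)
    except BaseException:
        os.unlink(tmp)  # don't leave partial temp files behind
        raise


def install_all(manifest):
    """Install each (staged_src, local_dest) pair; return dests that changed."""
    changed = []
    for src, dest in manifest:
        if os.path.exists(dest) and _same_contents(src, dest):
            continue  # already up to date; skip the rewrite
        install_file(src, dest)
        changed.append(dest)
    return changed


# Example (hypothetical paths):
# install_all([("/var/staging/php.ini", "/usr/local/lib/php.ini")])
```

Returning the changed paths lets the push script decide whether anything needs a reload afterwards; with mod_php, for instance, php.ini is only read when the web server starts.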
-- brion vibber (brion @ pobox.com)