[Wikitech-l] Re: SPOF notes

19 Sep 2005


Austin Hair wrote:
...
...
However, we are reading a few bits off of zwinger's NFS (some block lists,
etc., some lock files) and sometimes writing to it (logs). Insofar as those
are currently used, they should either be migrated to a more survivable
setup or be able to fail gracefully. If it isn't already, NFS should be set
up so that it fails cleanly after a short timeout.

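As a rough illustration of "fail gracefully", here is a minimal Python sketch, not anything the site actually runs, of reading a small file such as a block list from an NFS path without letting a slow or failed read stall the caller. The function names, the 5-second default, and the fallback behaviour are hypothetical choices for illustration.

```python
import concurrent.futures
import errno
import logging

log = logging.getLogger(__name__)

# Shared worker pool (hypothetical). A worker stuck on a hung hard mount keeps
# its thread, so callers stop waiting but the thread is not reclaimed and may
# delay interpreter exit.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def _read_text(path):
    with open(path, encoding="utf-8") as f:
        return f.read()


def read_with_fallback(path, fallback, timeout=5.0):
    """Return the contents of `path`, or `fallback` if the read fails or
    does not finish within `timeout` seconds."""
    future = _POOL.submit(_read_text, path)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The read is still blocked (e.g. a hard mount retrying): stop waiting.
        log.warning("timed out reading %s; using fallback", path)
        return fallback
    except OSError as exc:
        # On a soft mount a major timeout surfaces as EIO; missing files and
        # other errors also fall back instead of crashing the caller.
        if exc.errno == errno.EIO:
            log.warning("I/O error (possible NFS timeout) reading %s", path)
        else:
            log.warning("cannot read %s: %s", path, exc)
        return fallback
```

The fallback could be an empty list or the last copy successfully read; which is appropriate depends on whether failing open or failing closed is safer for that particular file.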
The Linux mount option "soft" will cause an I/O error to be returned after
a "major timeout," the definition of which varies. "intr" in combination
with "hard" will allow the program to respond to signals, which in most
cases is preferable to having an uninterruptible process sitting there
until reboot.

We mount NFS with soft and timeo=14. I imagine retrans is at its default
value of 3, so if I understand the manual correctly, that gives a major
timeout of 9.8 seconds. That would be consistent with what we saw in the
crash: most apps don't seem to abort when they get one of these timeouts;
they just treat it as an ordinary read error and carry on executing. It's
not surprising that everything locked up, including root logins.

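The 9.8-second figure can be reproduced with a small Python sketch under one reading of the nfs(5) manual: timeo is in tenths of a second, the timeout doubles after each minor timeout, and a major timeout is declared after retrans minor timeouts. The 60-second per-try cap is an assumption taken from older versions of the manual; the function name is hypothetical.

```python
def major_timeout(timeo_tenths, retrans=3, max_timeout=60.0):
    """Seconds until an NFS major timeout, assuming the initial timeout of
    timeo/10 s doubles after each minor timeout (capped at max_timeout)
    and a major timeout occurs after `retrans` minor timeouts."""
    total = 0.0
    current = timeo_tenths / 10.0
    for _ in range(retrans):
        total += current
        current = min(current * 2, max_timeout)
    return total


print(round(major_timeout(14), 1))  # → 9.8
```

With timeo=14 and retrans=3 this sums 1.4 + 2.8 + 5.6 = 9.8 seconds. If the kernel counts retransmissions differently, the real figure would differ, which is why the estimate above is hedged.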
What about using a detachable filesystem like Coda, or a spare NFS
server with automatic failover?
-- Tim Starling
