[Wikitech-l] Read-only on s2 wikis

12 Oct 2008


Same old story: a full disk on a core master server (ixia) caused binlogs to
stop 10 minutes before the issue was noticed and I switched the server into
read-only mode. Writes continued during those 10 minutes.
I'm resyncing from the master, and the s2 wikis are in read-only mode while
that happens. It seems to be taking about 1.5 hours in total.
The server was in Nagios and was reporting a critical disk-full status.
I'm not sure exactly when it entered that state.
I'm inclined to think that the issue here is not the need for more
technology, but rather the need for procedures. There's no point in having
monitoring if nobody is watching the output.
If it had happened an hour later, I would have been in bed, and nobody
else was around. The users in #wikimedia-tech tell me they would have
waited for hours before trying to phone anyone. So we need out-of-hours
response procedures as well.
I think we need:
* A systems checklist to be checked daily, independently by two different
people and cross-checked weekly;
* An SMS paging system for out-of-hours response, both automated and
manual (user-driven).
-- Tim Starling
