Re: [Wikitech-l] 2006-04-21T02:30 drastic slowdown postmortem

23 Apr 2006


Tim Starling wrote:
...
The only person who lives near the servers is Kyle, and he was hired as a
hardware tech for that exact reason. The servers all have their clocks set
to UTC, and we use UTC in logs and communications.
That would be the standard practice, but....
As noted in the message itself, I used the time listed in the log on the
external monitoring site, http://www.thewritingpot.com/wikistatus/ garnered
from the recent message of garion1000@gmail.com, and named the subject line
accordingly.  Later, I also included the words:
# There's no corresponding admin log for 2:30 UTC....
Sorry that you found the subject line misleading; I'll try to do better in
the future....
...
William Allen Simpson wrote:
...
The time was 02:30+ UTC.
Ah, well in that case, I can tell you exactly what happened. The hero of the
day was Zsinj, a canny newbie who had his eye on the relevant monitoring
graphs, and alerted us to the problem immediately, using very specific
terms, allowing us to track down and fix it rapidly.
Log extract from #wikimedia-tech follows, times are UTC+10.
Ah, another place that doesn't use UTC logs....
Anyway, thank you for following up.  I still don't understand how the SQL
spike affected network performance, and have not yet found the switch and
router graph with IP subnet assignments.
And thanks to my questions, more of us know where to find the graphs!
