Tim Starling wrote:
The only person who lives near the servers is Kyle, and he was hired as a hardware tech for that exact reason. The servers all have their clocks set to UTC, and we use UTC in logs and communications.
That would be the standard practice, but....
As noted in the message itself, I used the time listed in the log on the external monitoring site, http://www.thewritingpot.com/wikistatus/ (garnered from the recent message from garion1000@gmail.com), and named the subject line accordingly. Later, I also included the words:
# There's no corresponding admin log for 2:30 UTC....
Sorry that you found the subject line misleading; I'll try to do better in the future....
William Allen Simpson wrote:
The time was 02:30+ UTC.
Ah, well in that case, I can tell you exactly what happened. The hero of the day was Zsinj, a canny newbie who had his eye on the relevant monitoring graphs, and alerted us to the problem immediately, using very specific terms, allowing us to track down and fix it rapidly.
Log extract from #wikimedia-tech follows; times are UTC+10.
Ah, another place that doesn't use UTC logs....
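For anyone lining up a UTC+10 channel log against UTC graphs, the shift is a plain ten-hour subtraction, so 12:30 in the log corresponds to the 02:30 UTC mentioned above. A minimal sketch, assuming a fixed UTC+10 offset with no daylight saving and a "YYYY-MM-DD HH:MM" stamp format; the function name and the sample date are made up for illustration and do not come from the actual log:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the channel log uses a fixed UTC+10 offset (no daylight saving).
LOG_TZ = timezone(timedelta(hours=10))


def log_time_to_utc(stamp: str) -> str:
    """Convert a 'YYYY-MM-DD HH:MM' stamp written in UTC+10 to UTC."""
    # Parse the naive stamp, then attach the assumed log offset.
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=LOG_TZ)
    # Shift to UTC and render in the same format.
    return local.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M")


# Hypothetical date; only the ten-hour shift matters here.
print(log_time_to_utc("2000-01-02 12:30"))
```

Note that early-morning log entries land on the previous UTC date, which is worth keeping in mind when matching them against the daily graphs.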
Anyway, thank you for following up. I still don't understand how the SQL spike affected network performance, and have not yet found the switch and router graph with IP subnet assignments.
And thanks to my questions, more of us know where to find the graphs!