Re: [Wikitech-l] Christmas server failure report

26 Dec 2010


Ryan Lane wrote a script to purge some of the Flagged Revs memcached
entries; that ran last night as well.
The DOM-related errors all seem to have come from srv227; apache on that
host was restarted about half an hour ago and the results look good.
Ariel
On Sun, 26-12-2010 at 01:49 +0100, Platonides wrote:
...
Earlier today, /a filled up with binlogs on db27, which was the s3 and s7
master. Nagios had warned, but too early, and nobody noticed. The slaves
lagged, there were lots of locks, and the wikis ground to a halt.
Revisions between 6:50 and 8:20 pm UTC were lost (although they can be
manually reimported from db27).
The new s3 and s7 master is db17, with only one slave: db25.
After the master switch, we started having problems with revision text
cached in memcached, caused by duplicated old_id values, so we made the
wikis read-only until midnight UTC.
We decided not to disable $wgRevisionCacheExpiry but to remove the
faulty entries, thus I quickly prepared the script
maintenance/purgeStaleMemcachedText.php to clean them.
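
As a rough illustration of that approach (not the actual script), the
sketch below deletes the cached text entries for a range of old_id values.
The key layout, the DictCache stand-in for a memcached client, and the
function names are all assumptions made for this example.

```python
class DictCache:
    """In-memory stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value):
        self.store[key] = value

    def delete(self, key):
        # Return True only if the key was actually present.
        return self.store.pop(key, None) is not None


def text_key(wiki, text_id):
    # Hypothetical key layout; the real cache key format may differ.
    return f"{wiki}:revisiontext:textid:{text_id}"


def purge_stale_text(cache, wiki, first_id, last_id):
    """Delete cached text for old_id values first_id..last_id (inclusive).

    Returns the number of entries that were present and removed.
    """
    purged = 0
    for text_id in range(first_id, last_id + 1):
        if cache.delete(text_key(wiki, text_id)):
            purged += 1
    return purged
```

Deleting by id range, rather than disabling the cache, leaves entries for
older, unaffected old_id values in place.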
There were problems on hewiki, since the data there did not get cleaned.
In one instance, $wgMemc->get still returned the value even after a
$wgMemc->delete on that same key (???).
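
One way to spot entries like that is to read each key back right after
deleting it. The sketch below is only an illustration: StickyCache is a
made-up in-memory stand-in that simulates a delete which silently fails,
and delete_and_verify is a hypothetical helper, not part of any script
mentioned here.

```python
class StickyCache:
    """In-memory stand-in whose delete silently fails for chosen keys."""

    def __init__(self, sticky=()):
        self.store = {}
        self.sticky = set(sticky)

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value):
        self.store[key] = value

    def delete(self, key):
        if key in self.sticky:
            return False  # simulate a delete that does not take effect
        return self.store.pop(key, None) is not None


def delete_and_verify(cache, keys):
    """Delete each key, then read it back; return keys that still exist."""
    survivors = []
    for key in keys:
        cache.delete(key)
        if cache.get(key) is not None:
            survivors.append(key)
    return survivors
```

The returned list can then be retried or inspected by hand.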
Other than the hewiki issues, it seemed to run fine. There will be lots
of wrong entries in the diff and parser caches; a manual action=purge
will clean them.
Flagged revs caches were not touched. Wikis using it may show the wrong
content (with the additional fun of some users viewing the right one).
There are also PPFrame_DOM->expand errors that started around the same
time, even on wikis on a different cluster. They usually happen only
once, and the request succeeds on reload.
https://bugzilla.wikimedia.org/show_bug.cgi?id=26429

Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
