FYI
---------- Forwarded message ----------
From: Greg Grossmeier <greg(a)wikimedia.org>
Date: Mon, Feb 10, 2014 at 3:25 PM
Subject: Outage report - Feb 6th - Math
To: Development and Operations engineers <engineering(a)lists.wikimedia.org>
https://wikitech.wikimedia.org/wiki/Incident_documentation/20140206-Math
Important bits:
== Summary ==
https://gerrit.wikimedia.org/r/#/c/104991/ changed the parser cache keys
for pages with <math> in them, causing a spike in cache misses, and thus
the cluster fell over.
This had been slowly rolling out on small wikis, mostly unnoticed since
math isn't widely used there. Rolling out today to larger wikis (dewiki,
etc) made the cache stampede more obvious and caused downtime.
Reverting the change didn't work because of incompatibilities between
core + the extension, but that was OK because we had mostly gotten through
the invalidation before the rollback.
This would've been a problem if we weren't having fatals: we would've
started invalidating to the old version again. We got lucky. Going back
to the new version caused a few more invalidations, but that seems
reasonable and should level off soon.
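To illustrate why a key change behaves like a mass invalidation, here is a minimal sketch in Python. It is not MediaWiki's parser cache code: the key format, the `key_version` parameter, and the function names are hypothetical, chosen only to show that entries stored under the old key scheme can no longer be found once the scheme changes.

```python
def parser_cache_key(page_id: int, key_version: str) -> str:
    """Build a cache key. `key_version` is a hypothetical stand-in for
    whatever part of the key the change altered."""
    return f"pcache:{key_version}:{page_id}"


def hit_rate(cache: dict, page_ids: range, key_version: str) -> float:
    """Fraction of pages whose key (under the given scheme) is in the cache."""
    hits = sum(1 for p in page_ids if parser_cache_key(p, key_version) in cache)
    return hits / len(page_ids)


pages = range(1000)

# Warm cache: every page was rendered and stored under the old key scheme.
cache = {parser_cache_key(p, "v1"): f"<html for page {p}>" for p in pages}

rate_before = hit_rate(cache, pages, "v1")  # lookups with the old scheme
rate_after = hit_rate(cache, pages, "v2")   # lookups after the key change

print(rate_before, rate_after)
```

In this toy setup every lookup misses after the change, so every affected page has to be re-rendered on its next view. On a small wiki that is a handful of renders; on a large one it is a stampede.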
== Conclusions ==
We really need to work through the backlog of Math extension
changesets from physikerwelt, who's done great work on the extension,
but that work is lacking review.
== Actionables ==
* wrap Math stuff in PoolCounter so it doesn't kill apaches so easily.
* More review on recent changes to Math. Be careful in rolling this
  release out further.
** PoolCounter:
https://gerrit.wikimedia.org/r/#/c/111916/
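As a rough illustration of the first actionable, the sketch below limits how many expensive renders can run at once and makes concurrent requests for the same key wait for a single render instead of each doing the work. It is written in Python and only loosely follows the idea behind PoolCounter; it is not PoolCounter's API, and the `RenderPool` class, its method names, and `slow_render` are all hypothetical.

```python
import threading
import time


class RenderPool:
    """Hypothetical sketch: cap concurrent renders and deduplicate work per key."""

    def __init__(self, max_workers: int):
        self._slots = threading.BoundedSemaphore(max_workers)  # global render cap
        self._lock = threading.Lock()   # guards the cache and the key-lock table
        self._key_locks = {}            # one lock per cache key
        self._cache = {}
        self.render_count = 0

    def get(self, key, render):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        # Only one thread per key gets past this point at a time.
        with key_lock:
            with self._lock:
                if key in self._cache:  # someone else rendered it while we waited
                    return self._cache[key]
            with self._slots:           # wait for a free render slot
                value = render(key)
            with self._lock:
                self._cache[key] = value
                self.render_count += 1
            return value


stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def slow_render(key):
    """Stand-in for an expensive render; records peak concurrency."""
    with stats_lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)
    with stats_lock:
        stats["active"] -= 1
    return f"rendered:{key}"


pool = RenderPool(max_workers=2)
threads = [
    threading.Thread(target=pool.get, args=(f"page-{i % 5}", slow_render))
    for i in range(40)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(pool.render_count, stats["peak"])
```

Here 40 requests for 5 distinct pages result in 5 renders, with never more than 2 running at once. The trade-off is that waiting requests are slower, but the backend is not asked to do the same expensive work many times over during a burst of cache misses.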
--
| Greg Grossmeier GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg A18D 1138 8E47 FAC8 1C7D |