FYI
---------- Forwarded message ----------
From: Greg Grossmeier <greg@wikimedia.org>
Date: Mon, Feb 10, 2014 at 3:25 PM
Subject: Outage report - Feb 6th - Math
To: Development and Operations engineers <engineering@lists.wikimedia.org>
https://wikitech.wikimedia.org/wiki/Incident_documentation/20140206-Math
Important bits:
== Summary ==
https://gerrit.wikimedia.org/r/#/c/104991/ changed the parser cache keys for pages with <math> in them, causing a spike in cache misses, and thus the cluster fell over.
This had been rolling out slowly on small wikis, mostly unnoticed since math isn't widely used there. Rolling out today to larger wikis (dewiki, etc.) made the cache stampede more obvious and caused downtime. Reverting the change didn't work because of incompatibilities between core and the extension, but that was OK because we had mostly gotten through the invalidation before the rollback.
This would've been a problem if we hadn't been having fatals: we would've started invalidating back to the old version again. We got lucky. Going back to the new version caused a few more invalidations, but that seems reasonable and should probably level off soon.
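To illustrate why a cache key change behaves like a mass invalidation, here is a minimal Python sketch. It is not MediaWiki code: the names `cache_key`, `render` and `get_parsed`, and the idea of a rendering-options version string folded into the key, are assumptions made purely for the example.

```python
import hashlib

cache = {}          # stands in for the parser cache
render_calls = 0    # counts expensive re-renders


def cache_key(page, options_version):
    # Hypothetical key scheme: page title plus a short hash of the
    # rendering options. Changing the options changes every key.
    digest = hashlib.md5(options_version.encode()).hexdigest()[:8]
    return f"pcache:{page}:{digest}"


def render(page):
    # Stand-in for an expensive parse/render.
    global render_calls
    render_calls += 1
    return f"<html for {page}>"


def get_parsed(page, options_version):
    key = cache_key(page, options_version)
    if key not in cache:
        cache[key] = render(page)
    return cache[key]


pages = [f"Page{i}" for i in range(100)]

for p in pages:
    get_parsed(p, "math-v1")    # cold cache: every page is rendered once
for p in pages:
    get_parsed(p, "math-v1")    # warm cache: no new renders
warm_renders = render_calls

for p in pages:
    get_parsed(p, "math-v2")    # key scheme changed: every lookup misses
cold_renders = render_calls - warm_renders
```

In this toy model the old entries are still sitting in the cache after the switch; they are simply never looked up again, so every page has to be re-rendered under the new key.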
== Conclusions ==
We really need to work through the backlog of Math extension changesets from physikerwelt, who has done great work on the extension but whose changes are lacking review.
== Actionables ==
* wrap Math stuff in PoolCounter so it doesn't kill apaches so easily.
* More review on recent changes to Math. Be careful in rolling this release out further.
** PoolCounter: https://gerrit.wikimedia.org/r/#/c/111916/
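As a rough illustration of the idea behind the first actionable (cap how many workers may do the same expensive work at once, and make the rest give up quickly instead of piling on), here is a small Python sketch using a semaphore. It does not use or imitate the real PoolCounter API; `MAX_WORKERS`, `render_math` and the timings are hypothetical values chosen for the example.

```python
import threading
import time

MAX_WORKERS = 2                       # hypothetical concurrency limit
pool = threading.BoundedSemaphore(MAX_WORKERS)
lock = threading.Lock()               # protects the counters below
active = 0
peak = 0
rejected = 0


def render_math(page, timeout=0.05):
    """Render only if a slot frees up within `timeout`; otherwise give up."""
    global active, peak, rejected
    if not pool.acquire(timeout=timeout):
        with lock:
            rejected += 1
        return None                   # caller could serve stale content or an error
    try:
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.2)               # stands in for the expensive render
        return f"rendered {page}"
    finally:
        with lock:
            active -= 1
        pool.release()


# Simulate a stampede: ten requests arrive at roughly the same time.
threads = [
    threading.Thread(target=render_math, args=(f"Page{i}",))
    for i in range(10)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The point of the design is that the number of simultaneous renders never exceeds the limit, so a burst of cache misses degrades into fast failures for some requests rather than tying up every worker.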
-- 
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |