On 07/05/2014 17:52, Greg Grossmeier wrote:
<quote name="Antoine Musso" date="2014-05-07" time="11:17:48 +0200">
> On 06/05/2014 18:56, Rob Lanphier wrote:
>> Yup, thanks folks!
>>
>> This was probably enough of an outage that we should write up a brief
>> postmortem for this. Who feels like they understand the situation
>> well enough to do this?
>
> Ccing Faidon who provided the fix.
>
> This is rather long and is public since mediawiki-core list is publicly
> available. We might want a shorter postmortem on the wiki.
Decided to put the whole thing on-wiki: https://wikitech.wikimedia.org/wiki/Incident_documentation/20140503-Thumbnai...
Can someone take a stab at creating bugs/RT tickets for the things in the "What can be improved" section? I'll try to later, but either Antoine or Faidon might be able to give better context.
Hello,
I have filed two bugs in Bugzilla:
Bug 65477 - User::pingLimiter should have per action profiling
https://bugzilla.wikimedia.org/show_bug.cgi?id=65477
Trivially solved by https://gerrit.wikimedia.org/r/134067
Bug 65478 - Graph User::pingLimiter() actions in gdash
https://bugzilla.wikimedia.org/show_bug.cgi?id=65478
I have no real clue how to set up a gdash dashboard :-(
I still have to figure out how to emit alarms when a specific action is being throttled more than usual.
cheers,