<quote name="Ori Livneh" date="2016-07-12" time="16:56:11 -0700">
On Tue, Jul 12, 2016 at 4:07 PM, Greg Grossmeier
<greg(a)wikimedia.org> wrote:
<quote name="Greg Grossmeier" date="2016-07-12" time="09:24:38 -0700">
data for {Username}@{wiki}"
There was an order of magnitude increase in the rate of those errors
that started on July 7th.
Investigation and remediation is on-going.
Investigation and remediation is mostly complete[0] and the vast
majority of cases have been addressed. There are still users who will
experience this error for the next ~1 day.[1]
Is it actually fixed? It doesn't look like it, from the logs.
That was the information I was given. If it is not improved after the
fixes and letting the maint script finish then we'll know more
certainly, and with that certainty can modify our plans (as we always
do).
Our failure to react to this swiftly and comprehensively is appalling and
embarrassing. It represents failure of process at multiple levels and a
lack of accountability.
Matt is working on an incident report for this.
I think we should also reach out to the users that were affected and
apologize.
That certainly should/could be one of the action items.
Greg
--
| Greg Grossmeier GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg A18D 1138 8E47 FAC8 1C7D |