On Tue, Jul 12, 2016 at 7:56 PM, Ori Livneh <ori(a)wikimedia.org> wrote:
On Tue, Jul 12, 2016 at 4:07 PM, Greg Grossmeier <greg(a)wikimedia.org> wrote:
<quote name="Greg Grossmeier" date="2016-07-12" time="09:24:38 -0700">
data for {Username}@{wiki}"
There was an order of magnitude increase in the rate of those errors
that started on July 7th.
Investigation and remediation are ongoing.
Investigation and remediation are mostly complete[0], and the vast
majority of cases have been addressed. Some users will continue to
experience this error for roughly the next day.[1]
Is it actually fixed? It doesn't look like it, from the logs.
Since midnight UTC on July 7, 3,195 distinct users have tried and failed
to log in a combined total of 25,047 times, or an average of approximately
eight times per user. The six days that have passed since then were
business as usual for Wikimedia Engineering.
Our failure to react to this swiftly and comprehensively is appalling and
embarrassing. It represents a failure of process at multiple levels and a
lack of accountability.
This (unbreak now) bug has been open since November. I wonder how it has
been allowed to remain open and unaddressed for this long.
A new user ran into this issue in June at an editathon that I attended. In
his case, I could fix the problem by manually deleting the offending row in
the database, but most of the time, the user likely gives up :(
I think we need to have a serious discussion about what happened, and
think very hard about the changes we would need to make to our processes
and organizational structure to prevent a recurrence.
I think we should also reach out to the users that were affected and
apologize.
+1
Cheers,
Katie
_______________________________________________
Ops mailing list
Ops(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/ops
--
@wikidata