On Tue, Jul 12, 2016 at 4:07 PM, Greg Grossmeier <greg@wikimedia.org> wrote:
<quote name="Greg Grossmeier" date="2016-07-12" time="09:24:38 -0700">
> https://phabricator.wikimedia.org/T119736 - "Could not find local user data for {Username}@{wiki}"
>
> There was an order of magnitude increase in the rate of those errors
> that started on July 7th.
>
> Investigation and remediation is on-going.
Investigation and remediation is mostly complete[0] and the vast majority of cases have been addressed. There are still users who will experience this error for the next ~1 day.[1]
Is it actually fixed? It doesn't look like it, from the logs.
Since midnight UTC on July 7, 3,195 distinct users have tried and failed to log in a combined total of 25,047 times, or an average of approximately eight times per user. The six days that have passed since then were business as usual for Wikimedia Engineering.
Our failure to react to this swiftly and comprehensively is appalling and embarrassing. It represents a failure of process at multiple levels and a lack of accountability.
I think we need to have a serious discussion about what happened, and think very hard about the changes we would need to make to our processes and organizational structure to prevent a recurrence.
I think we should also reach out to the users that were affected and apologize.