How’d we do in our pursuit of operational excellence last month? Read on to find out!
Incidents
One documented incident last month.
Impact: A portion of outgoing email from wikimedia.org was delivered
with a delay of up to 24 hours. This affected staff Gmail and
Znuny/Phabricator notifications. No mail was lost; all of it was
eventually delivered.
Incident follow-up
Remember to review and schedule incident follow-up work in Phabricator. These are preventive measures and tech debt mitigations
written down after an incident. Read about past incidents on Wikitech.
Recently resolved incident follow-up:
.
Filed in December after the mail delivery incident, resolved later that month by Keith (Herron).
.
Filed in December after various incidents due to high DB/appserver load, carried out by Amir (Ladsgroup).
Trends
In December we reported 22 new production errors,
of which 5 have since been resolved, and 17 remain open and have
carried over to January. Of the 298 issues previously carried over,
we also resolved 17, so the workboard still adds up to 298 in total.
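The bookkeeping in the paragraph above can be checked with a few lines of Python. This is only a sketch using the figures quoted in the text; the variable names and the `workboard_total` helper are made up for illustration and do not correspond to any Phabricator tooling.

```python
# Figures taken from the paragraph above; all names here are illustrative.
carried_over_before = 298  # open issues carried into December
resolved_old = 17          # of those, resolved during December
new_resolved = 5           # reported in December and already resolved
new_open = 17              # reported in December and still open


def workboard_total(carried_over: int, resolved_old: int, new_open: int) -> int:
    """Open tasks at month end: carried in, minus resolved, plus new open ones."""
    return carried_over - resolved_old + new_open


new_reported = new_resolved + new_open
total = workboard_total(carried_over_before, resolved_old, new_open)

print(f"New errors reported in December: {new_reported}")
print(f"Open tasks carried into January: {total}")
```

Because the number of older tasks resolved equals the number of new tasks left open, the total stays flat month over month.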
In previous editions, we sometimes looked at the breakdown of tasks
that remained unresolved. This time, I'd like to draw attention to the
throughput and distribution of tasks that did get resolved.
Production errors resolved in the month of December, by team and component:
For the month-over-month numbers, refer to the .
Outstanding errors
Oldest unresolved errors:
Take a look at the workboard and look for tasks that could use your help.
Thanks!
Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production.
Until next time,
– Timo Tijhof