How’d we do in our striving for operational excellence last month? Read on to find out!
Read on Phabricator at https://phabricator.wikimedia.org/phame/post/view/227
📈 Incidents
1 documented incident last month. That's the third month in a row that we are at or near zero major incidents – not bad! [1] [2]
Learn about recent incidents at Incident status https://wikitech.wikimedia.org/wiki/Incident_status on Wikitech, or Preventive measures https://phabricator.wikimedia.org/project/view/4758/ in Phabricator.
💡 *Did you know: Our Incident status https://wikitech.wikimedia.org/wiki/Incident_status page provides a green-yellow status reflection over the past ten days, with a link to the most recent incident doc if there was one during that time.*
-------
📊 Trends
This January saw a small recovery in the otherwise upward trend of unresolved error reports. For the first time in twelve months, more reports were closed than new reports survived the previous month without resolution. What happened twelve months ago? That was January 2020, which also saw a small recovery in the otherwise upward trend before and after it.
Perhaps it's something about the post-December holidays that temporarily improves the quality, and/or reduces the quantity, of code changes. Only time will tell if this is the start of a new trend, or merely a post-holiday dip. [3]
While our month-to-month trend might not (yet) be improving, we do see persistent improvements in our overall backlog of pre-2019 reports. This is in part because we generally don't file new reports there, so it makes sense that it doesn't go up. It's still good to see downward progress every month, unlike with reports from more recent months, which often see no change month-to-month (see "Outstanding errors" below, for example).
This positive trend on our "Old" backlog started in October 2020 and has consistently progressed every month since then (refer to the "Old" numbers in red on the below chart, or the same column in the spreadsheet https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit?usp=sharing). [3][4]
Figure 1, Figure 2: Unresolved error reports stacked by month. https://phabricator.wikimedia.org/phame/post/view/227/production_excellence_28_january_2021/#trends
-------
📖 Outstanding errors
Summary over recent months:
- ⚠️ July 2019 (2 of 18 issues left): *no change*.
- ⚠️ August 2019 (1 of 14 issues): *no change*.
- ✅ September 2019 (0 of 12 issues): Last two tasks were resolved (-2).
- ⚠️ October 2019 (4 of 12 issues): One task resolved (-1).
- ⚠️ November 2019 (1 of 5 issues): *no change*.
- ⚠️ December 2019 (2 of 9 issues): Two tasks resolved (-2).
- ⚠️ January 2020 (2 of 7 issues): *no change*.
- ⚠️ February 2020 (1 of 7 issues left): One task resolved (-1).
- March 2020 (2 of 2 issues left): *no change*.
- April 2020 (9 of 14 issues left): *no change*.
- May 2020 (6 of 14 issues left): One task resolved (-1).
- June 2020 (7 of 14 issues left): *no change*.
- July 2020 (9 of 24 new issues https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R): *no change*.
- August 2020 (22 of 53 new issues https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R): One task resolved (-1).
- September 2020 (13 of 33 new issues https://phabricator.wikimedia.org/maniphest/query/CGFQViLShnOY/#R): One task resolved (-1).
- October 2020 (31 of 69 new issues https://phabricator.wikimedia.org/maniphest/query/MYnnBybPTYpd/#R): Four tasks fixed (-4).
- November 2020 (14 of 38 new issues https://phabricator.wikimedia.org/maniphest/query/CkC_VqQq5VC0/#R): *no change*.
- December 2020 (19 of 33 new issues https://phabricator.wikimedia.org/maniphest/query/10NQy74iKaZJ/#R): Three tasks resolved (-3).
- *January 2021*: 7 of 50 new issues https://phabricator.wikimedia.org/maniphest/query/WIP9W8q54HB6/#R survived the month and remained unresolved (+50; -43).
Recent tally:
160 issues open, as of Excellence #27 https://phabricator.wikimedia.org/phame/post/view/219/production_excellence_27_december_2020/ (4 Feb 2021).
-15 issues closed since, of the previous 160 open issues.
+7 new issues that survived January 2021.
152 issues open, as of today (16 Feb 2021).
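For readers who want to double-check the tally, here is a minimal sketch of the arithmetic in Python. The function and variable names are hypothetical and purely illustrative. Only the figures (160, 15, 7, 50, 43) come from this post.

```python
def open_issues(previous_open: int, closed_since: int, new_survivors: int) -> int:
    """Return the open-issue count after applying closures and new survivors.

    Hypothetical helper for illustration; it is not part of any Wikimedia tooling.
    """
    return previous_open - closed_since + new_survivors


# Figures quoted in the tally above.
previous_open = 160   # open as of Excellence #27 (4 Feb 2021)
closed_since = 15     # closed since, out of those 160
new_survivors = 7     # January 2021 reports still unresolved

open_today = open_issues(previous_open, closed_since, new_survivors)
print(open_today)  # 152

# January 2021 on its own: reports filed minus reports resolved within the month.
january_new = 50
january_resolved = 43
january_survivors = january_new - january_resolved
print(january_survivors)  # 7
```

The second calculation shows where the +7 in the tally comes from: it is the part of January's 50 new reports that was not among the 43 resolved within the month.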
January saw 50 new production errors reported in a single month, which is an unfortunate all-time high. However, we also did remarkably well by addressing 43 of them within the month, while the potential root cause and diagnostics data were still fresh in our minds. Well done!
For the ongoing month of February, there have been 16 new issues https://phabricator.wikimedia.org/maniphest/query/xjFr73QLJYlE/#R reported so far.
Take a look at the workboard https://phabricator.wikimedia.org/tag/wikimedia-production-error/ and look for tasks that could use your help!
-------
🎉 Thanks!
Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!
Until next time,
– Timo Tijhof
-------
Footnotes:
[1] Incident status, Wikitech https://wikitech.wikimedia.org/wiki/Incident_status
[2] Wikimedia incident stats by Krinkle, CodePen https://codepen.io/Krinkle/full/wbYMZK
[3] Month-over-month, Production Excellence spreadsheet https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit
[4] Open tasks, Wikimedia-prod-error, Phabricator https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R