How’d we do in striving for operational excellence last month? Read on to find out!
Read on Phabricator at https://phabricator.wikimedia.org/phame/post/view/236/
------- Incidents
Zero incidents recorded in the past month. Yay! That's only five months after November 2020, the last month without documented incidents (Incident stats https://codepen.io/Krinkle/full/wbYMZK).
Remember to review Preventive measures https://phabricator.wikimedia.org/project/view/4758/ in Phabricator, which are action items filed after an incident.
------- Trends
In May, we unfortunately saw a repeat of the worrying pattern we saw in April https://phabricator.wikimedia.org/phame/post/view/235/production_excellence_31_april_2021/#trends, but with higher numbers. We found 54 new errors. This is the highest number of new errors in a single month since the Excellence monthly began three years ago in 2018. About half of these (29 of 54) remain unresolved as of this writing, two weeks into the following month.
Figure 1, Figure 2: Unresolved error reports stacked by month. https://phabricator.wikimedia.org/phame/post/view/236/production_excellence_32_may_2021/#trends
Month-over-month plots based on spreadsheet data https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml .
------- New errors in May
Below is a snapshot of just the 54 new issues https://phabricator.wikimedia.org/maniphest/query/tmkGqt0C93YG/#R found last month, listed by their code steward https://www.mediawiki.org/wiki/Developers/Maintainers.
Be mindful that the reporting of errors is not itself a negative point per-se. I think it should be celebrated when teams have good telemetry, detect their issues early, and address them within their development cycle. It might be more worrisome when teams lack telemetry or time to find such issues, or can't keep up with the pace at which issues are found.

Anti Harassment Tools: None.
Community Tech: None.
Editing Team +2, -1: Cite (T283755 https://phabricator.wikimedia.org/T283755); OOUI (T282176 https://phabricator.wikimedia.org/T282176).
Growth Team +17, -4: Add-Link (T281960 https://phabricator.wikimedia.org/T281960); GrowthExperiments (T281525 https://phabricator.wikimedia.org/T281525 T281703 https://phabricator.wikimedia.org/T281703 T283546 https://phabricator.wikimedia.org/T283546 T283638 https://phabricator.wikimedia.org/T283638 T283924 https://phabricator.wikimedia.org/T283924); Echo (T282446 https://phabricator.wikimedia.org/T282446); Recent-changes (T282047 https://phabricator.wikimedia.org/T282047 T282726 https://phabricator.wikimedia.org/T282726); StructuredDiscussions (T281521 https://phabricator.wikimedia.org/T281521 T281523 https://phabricator.wikimedia.org/T281523 T281782 https://phabricator.wikimedia.org/T281782 T281784 https://phabricator.wikimedia.org/T281784 T282069 https://phabricator.wikimedia.org/T282069 T282146 https://phabricator.wikimedia.org/T282146 T282599 https://phabricator.wikimedia.org/T282599 T282605 https://phabricator.wikimedia.org/T282605).
Language Team +1: Translate extension (T283828 https://phabricator.wikimedia.org/T283828).
Parsing Team +1: Parsoid (T281932 https://phabricator.wikimedia.org/T281932).
Reading Web: None.
Structured Data: None.
Product Infra Team +1: WikimediaEvents (T282580 https://phabricator.wikimedia.org/T282580).
Analytics: None.
Performance Team: None.
Platform Engineering +16, -11: MediaWiki-API (T282122 https://phabricator.wikimedia.org/T282122); MediaWiki-General (T282173 https://phabricator.wikimedia.org/T282173); MediaWiki-Page-derived-data (T281714 https://phabricator.wikimedia.org/T281714 T281802 https://phabricator.wikimedia.org/T281802 T282180 https://phabricator.wikimedia.org/T282180 T283282 https://phabricator.wikimedia.org/T283282); MediaWiki-Revision-backend (T282145 https://phabricator.wikimedia.org/T282145 T282723 https://phabricator.wikimedia.org/T282723 T282825 https://phabricator.wikimedia.org/T282825 T283170 https://phabricator.wikimedia.org/T283170); MediaWiki-User-management (T283167 https://phabricator.wikimedia.org/T283167); MW Expedition (T281526 https://phabricator.wikimedia.org/T281526 T281981 https://phabricator.wikimedia.org/T281981 T282038 https://phabricator.wikimedia.org/T282038 T282181 https://phabricator.wikimedia.org/T282181 T283196 https://phabricator.wikimedia.org/T283196).
Search Platform +3, -2: CirrusSearch (T282036 https://phabricator.wikimedia.org/T282036 T282207 https://phabricator.wikimedia.org/T282207); GeoData (T282735 https://phabricator.wikimedia.org/T282735).
WMDE TechWish +2, -1: Revision-Slider (T282067 https://phabricator.wikimedia.org/T282067); VisualEditor Template dialog (T283511 https://phabricator.wikimedia.org/T283511).
WMDE Wikidata +3, -1: Wikibase (T282534 https://phabricator.wikimedia.org/T282534 T283198 https://phabricator.wikimedia.org/T283198 T283862 https://phabricator.wikimedia.org/T283862).
No owner +7, -6: CentralAuth (T282834 https://phabricator.wikimedia.org/T282834 T283635 https://phabricator.wikimedia.org/T283635); Change-tagging (T283098 https://phabricator.wikimedia.org/T283098 T283099 https://phabricator.wikimedia.org/T283099); MapSources (T282833 https://phabricator.wikimedia.org/T282833); MediaWiki-Page-information (T283751 https://phabricator.wikimedia.org/T283751); Other (T283252 https://phabricator.wikimedia.org/T283252).
------- Outstanding errors
Take a look at the workboard and look for tasks that could use your help. → https://phabricator.wikimedia.org/tag/wikimedia-production-error/
Summary over recent months:
Aug 2019 (0 of 14 left) ✅ Last task resolved! -1
Jan 2020 (1 of 7 left) ⚠️ Unchanged (over one year old).
Mar 2020 (2 of 2 left) ⚠️ Unchanged (over one year old).
Apr 2020 (4 of 14 left) ⬇️ One task resolved. -1
May 2020 (5 of 14 left) ⚠️ Unchanged (over one year old).
Jun 2020 (5 of 14 left) ⚠️ Unchanged (over one year old).
Jul 2020 (4 of 24 issues) ⏸ No change.
Aug 2020 (12 of 53 issues) ⬇️ One task resolved. -1
Sep 2020 (7 of 33 issues) ⏸ No change.
Oct 2020 (19 of 69 issues) ⬇️ One task resolved. -1
Nov 2020 (8 of 38 issues) ⬇️ One task resolved. -1
Dec 2020 (7 of 33 issues) ⏸ No change.
Jan 2021 (3 of 50 issues https://phabricator.wikimedia.org/maniphest/query/WIP9W8q54HB6/#R) ⏸ No change.
Feb 2021 (7 of 20 issues https://phabricator.wikimedia.org/maniphest/query/5MzPJAb5oJgv/#R) ⬇️ One task resolved. -1
Mar 2021 (14 of 48 issues https://phabricator.wikimedia.org/maniphest/query/RsVPep46KRY4/#R) ⬇️ Four tasks resolved. -4
Apr 2021 (23 of 42 issues https://phabricator.wikimedia.org/maniphest/query/rYyMt_gYYymb/#R) ⬇️ Two tasks resolved. -2
*May 2021* (29 of 54 issues https://phabricator.wikimedia.org/maniphest/query/tmkGqt0C93YG/#R) 54 new issues found, of which 29 remain open. +54; -25
------- Tally
133 issues open, as of Excellence #31 https://phabricator.wikimedia.org/phame/post/view/235/production_excellence_31_april_2021/ (12 May 2021).
-12 issues closed, of the previous 133 open issues.
+29 new issues that survived May 2021.
150 issues open, as of today (12 June 2021).
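As a rough cross-check, the tally can be recomputed from the monthly summary with a few lines of Python. This is only a sketch: the numbers are copied by hand from the summary, and the names `open_by_month`, `closed_since_last` and `tally` are made up for illustration, not part of any Wikimedia tooling.

```python
# Open tasks per month, copied by hand from "Summary over recent months".
open_by_month = {
    "Aug 2019": 0, "Jan 2020": 1, "Mar 2020": 2, "Apr 2020": 4,
    "May 2020": 5, "Jun 2020": 5, "Jul 2020": 4, "Aug 2020": 12,
    "Sep 2020": 7, "Oct 2020": 19, "Nov 2020": 8, "Dec 2020": 7,
    "Jan 2021": 3, "Feb 2021": 7, "Mar 2021": 14, "Apr 2021": 23,
    "May 2021": 29,
}

# Tasks from earlier months that were closed since the previous edition
# (the "-N" annotations; May 2021 is excluded because its issues are new).
closed_since_last = {
    "Aug 2019": 1, "Apr 2020": 1, "Aug 2020": 1, "Oct 2020": 1,
    "Nov 2020": 1, "Feb 2021": 1, "Mar 2021": 4, "Apr 2021": 2,
}


def tally(previous_open: int, closed: int, new_surviving: int) -> int:
    """Open issues now = previously open - closed + new issues still open."""
    return previous_open - closed + new_surviving


closed_total = sum(closed_since_last.values())
current_open = tally(previous_open=133, closed=closed_total, new_surviving=29)
summary_open = sum(open_by_month.values())

print(closed_total, current_open, summary_open)
```

If the hand-copied numbers are right, the closed count and both open counts should agree with the tally figures (12 closed, 150 open).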
------- Thanks!
Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!
Until next time,
– Timo Tijhof
Thanks as always for this report, Timo.
One reason the count is higher in May is that the Growth team began a chores process https://www.mediawiki.org/wiki/Growth/Team/Chores that month (credit to Readers Web for the inspiration https://www.mediawiki.org/wiki/Readers/Web/Chores) to systematically review and log production errors that appear on our team dashboard https://logstash.wikimedia.org/app/kibana#/dashboard/AWl4jlZ78aQffZ3Ho7BV in Logstash. (We've also implemented a triage process for our inbox https://phabricator.wikimedia.org/project/board/1114/, which used to have ~2000 tasks and is now at 10.) Some of the tasks we've filed from Logstash are probably duplicates or close relatives of existing production error tasks; because we timebox our triage, we don't always manage to find the existing task before filing a new one.
A bigger problem is how to handle our growing pile of tasks that need some attention; as a team that's tasked with feature development, making time to work on maintenance tasks unrelated to the code we touch day-to-day is a challenge. So, while we are going to be more diligent about filing tasks when we see issues in Logstash, unless something appears to be badly broken, it is probably going to stay as an open task.
Kosta
On Mon, Jun 21, 2021 at 4:55 AM Krinkle krinklemail@gmail.com wrote: