How’d we do in our striving for operational excellence last month? Read on to find out!

Incidents
3 documented incidents last month. That's at the median for the past twelve months, and slightly below the median of 4 over the past five years (Incident stats graphs https://codepen.io/Krinkle/full/wbYMZK).
- 2021-07-14 eventgate latency spike https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload
  Impact: For ~10 min, MediaWiki API clients experienced request failures.
- 2021-07-16 codfw-a2 network https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-16_asw-a2-codfw_network
  Impact: For ~1 hour, Restbase clients received errors, affecting mobile apps and ContentTranslation.
- 2021-07-26 ruwikinews DynamicPageList https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-26_ruwikinews_DynamicPageList
  Impact: For 30 min, 15% of requests from contributors on all wikis failed. There were also brief moments during which no readers could load recently modified or uncached pages.

Learn about past incidents at Incident status https://wikitech.wikimedia.org/wiki/Incident_status on Wikitech. Remember to review and schedule Incident Follow-up https://phabricator.wikimedia.org/project/view/4758/ in Phabricator, which are preventive measures and other action items filed after an incident.

Trends
Last month the workboard held 154 non-old unresolved error reports. Over the past thirty days, the collective efforts of our volunteers and engineering teams have closed 14 of those.
In the month of July we've also introduced or discovered 31 new error reports (that's an average of one production regression every day!). Of those new error reports, 15 were resolved and 16 remain unresolved. The workboard now tallies up to 156 tasks.
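As a quick sanity check on the numbers above, here is a minimal Python sketch of the tally. The variable names are purely illustrative and are not taken from any tooling behind the workboard; the figures are the ones quoted in this report.

```python
# Figures quoted in this report (July 2021).
open_last_month = 154   # unresolved reports on the workboard a month ago
closed_of_those = 14    # of those, closed over the past thirty days
new_reports = 31        # introduced or discovered in July
new_resolved = 15       # new reports that were already resolved

# New reports that are still open at the end of the month.
new_unresolved = new_reports - new_resolved

# Open reports today: last month's total, minus closures, plus surviving new ones.
open_now = open_last_month - closed_of_those + new_unresolved

print(new_unresolved)  # 16
print(open_now)        # 156
```

In other words, the backlog grew by a net of two tasks: 14 older reports were closed while 16 new ones survived the month.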
Over on the backlog, we're continuing to ploddingly present progress on production problems from phantoms of Christmases past.
Figure 1, Figure 2: Unresolved error reports stacked by month. https://phabricator.wikimedia.org/phame/post/view/247/production_excellence_34_july_2021/#trends
For more month-over-month numbers refer to the spreadsheet data https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml .

Outstanding errors
Take a look at the workboard https://phabricator.wikimedia.org/tag/wikimedia-production-error/ and look for tasks that could use your help.
Below are various older issues that may have fallen by the wayside, taken from somewhat-random stab-in-the-dark queries.
Oldest unresolved errors that are still reproducible (Phab query https://phabricator.wikimedia.org/maniphest/query/07CAHhY.GApw/#R):
- Reported in 2015: Unable to view history of protected Flow board (StructuredDiscussions, Growth team), T118502 https://phabricator.wikimedia.org/T118502.
- Reported in 2016: Error when deleting a heading next to a table (VisualEditor, Editing team), T140871 https://phabricator.wikimedia.org/T140871.
Stalled error reports (Phab query https://phabricator.wikimedia.org/maniphest/query/Dmy0AuERAQct/#R):
- Stalled Mar 2021: Constraints check for Q142 France times out (Wikidata, WMDE), T212282 https://phabricator.wikimedia.org/T212282.
Oldest error with a patch for review (Phab query https://phabricator.wikimedia.org/maniphest/query/eb6hYVaKr0Kx/#R):
- Reported in 2016: Maps broken during 2nd live preview (Maps, Product Infra), T151524 https://phabricator.wikimedia.org/T151524.
- Reported in 2018: Corrupt connection for cross-wiki db query (Platform team), T193565 https://phabricator.wikimedia.org/T193565.

Jan 2021 (3 of 50 issues https://phabricator.wikimedia.org/maniphest/query/WIP9W8q54HB6/#R left) ⚠️ *Unchanged. Have a look-see!*
Feb 2021 (6 of 20 issues https://phabricator.wikimedia.org/maniphest/query/5MzPJAb5oJgv/#R left) ⚠️ *Unchanged. Take a gander!*
Mar 2021 (13 of 48 issues https://phabricator.wikimedia.org/maniphest/query/RsVPep46KRY4/#R left) ⚠️ *Unchanged. Check it out!*
Apr 2021 (18 of 42 issues https://phabricator.wikimedia.org/maniphest/query/rYyMt_gYYymb/#R left) -1
May 2021 (22 of 54 issues https://phabricator.wikimedia.org/maniphest/query/tmkGqt0C93YG/#R left) -3
June 2021 (11 of 26 issues https://phabricator.wikimedia.org/maniphest/query/roL0TaxtcaLQ/#R left) -4
July 2021 (16 of 31 issues https://phabricator.wikimedia.org/maniphest/query/mUVAD8TJHE3n/#R left) +31; -15

Tally
154 issues open, as of Excellence #33 (June 2021) https://phabricator.wikimedia.org/phame/post/view/240/production_excellence_33_june_2021/ .
-14 issues closed, of the previous 154 open issues.
+16 new issues that survived July 2021.
156 issues open, as of today.

Thanks!
Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!
Until next time,
– Timo Tijhof
🔗 Share or read later via https://phabricator.wikimedia.org/phame/post/view/247/