The 1.38.0-wmf.26 version of MediaWiki is blocked[0].
The new version is deployed to group(s)0[1], but can proceed no
further until these issues are resolved:
* Wikimedia\Rdbms\DBUnexpectedError:
MWExceptionHandler::rollbackPrimaryChangesAndLog: Database is owned by ID
'1923155107' (got '') - https://phabricator.wikimedia.org/T303885
* Wikimedia\Rdbms\DBTransactionError: Transaction round stage must be 'cursory'
(not 'within-rollback-session') - https://phabricator.wikimedia.org/T303885
Once these issues are resolved, train can resume. If these issues are
resolved on a Friday the train will resume Monday.
Thank you for your help resolving these issues!
-- Your humble train toiler
[0]. https://phabricator.wikimedia.org/T300202
[1]. <https://versions.toolforge.org/>
--
Jeena Huneidi
Software Engineer, Release Engineering
Wikimedia Foundation
How’d we do in our strive for operational excellence last month? Read on to find out!
Incidents
3 documented incidents last month.
2022-02-01 ulsfo network <https://wikitech.wikimedia.org/wiki/Incident_documentation/2022-02-01_ulsfo…>
Impact: For 3 minutes, clients served by the ulsfo POP were not able to contribute or display un-cached pages.
2022-02-22 wdqs updater codfw <https://wikitech.wikimedia.org/wiki/Incident_documentation/2022-02-22_wdqs_…>
Impact: For 2 hours, WDQS updates failed to be processed. Most bots and tools were unable to edit Wikidata during this time.
2022-02-22 vrts <https://wikitech.wikimedia.org/wiki/Incident_documentation/2022-02-22_vrts>
Impact: For 12 hours, incoming emails to a specific recently created VRTS queue were not processed; senders received a bounce with an SMTP 550 error.
See also Incident graphs <https://codepen.io/Krinkle/full/wbYMZK>.
Incident follow-up
Remember to review and schedule Incident Follow-up work <https://phabricator.wikimedia.org/project/view/4758/> in Phabricator, which are preventive measures and tech debt mitigations written down after an incident is concluded. Read about past incidents at Incident status <https://wikitech.wikimedia.org/wiki/Incident_status> on Wikitech.
Recently conducted incident follow-up:
Create a dashboard for Prometheus metrics about health of Prometheus itself. <https://phabricator.wikimedia.org/T222102>
Pitched by CDanis after an April 2019 incident, carried by Filippo (@fgiunchedi).
Improve wording around AbuseFilter messages about throttling functionality. <https://phabricator.wikimedia.org/T200036>
Originally filed in 2018. This came up last month during an incident where the wording may've led to a misunderstanding. Now resolved by @Daimona.
Exclude restart procedure from automated Elasticsearch provisioning. <https://phabricator.wikimedia.org/T290902>
There can be too much automation. Filed after an incident last September. Fixed by @RKemper.
Outstanding errors
Take a look at the workboard and look for tasks that could use your help.
→ https://phabricator.wikimedia.org/tag/wikimedia-production-error/
I skip breakdowns most months as each breakdown has its flaws. However, I hear people find them useful, so I'll try to do them from time to time with my noted caveats. The last breakdown was in the December edition <https://phabricator.wikimedia.org/phame/post/view/265/production_excellence…>, which focussed on throughput during a typical month. It is important to recognise that neither high nor low throughput is per se good or bad. It's good when issues are detected, reported, and triaged correctly. It's also good if a team's components are stable and don't produce any errors. A report may be found to be invalid or a duplicate, which is sometimes only determined a few weeks later.
The below "after six months" breakdown takes more of that into consideration by looking at what's on the table after six months (tasks up to Sept 2021). This may be considered "fairer" in some sense, although it has the drawback of suffering from hindsight bias, and possibly not highlighting current or most urgent areas.
WMF Product:
* Anti Harassment Tools (3): 1 MW Blocks, 2 SecurePoll.
* Community Tech (0).
* Design Systems (1): 1 WVUI.
* Editing Team (15): 14 VisualEditor, 1 OOUI.
* Growth Team (13): 11 Flow, 1 GrowthExperiments, 1 MW Recent changes.
* Language Team (6): 4 ContentTranslation, 1 CX-server, 1 Translate extension.
* Parsoid Team (9): 8 Parsoid, 1 ParserFunctions extension.
* Product Infrastructure: 2 JsonConfig, 1 Kartographer, 1 WikimediaEvents.
* Reading Web (0).
* Structured Data (4): 2 MW Uploading, 1 WikibaseMediaInfo, 1 3D extension.
WMF Tech:
* Data Engineering: 1 EventLogging.
* Fundraising Tech: 1 CentralNotice.
* Performance: 1 Rdbms.
* Platform MediaWiki Team (19): 4 MW-Page-data, 1 MW-REST-API, 1 MW-Action-API, 1 MW-Snapshots, 1 MW-ContentHandler, 1 MW-JobQueue, 1 MW-libs-RequestTimeout, 9 Other.
* Search Platform: 1 MW-Search.
* SRE Service Operations: 1 Other.
WMDE:
* WMDE-Wikidata (7): 5 Wikibase, 2 Lexeme.
* WMDE-TechWish: 1 FileImporter.
Other:
* Missing steward (7): 2 Graph, 2 LiquidThreads, 2 TimedMediaHandler, 1 MW Contributions page.
* Individually maintained (2): 1 WikimediaIncubator, 1 Score extension.
Trends
In February, we reported 25 new production errors <https://phabricator.wikimedia.org/maniphest/query/1B79KZ8KkRj6/#R>. Of those, 13 have since been resolved, and 12 remain open as of today (two weeks into the following month). We also resolved 22 errors that remained open from previous months. The overall workboard has grown slightly to a total of 301 outstanding error reports.
For the month-over-month graph, refer to the spreadsheet. <https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony…>
Thanks!
Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production. Thanks!
Until next time,
– Timo Tijhof
🔗 Share or read online via https://phabricator.wikimedia.org/phame/post/view/267/
This email is a summary of the Wikimedia production deployment of
1.38.0-wmf.25
- Conductor: Ahmon Dancy
- Backup Conductor: Brennen Bearnes
- Blocker Task: T300201 <https://phabricator.wikimedia.org/T300201>
- Current Status <https://versions.toolforge.org/>: Stable on all wikis.
📊 By the Numbers
Sparklines comparing with the last 5 trains.
- 368 Patches ▃▅▁▄█
- 1 Rollbacks ▁▁▁▁█
- 0 Days of delay ▄▄▄▄▄
- 9 Blockers ▁▁▁▂█
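
The sparklines above can be reproduced with a few lines of Python. This is only a minimal sketch of one common way to render them, not the code the train report actually uses: the function name `sparkline`, the scaling rule, and the choice of a mid-height bar for a flat series are all assumptions made for illustration.

```python
# Hypothetical helper: render a list of numbers as a Unicode sparkline.
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value to one of eight block characters, scaled to the series range."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    if lo == hi:
        # A flat series has no range to scale against; assume a mid-height bar.
        return BLOCKS[3] * len(values)
    scale = (len(BLOCKS) - 1) / (hi - lo)
    return "".join(BLOCKS[round((v - lo) * scale)] for v in values)


# Illustrative input only: four trains without a rollback, then one with.
print(sparkline([0, 0, 0, 0, 1]))
```

Each series is scaled against its own minimum and maximum, so the bars show relative change within the last five trains and cannot be compared across rows.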
🥰 Traintastic Folks 😻
Thanks to folks who reported or resolved blockers:
- Bryan Davis
- Bartosz Dziewoński
- Timo Tijhof
- Zabe
- Dom Walden
- Ammarpad
- Xiplus
- Kosta Harlan
- Subramanya Sastry
- Jon Robson
*We’re going to do more than one train next week – starting Mon, 21 March
2022*.
We’re aiming for four trains. 😬
____
We’ve been running the train since version 1.20,
<https://www.mediawiki.org/wiki/MediaWiki_1.20/Roadmap#Schedule_for_the_depl…>
and we believe a more frequent train will be easier, faster, and safer.
There are unknowns in this process. Our goal is to surface the unknowns to
increase the safety and speed of deployment.
*Please reach out* if you would like to be in closer contact with our team
as part of this process.
___
📅 Schedule
Join us in *#wikimedia-operations* on IRC and *#engineering-all* on Slack for all
the excitement. We’re also planning to use Google Meet sessions for pairing.
If you’re interested in joining a session (and you’ve signed an NDA), sign
up (with an email address that can join a Meet) on the trainsperiments page on
MediaWiki
<https://www.mediawiki.org/wiki/Talk:Wikimedia_Release_Engineering_Team/Trai…>.
Here’s the schedule—*two trains in 24 hours to start the week*:
- *Mon, 21 March 2022*
  - UTC AM Train – 09:00:
    - +0 minutes – 1.39.0-wmf.1 Group0
    - +30m – 1.39.0-wmf.1 Group1
  - UTC PM Train – 18:00:
    - +0 minutes – 1.39.0-wmf.1 All Wikis
    - +30m – 1.39.0-wmf.2 Group0 (Δ 10 hours)
    - +45m – 1.39.0-wmf.2 Group1
    - +90m – 1.39.0-wmf.2 All Wikis
- *Tue, 22 March 2022*
  - UTC AM Train – 09:00:
    - +0 minutes – 1.39.0-wmf.3 Group0 (Δ 14 hours)
    - +15m – 1.39.0-wmf.3 Group1
    - +60m – 1.39.0-wmf.3 All Wikis
- *Wed, 23 March 2022*
  - UTC PM Train – 18:00:
    - +0 minutes – 1.39.0-wmf.4 Group0 (Δ 20 hours)
    - +15m – 1.39.0-wmf.4 Group1
    - +60m – 1.39.0-wmf.4 All Wikis
- *Thu, 24 March 2022*
  - UTC AM Train – 09:00: Catchup and cleanup
  - UTC PM Train – 18:00: Catchup and cleanup
____
🤔 Unknowns
Here’s a list of things we know we don’t know. If you can help with any of
these: reach out!
- How will more trains affect caches and site performance? (T303758
  <https://phabricator.wikimedia.org/T303758>)
- Are developers going to need to be more mindful of forward and backward
  compatibility? (T303759 <https://phabricator.wikimedia.org/T303759>)
- How will rollback work with blockers? This one is on us, but we’ll be
  figuring it out as we go. (T303747
  <https://phabricator.wikimedia.org/T303747>)
- Will blocker tasks be too confusing with so many trains?
- Is this sustainable?
___
💌 Get in touch
- Reply to this email
- Start a topic on our talk page:
  https://www.mediawiki.org/wiki/Talk:Wikimedia_Release_Engineering_Team/Trai…
- Put a task on our workboard:
  https://phabricator.wikimedia.org/project/board/5811/
Tyler Cipriani (he/him)
Engineering Manager, Release Engineering
Wikimedia Foundation
TLDR: Tech leads, please review Best practices for extensions <https://www.mediawiki.org/wiki/Best_practices_for_extensions> on mediawiki.org.
Hi all,
You may be familiar with the Best practices for extensions <https://www.mediawiki.org/wiki/Best_practices_for_extensions> page on mediawiki.org. It has been marked as a draft since 2017.
I'd like to polish this page and get it to a state where it would be uncontroversial to label it as "Development guideline <https://www.mediawiki.org/wiki/Development_guidelines>". This would not make it a hard policy. Neither does it imply that it covers all practices in all situations.
Rather, it would mean that the items that are there now are indeed a part of our current best practices. We would keep it alive through bold <https://en.wikipedia.org/wiki/Wikipedia:Be_bold> edits and talk page conversations, similar to our Coding conventions <https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP> and other such guidelines that we maintain peer to peer and through consensus.
The reason I've not simply labelled it as such already is that, before today, I found the document to be out of sync with our actual practices. I have made a number of changes with descriptive edit summaries to bring it in sync with what I perceive to be our best practices, based on how I and other maintainers perform code review at large, and how we review new extensions prior to deployment.
All are welcome to fix mistakes and to raise questions or concerns on the talk page or on this thread. You're also welcome to message me directly anytime if you prefer.
If you consider yourself familiar with our practices and/or lead and mentor other engineers, please take a minute to review the page and consider whether the items reflect your current understanding and judgement.
--
Timo Tijhof,
Principal Engineer,
Wikimedia Performance Team.
This email is a summary of the Wikimedia production deployment of
1.38.0-wmf.24
* Conductor: Brennen Bearnes
* Backup Conductor: Dan Duvall
* Blocker Task: T300200 <https://phabricator.wikimedia.org/T300200>
* Current Status <https://versions.toolforge.org>
📈 By the Numbers
Sparklines comparing with the last 5 trains.
* 306 Patches █▄▆▁▅
* 0 Rollbacks ▄▄▄▄▄
* 0 Days of delay ▄▄▄▄▄
* 5 Blockers ▁▆▆▆█
🥰 Trainlicious Shoutouts 😻
Thanks to folks who reported or resolved blockers, as well as those who
assisted with /potential/ blockers:
* Peter Pelberg
* Jon Robson
* Zabe
* Legoktm
* Bartosz Dziewoński
* Krinkle
* ssastry
* Arlolra
* DLynch
--
Brennen Bearnes
Release Engineering
Wikimedia Foundation
Hello again, Wikimedia Hackers!
We invite technical community members and affiliates to apply for Rapid
Fund grants to host local meetups during or around the Hackathon. Grants
can be between 500 and 5,000 USD. Please note that we cannot go above 5,000
USD per grant, so plan accordingly. The deadline to apply is March 20,
2022. This is a quick turn-around, so note that proposals do not have to be
extensive or complex.
Examples of requests include:
- Food for community meetups before, during, or after the online Hackathon
  (please keep meetups within a month of the Hackathon dates).
- Venue rental for attendees
- Transportation or commuting expenses
- Scholarships to help local community members attend, which can include
  data packages, childcare costs, or transportation
- The Foundation offers this list of eligible expenses
  <https://meta.wikimedia.org/wiki/Grants:Project/Rapid/Learn> (e.g., you
  can use this grant money to pay for direct expenses like supplies, but you
  cannot use funds to pay for an event coordinator). You cannot pay
  honorariums to speakers you have at your events but you can give material
  gifts (gift cards, etc.).
- Note that at this moment, in-person meetups of 10 people or fewer can be
  held without extra approval; if your meetup will be larger, please follow
  the steps listed for Risk Assessment
  <https://meta.wikimedia.org/wiki/Grants:Risk_assessment_during_COVID-19>.
Eligibility:
- Proposals should support the achievement of Wikimedia’s mission
  <https://meta.wikimedia.org/wiki/Mission> and be related to the
  Hackathon (this cannot support edit-a-thon projects for example).
- There are some countries to which the Foundation cannot disburse funds.
  If you have been ineligible for WMF funds through other opportunities,
  please email hlepp(a)wikimedia.org to verify your eligibility for this
  application.
- Both individuals and organizations are welcome to apply.
- Someone in your group must have access to a bank account which can
  receive international wire transfers.
- The Code of Conduct for Wikimedia's Technical Spaces
  <https://www.mediawiki.org/wiki/Special:MyLanguage/Code_of_Conduct> will
  be in effect throughout the event, on all platforms and at local meetups.
  Please have a look at it and ensure you are willing and able to follow it.
How to Apply: Start your application
<https://meta.wikimedia.org/wiki/Grants:Project/Rapid/Apply#Other> in the
yellow box marked “Other”. Enter the title of your proposal: Hackathon +
[Institution/Group/Individual Name] (e.g., Hackathon University of Ghana
Legon) and click “Start my application”. Submit the application. Any
questions can be sent to hlepp(a)wikimedia.org.
Cheers,
Your Wikimedia Hackathon Committee