Hi all!

tl;dr: There's a large backlog of production errors. Release Engineering is blocking the train for any new logspam. Your help is needed!

A quick update on the deployment train:

In the process of rolling out wmf/1.36.0-wmf.28 there were a number of issues that prevented us from rolling forward the train in a timely manner. After the issues were resolved and backports deployed to the current version in production (wmf/1.36.0-wmf.27), we realized there were a few remaining spammy log messages and blocked the following week's train on those issues.

Release Engineering has long blocked the train on logspam issues[0]. Even when it does not indicate user-facing errors, logspam of any kind makes it harder for us to see real problems. In practice, however, we have defaulted to pushing the train forward despite minor issues.

Under this custom, many log messages have been accepted as "just occasional, not a big deal" or "yeah, we'll fix that eventually... it's not a big deal". Frequently, "eventually" never arrives. This results in an unmanageable accumulation of exceptions (see the ever-growing list of exceptions in the Wikimedia-production-error workboard[1] and logstash[2]).

To deal with these issues, we are now, as a matter of policy, blocking trains that cause any new error messages. In most cases, new errors are the result of code changes that lack defensive coding practices and/or have unexpected interactions with other code. The best resolution in these cases is for the code to be fixed or reverted.

Release Engineering organises a weekly "train log triage" meeting, on Wednesdays at 19:00 UTC, where we invite people who develop MediaWiki to help triage log messages. As of this week, there is also a second one, on Thursdays at 10:00 UTC, at a time better suited to people in European time zones. We invite everyone who develops MediaWiki or its extensions to join one of the meetings each week.

Thank you,

Greg

[0]: https://wikitech.wikimedia.org/wiki/Deployments/Holding_the_train#Logspam
[1]: https://phabricator.wikimedia.org/project/view/1055/
[2]: https://logstash.wikimedia.org/app/dashboards#/view/0a9ecdc0-b6dc-11e8-9d8f-dbc23b470465




--
| Greg Grossmeier              GPG: B2FA 27B1 F7EB D327 6B8E |
| Dir. Engineering Productivity     A18D 1138 8E47 FAC8 1C7D |