(est. 3 minute read) https://phabricator.wikimedia.org/phame/live/1/post/140/
-------
How'd we do in our striving for operational excellence last month? Read on to find out!
- Month in numbers.
- Highlighted stories.
- Current problems.
## *Month in numbers*
* 4 documented incidents in January 2019. [1]
* 16 Wikimedia-prod-error tasks closed. [2]
* 17 Wikimedia-prod-error tasks created. [3]
## *️⃣ *Unable to move certain file pages*
Xiplus reported that renaming a File page on zh.wikipedia.org led to a fatal database exception. Andre Klapper identified the stack trace from the logs, and Brad (Anomie) investigated.
The rename failed because the File page did not have a media file associated with it (MediaWiki does not currently allow such a move). While handling this error, however, the code caused a different error. As a result, the user was not told why the move failed; instead, they received a generic error page about a fatal database exception.
Brad fixed the code a few hours later, and it was deployed by Roan later that same day. Thanks! → https://phabricator.wikimedia.org/T213168
## *️⃣ *DBPerformance regression detected and fixed*
During a routine audit of Logstash dashboards, I found a DBPerformance warning. The warning indicated that the limit of 0 for "master connections" was violated. That's a cryptic way of saying it found code in MediaWiki that uses a database master connection on a regular page view.
MediaWiki can have many replica database servers, but there can be only one master database at any given moment. To reduce the chance of overload, delayed edits, or network congestion, we use replicas whenever possible. We usually involve the master only when source data is being changed, or is about to be changed, for example when editing a page or saving changes.
As the vast majority of traffic is page views, we have lower thresholds for latency and dependency on page views. In particular, page views may (in the future) be routed to secondary data centres that don't even have a master DB.
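The idea behind a "limit of 0 master connections" on page views can be illustrated with a minimal sketch. This is not MediaWiki's actual code: the class `ConnectionTracker`, the helper `tracker_for`, and the per-request limits are all hypothetical names chosen for illustration. The sketch only shows how a per-request expectation might be recorded as a non-fatal warning rather than stopping the request.

```python
import logging

logger = logging.getLogger("DBPerformance")


class ConnectionTracker:
    """Counts master connections opened while handling one request (hypothetical)."""

    def __init__(self, max_master_connections):
        self.max_master_connections = max_master_connections
        self.master_connections = 0
        self.violations = []

    def get_connection(self, role):
        if role not in ("replica", "master"):
            raise ValueError(f"unknown database role: {role}")
        if role == "master":
            self.master_connections += 1
            if self.master_connections > self.max_master_connections:
                # Non-fatal: record and log the violation, then carry on.
                message = (
                    f"Expectation (master connections <= "
                    f"{self.max_master_connections}) not met, "
                    f"actual: {self.master_connections}"
                )
                self.violations.append(message)
                logger.warning(message)
        # Placeholder for a real connection handle.
        return role


def tracker_for(request_kind):
    # Assumed policy for this sketch: page views must not touch the master,
    # while requests that change data may open one master connection.
    limits = {"page_view": 0, "edit": 1}
    return ConnectionTracker(limits[request_kind])
```

In this sketch, a page view that asks for a replica stays silent, while a page view that asks for the master produces one logged warning and still completes.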
Tchanders (from the Anti-Harassment team) investigated the issue, found the culprit, and fixed it in time for the next MediaWiki train. Thanks! → https://phabricator.wikimedia.org/T214735
## *️⃣ *TemplateData missing in action*
Tacsipacsi and Evad37 both independently reported the same TemplateData issue. TemplateData powers the template insertion dialog in VisualEditor. It wasn't working for some templates after we deployed the 1.33-wmf.13 branch.
The error was "Argument 1 passed to ApiResult::setIndexedTagName() must be an instance of array, null given". This means there was code that calls a function with the wrong parameter. For example, the variable name may've been misspelled, or it may've been the wrong variable, or (in this case) the variable didn't exist. In such cases, PHP implicitly assumes "null".
Bartosz (Matmarex) found the culprit. The week before, I made a change to TemplateData that made the "template parameter order" feature optional. This allows users to decide whether VisualEditor should force an order for the parameters in the wikitext. It turned out I forgot to update one of the references to this variable, which still assumed it was always present.
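This class of bug can be sketched in a few lines. The example below is in Python rather than PHP and is not the TemplateData code: the function names and the `paramOrder` key are assumptions made for illustration. It shows a caller that still assumes an optional value is always present, a strictly typed callee that rejects the resulting null, and one way to guard the call.

```python
def set_indexed_tag_name(values, tag):
    """Stand-in for a function whose first argument must be a list."""
    if not isinstance(values, list):
        raise TypeError(
            f"Argument 1 must be a list, {type(values).__name__} given"
        )
    return {"items": values, "tag": tag}


def format_template_data_buggy(data):
    # Bug: assumes "paramOrder" is always present. When the key is absent,
    # .get() returns None, much like an undefined variable becomes null in PHP.
    return set_indexed_tag_name(data.get("paramOrder"), "p")


def format_template_data_fixed(data):
    result = {"params": data["params"]}
    # Only touch the optional value when it was actually provided.
    if "paramOrder" in data:
        result["paramOrder"] = set_indexed_tag_name(data["paramOrder"], "p")
    return result
```

Calling `format_template_data_buggy({"params": {}})` raises a `TypeError`, whereas the guarded version returns the data without the optional key.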
Brad (Anomie) fixed it later that week, and it was deployed the next day. Thanks! → https://phabricator.wikimedia.org/T213953
## *Current problems*
Take a look at the workboard and look for tasks that might need your help. The workboard lists known issues, grouped by the week in which they were first observed.
→ https://phabricator.wikimedia.org/tag/wikimedia-production-error/
There are currently 188 open Wikimedia-prod-error tasks as of 12 February 2019. (We've had a slight increase since November; 165 in December, 172 in January.)
For this month's edition, I'd like to draw attention to a few older issues that are still reproducible:
* [2013; Collection extension] Special:Book fatal error for blocked users. → https://phabricator.wikimedia.org/T56179
* [2013; CentralNotice] Fatal error when placeholder key contains a space. → https://phabricator.wikimedia.org/T58105
* [2014; LQT] Fatal error when attempting to view certain threads. → https://phabricator.wikimedia.org/T61791
* [2015; MassMessage] Warning about Invalid message parameters. → https://phabricator.wikimedia.org/T93110
* [2015; Wikibase] Warning "UnresolvedRedirectException" for some pages on Wikidata (and Commons). → https://phabricator.wikimedia.org/T93273
## Terminology
A "Fatal error" (or uncaught exception) prevents a user action. For example, a page might display "MWException: Unknown class NotificationCount." instead of the article content. A "Warning" (or non-fatal, or PHP error) lets the program continue and display most of the page regardless. This may cause corrupt, incorrect, or incomplete information to be shown. For example, a user may receive a notification that says "You have (null) new messages".
## Thanks!
Thank you to everyone who has helped by reporting, investigating, or resolving problems in Wikimedia production, including: Xiplus, Anomie, Daimona, Gilles, He7d3r, Jdforrester, MatmaRex, MModell, Nikerabbit, Catrope, Tchanders, Tgr, and Thiemo.
Thanks!
Until next time,
-- Timo Tijhof
*There's a snake in my boot. Reach for the sky!*
-------
Footnotes:
[1] Incidents. → https://wikitech.wikimedia.org/wiki/Special:AllPages?from=Incident+documenta...
[2] Tasks closed. → https://phabricator.wikimedia.org/maniphest/query/COTGbmxGcm_l/#R
[3] Tasks created. → https://phabricator.wikimedia.org/maniphest/query/DLRuzOg9bSJA/#R
On Tue, Feb 12, 2019 at 10:54 PM Krinkle krinklemail@gmail.com wrote:
> Brad fixed the code a few hours later, and it was deployed by Roan later that same day. Thanks! → https://phabricator.wikimedia.org/T213168
Correction: It was Gergő Tisza who submitted the patch to fix the code for this one, not me.
wikitech-l@lists.wikimedia.org