Hi all! As a reminder, the switch to Remex on ru.wp, sv.wp, fi.wp and he.wp (and others per https://phabricator.wikimedia.org/T184656 ) is actually happening *today*, in different deployment windows. If you notice something weird after it happens, make sure you post diffs to https://www.mediawiki.org/wiki/Help_talk:Extension:Linter where the team will be able to quickly assess the situation.
Thank you all, and thanks Parsing team for working on this.
[Wikitech-ambassadors] Replacing Tidy on Wikimedia wikis, second wave
Subramanya Sastry ssastry at wikimedia.org
Mon Jan 22 19:46:58 UTC 2018
TL;DR
On January 31, 2018, on ru.wp, sv.wp, fi.wp and he.wp, we are going to turn off Tidy and switch to the Remex HTML5 parsing library. Besides those, another 200+ wikis will also be switched away from Tidy on that day. You can find the list of these wikis at T184656 [1].

Do any of you belong to, or know someone active in, these communities? We have also announced this in Tech News. Based on our previous experience, we don't anticipate the change to be ground-breaking for these communities, so we think that "spamming" the village pumps may not be very effective. Instead, we'd appreciate your help in assuring these wikis that they can contact us at mw:Help_talk:Extension:Linter [2] if needed, and that there's plenty of documentation to help with Linter fixes at mw:Help:Extension:Linter [3].

Thanks!

Background
In July 2017, we announced [4] our intention to replace Tidy with an HTML5-based solution on the Wikimedia cluster by the end of June 2018 at the latest. Please refer to that original posting for specifics of the project and why we are replacing Tidy.

Status of Tidy replacement
Over the last 3 months, we have replaced Tidy with RemexHTML on mediawiki, testwiki, nowiki, fawiki, itwiki, dewiki and 170 other small wikis [5].

We have approached ruwiki, svwiki, fiwiki and hewiki for replacement this month, based on their remaining linter errors and the progress those wikis have been making. We expect to approach other medium and large wikis for replacement next month. In addition, we will be replacing Tidy with RemexHTML on any wiki that has fewer than 10 linter errors in any high-priority category. T184656 [1] has the list of wikis that will see this change (this list includes wikis that already had Tidy replaced in December).

To be clear, if we notice problems (or if the wiki requests it), we will revert the change after identifying the source of the problem. If you notice any incorrect rendering, you can use ?action=parsermigration-edit to check whether the switch from Tidy actually caused it.

Status of linter fixes
We have been publishing weekly stats [6] of changes to linter counts, which show how wikis have been progressing with linter fixes. Based on what we've observed, of the 38 largest wikis, besides the ones that have already had Tidy replaced or will get it replaced this month, most seem to be making progress, albeit at different rates. idwiki, viwiki, jawiki and rowiki haven't seen a lot of activity yet.

Results from pixel diff tests
We have also been doing weekly test runs to calculate pixel diffs on about 70K pages sampled from over 50 wikis. To do this, we generate one screenshot of a page with Tidy and one with RemexHTML, and compare the renderings while ignoring vertical whitespace shifts. We generate a numeric score for the diff that tries to reflect the magnitude of the differences we are seeing.

Thanks to fixes to pages, and to improvements in our testing infrastructure that detect differences more accurately, the percentage of pages that rendered with only vertical whitespace shifts increased from 91.9% to 94.6% between July 2017 and January 2018. Similarly, the percentage of pages that rendered with pixel-perfect accuracy went up from 63.2% to 68.3%. For technical reasons related to the testing setup that I will skip here, 100% is not achievable for either metric.

Summary
Overall, by the end of January, about 400 of Wikimedia's wikis will have had Tidy replaced. This includes 7 of the largest wikis. Linter fixes are also happening on lots of wikis, but some large wikis could pick up the pace. We still expect to replace Tidy on all wikis by the end of June 2018, and your cooperation and help with fixing pages identified by the Linter tool is greatly appreciated.

Subbu,
Manager and Technical Lead, Parsing Team @ the WMF.
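
Illustrative note on the pixel diff tests: the idea of comparing two renderings while ignoring vertical whitespace shifts can be shown with a toy sketch. This is not the actual test code, which this message does not describe. It assumes each screenshot is a list of pixel rows, that a pixel value of 0 means empty background, and that the score is simply the fraction of non-blank rows that differ. The names BACKGROUND, strip_blank_rows and diff_score are made up for this example.

```python
from itertools import zip_longest

BACKGROUND = 0  # assumed pixel value for empty page background


def strip_blank_rows(image):
    """Drop rows that contain only background pixels."""
    return [row for row in image if any(px != BACKGROUND for px in row)]


def diff_score(image_a, image_b):
    """Return a score in [0, 1]; 0 means identical once blank rows are ignored."""
    rows_a = strip_blank_rows(image_a)
    rows_b = strip_blank_rows(image_b)
    total = max(len(rows_a), len(rows_b))
    if total == 0:
        return 0.0
    # zip_longest pads the shorter image with None, which counts as a mismatch.
    mismatched = sum(1 for a, b in zip_longest(rows_a, rows_b) if a != b)
    return mismatched / total


# Same content, but the blank row sits in a different place (a vertical shift).
tidy_shot = [[0, 0, 0], [1, 1, 0], [0, 0, 0], [0, 1, 1]]
remex_shot = [[1, 1, 0], [0, 0, 0], [0, 0, 0], [0, 1, 1]]

# Same layout as tidy_shot, but the last content row actually changed.
changed_shot = [[0, 0, 0], [1, 1, 0], [0, 0, 0], [1, 1, 1]]

shift_only = diff_score(tidy_shot, remex_shot)
real_change = diff_score(tidy_shot, changed_shot)
```

In this toy model, a page whose content only moved up or down scores 0, while a page with a changed row gets a positive score. A real comparison would work on actual screenshots and would likely need a more forgiving notion of "same row".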
[1] https://phabricator.wikimedia.org/T184656
[2] https://www.mediawiki.org/wiki/Help_talk:Extension:Linter
[3] https://www.mediawiki.org/wiki/Help:Extension:Linter
[4] https://www.mediawiki.org/wiki/Talk:Parsing/Replacing_Tidy/FAQ
[5] https://phabricator.wikimedia.org/T175706
[6] https://www.mediawiki.org/wiki/Parsing/Replacing_Tidy/Linter/Stats
-- Erica Litrenta
Manager, Community Liaisons