On 07/06/2017 08:02 AM, Subramanya Sastry wrote:
TL;DR
The Parsing team wants to replace Tidy with a RemexHTML-based solution on the Wikimedia cluster by June 2018. This will require editors to fix pages and templates to address wikitext patterns that behave differently with RemexHTML. Please see the 'What editors will need to do' section of the Tidy replacement FAQ [1].
......
- Monitoring progress
To monitor progress, we plan to do a weekly (or similarly periodic) test run that compares the rendering of pages with Tidy and with RemexHTML on a large sample of pages (in the 50K range) from a large subset of Wikimedia wikis (~50 or so). This will give us a pulse on how fixups are going, and on when we might be able to flip the switch on different wikis.
I wanted to post some followups on this.
1. We have a revived dashboard that tracks linter error counts on wikis for all linter categories.
See https://tools.wmflabs.org/wikitext-deprecation/
2. We track the error counts as they change and publish weekly snapshots comparing counts to a July 24th baseline (which is when I first started collecting stats).
See https://www.mediawiki.org/wiki/Parsing/Replacing_Tidy/Linter/Stats
3. We also have a pixel-diffs test run (previously called visual diffs) that compares page rendering with Tidy and with RemexHTML. The test set has 73K pages sampled from 60 wikis. These diffs more accurately reflect what kind of rendering differences we can expect to see if pages are not fixed.
See http://mw-expt-tests.wmflabs.org/
4. Based on the runs above, I identified one more high-priority linter category, which covers a Tidy whitespace bug and needs to be fixed (I expect it to show up mostly in templates, especially navboxes, based on what I've seen in the test run above). Once the code is reviewed and deployed to the cluster, we'll start populating this category.
See https://gerrit.wikimedia.org/r/#/c/371068/ and https://gerrit.wikimedia.org/r/#/c/371071/
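For anyone who wants to do the same kind of baseline comparison as in item 2 on their own wiki's numbers, here is a minimal sketch in Python. It is not the script behind the stats page. The function name, the category names and the counts are all made up for illustration, and it assumes you already have per-category error counts for a baseline and for a later snapshot.

```python
def compare_to_baseline(baseline, current):
    """Return {category: (delta, percent_change)} relative to the baseline.

    Both arguments are plain dicts mapping a linter category name to an
    error count. This is an illustrative helper, not part of any
    Wikimedia tool.
    """
    report = {}
    for category, base_count in baseline.items():
        # A category missing from the snapshot is treated as zero errors.
        now = current.get(category, 0)
        delta = now - base_count
        # Percent change is undefined when the baseline count is zero.
        pct = (delta / base_count * 100.0) if base_count else None
        report[category] = (delta, pct)
    return report


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not real stats).
    baseline = {"category-a": 1000, "category-b": 400}
    snapshot = {"category-a": 750, "category-b": 420}

    for category, (delta, pct) in sorted(
        compare_to_baseline(baseline, snapshot).items()
    ):
        pct_text = "n/a" if pct is None else f"{pct:+.1f}%"
        print(f"{category}: {delta:+d} ({pct_text})")
```

A negative delta means the count has dropped since the baseline, i.e. fixups are making progress in that category.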
Thanks, Subbu.