Hello everyone,


As you are aware from previous postings on this list [1] [2] [3] [4] [5] [6], we have been progressively replacing Tidy with RemexHtml on all wikis on the Wikimedia cluster. As of today, about 650 wikis, including a number of large ones, have made the switch. We aim to complete the switchover on the remaining 250 wikis by the end of June 2018. Another 40 or so wikis will be switched on May 2nd.

There are a few large wikis (es, pt, uk, and zh especially) that could use more attention to Linter issues, so that when we make the switch at the end of June, pages on these wikis don't render differently from how they do now.


I started investigating more closely where the remaining large wikis stand with respect to the linter issues (the high-priority categories on the Special:LintErrors page) that are pertinent to them. Below, I list results from running SQL queries on quarry.wmflabs.org for these wikis. If you are a community member on any of these wikis, please try to address these issues on your wiki.

15 other large wikis:

See https://quarry.wmflabs.org/query/26474 for counts of linter issues for each of the 9 categories in the main namespace.

* es, pt, uk, zh wikis have total error counts over 10K, and in some cases it is mainly one category that needs attention.
* vi, ro, sr, sh, ar, tr, id are not too bad, but their counts don't seem to have changed much, which suggests that these wikis aren't looking at linter issues.
* fr, hu, ja, pl wikis seem to be in good shape overall. Issues have been fixed steadily, and I think all of these will be in fairly decent shape for replacing Tidy by the end of June.

https://www.mediawiki.org/wiki/Parsing/Replacing_Tidy/FAQ#Simplified_instructions_for_fixing_pages has summarized instructions for fixing issues in the different categories.

English Wikipedia:

See https://quarry.wmflabs.org/query/25665 for counts of linter issues for each of the 9 categories in the main namespace.

English Wikipedia has been making slow and gradual progress. Overall, even though there are still ~8300 instances (not pages) that need fixing, I think enwp is in pretty good shape for replacing Tidy by the end of June.


Commons:

See https://quarry.wmflabs.org/query/25693 for counts of linter issues for each of the 9 categories in the File (ns6), Gallery (ns0), and Template (ns10) namespaces.

The vast majority of html5-misnesting errors on Commons seem to come from the {{lang}} template, which uses a <span> tag to wrap content. However, it seems to be extremely common to pass content containing paragraphs into {{lang}}. Right now, this doesn't cause any visible rendering issues and could be ignored temporarily, but we strongly recommend either fixing {{lang}} to use <div>, or, on pages that misuse {{lang}} this way, replacing it with a new template ({{lang-block}} maybe?) that uses a <div> tag.

Some tips:

1. On some wikis, fixing templates usually fixes the problem. Over the last 6 months, I've spent many hours fixing hundreds of templates on dozens of different wikis and can personally attest to the efficacy of that strategy.
2. A lot of the html5-misnesting errors seem to come from incorrectly using a <span> tag to wrap content that has paragraphs, lists, or tables. In these cases, changing the <span> to a <div> almost always fixes the problem.
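
For anyone who wants to hunt for the <span> pattern from tip 2 in a dump of wikitext, here is a rough Python sketch. It is only a regex heuristic under my own assumptions, not how the Linter extension detects html5-misnesting: the function names are made up for illustration, nested <span> tags are not handled, and any rewritten text should be reviewed by hand before saving.

```python
import re

# Heuristic match for a non-nested <span ...>...</span> pair (not a real parser).
SPAN_RE = re.compile(r"<span(\b[^>]*)>(.*?)</span>", re.DOTALL | re.IGNORECASE)

# Wikitext inside the span that yields block-level output: a blank line
# (paragraph break), a new line starting with a list marker, or a table start.
BLOCK_RE = re.compile(r"\n\s*\n|\n[*#:;]|\n\{\|")


def find_misnested_spans(wikitext):
    """Return the full text of each <span> whose body looks block-level."""
    return [m.group(0) for m in SPAN_RE.finditer(wikitext)
            if BLOCK_RE.search(m.group(2))]


def span_to_div(wikitext):
    """Rewrite only the flagged spans to <div>, keeping their attributes."""
    def repl(m):
        attrs, body = m.group(1), m.group(2)
        if not BLOCK_RE.search(body):
            return m.group(0)  # purely inline content: leave untouched
        return "<div{}>{}</div>".format(attrs, body)
    return SPAN_RE.sub(repl, wikitext)


sample = ('<span lang="fr">Premier paragraphe.\n\nSecond paragraphe.</span>'
          ' and <span>inline</span>')
flagged = find_misnested_spans(sample)
fixed = span_to_div(sample)
print(flagged)
print(fixed)
```

In the sample, only the first <span> is flagged and rewritten, because it contains a paragraph break; the purely inline one is left alone. When the <span> comes from a template, fixing the template itself (tip 1) is the better route.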

If you need any assistance, please leave a message on https://www.mediawiki.org/wiki/Help_talk:Extension:Linter. Between 8 am and 4 pm PST, you can also usually find us on IRC in #mediawiki-parsoid.


(on behalf of the Parsing team @ Wikimedia Foundation)

