Following the recent outage, we've had a new series of complaints
about the lack of improvements in CX, especially related to
server-side activities like saving/publishing pages.
Now, I know the team is involved in a long-term effort to merge the
editor with the VE, but is there an end in sight for that effort? Can
I tell people who ask "look, 6 more months then we'll have a much
better translation tool"?
Is there a publicly available roadmap for this project and more
generally, for CX?
MediaWiki core is upgrading its version of QUnit from 1.x to 2.x.
This means extensions or skins with QUnit tests must now be compatible with QUnit 2.
See https://phabricator.wikimedia.org/T170515 and
### Deprecated API
In 2014, QUnit started to overhaul its API, to be more robust and better
support async workflows. The most notable change was the removal of global
and static functions, in favour of more contextual methods.
The first part of this was released in 1.15, and more was gradually introduced
in later releases. The vast majority of our codebases are already using the
new interfaces. In fact, the vast majority of our QUnit tests were written
after 2014 and never used the old interfaces in the first place.
For a short list of removed functions, see
If you find that a QUnit Jenkins job for a MediaWiki extension or skin repo
starts failing, it is most likely due to this. Look for errors such as
"QUnit.start undefined", "test.callback is not a function",
"QUnit.asyncTest is undefined", and "QUnit.push is undefined".
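If you need to migrate such a failing test, the change is usually mechanical. The sketch below shows the before/after shape; the old pattern is in comments, and the `registerTests` wrapper is illustrative (not part of QUnit) so the example is self-contained.

```javascript
// Migration sketch for the removed QUnit 1.x globals. The replacement
// pattern works in QUnit 1.15+ and 2.x alike.

// Old (removed in QUnit 2):
//   QUnit.asyncTest( 'async work completes', function () {
//       Promise.resolve( 'ok' ).then( function ( v ) {
//           QUnit.push( v === 'ok', v, 'ok', 'resolved value' );
//           QUnit.start();
//       } );
//   } );

// New:
function registerTests( QUnit ) {
    QUnit.test( 'addition works', function ( assert ) {
        // QUnit 1.x wrote: QUnit.push( 1 + 1 === 2, 1 + 1, 2, 'sum' );
        assert.strictEqual( 1 + 1, 2, 'sum' );
    } );

    QUnit.test( 'async work completes', function ( assert ) {
        var done = assert.async();
        Promise.resolve( 'ok' ).then( function ( v ) {
            assert.strictEqual( v, 'ok', 'resolved value' );
            done();
        } );
    } );
}
```

The key moves are global `QUnit.push` becoming a method on the per-test `assert` object, and the `asyncTest`/`start` pair becoming `assert.async()`, which returns a `done` callback the test invokes when its async work finishes.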
There are also some methods that have been deprecated over the past few
years. These still work in QUnit 2. Please take a moment to familiarise
yourself with the renamed methods and new methods. Doing so will avoid
confusion when reading new code that uses them.
### New features
The 'setup' and 'teardown' module hooks are now called 'beforeEach' and
'afterEach'. The old names still work, but the new names clarify that these
hooks are run for each QUnit.test().
QUnit 2.0 also adds new 'before' and 'after' hooks, which run only once per
module. This is somewhat analogous to use of setUpBeforeClass() in PHPUnit.
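The hook ordering can be illustrated with a minimal stand-in runner (a sketch, not QUnit's implementation; `runModule` is a made-up helper): 'before'/'after' fire once per module, while 'beforeEach'/'afterEach' fire around every test.

```javascript
// Minimal stand-in runner illustrating QUnit 2 hook semantics.
function runModule( hooks, tests ) {
    if ( hooks.before ) { hooks.before(); }             // once per module
    tests.forEach( function ( test ) {
        if ( hooks.beforeEach ) { hooks.beforeEach(); } // before every test
        test();
        if ( hooks.afterEach ) { hooks.afterEach(); }   // after every test
    } );
    if ( hooks.after ) { hooks.after(); }               // once per module
}

var order = [];
runModule( {
    before: function () { order.push( 'before' ); },
    beforeEach: function () { order.push( 'beforeEach' ); },
    afterEach: function () { order.push( 'afterEach' ); },
    after: function () { order.push( 'after' ); }
}, [
    function () { order.push( 'test1' ); },
    function () { order.push( 'test2' ); }
] );
// order: before, beforeEach, test1, afterEach, beforeEach, test2, afterEach, after
```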
Since QUnit 1.16, QUnit.test() supports returning a Promise from the test
callback. This automatically attaches an assert.async() handler and waits
for the promise to complete. It also asserts that the Promise will be
resolved, and fails the test if rejected. This helps avoid a common pitfall
where a test could time out when forgetting to attach a promise.fail() handler.
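This behaviour can be sketched with a tiny stand-in (illustrative, not QUnit's code; `runPromiseTest` is a made-up name): the runner waits on the returned promise and turns a rejection into a test failure rather than a timeout.

```javascript
// Stand-in for how a runner handles a Promise returned from a test
// callback (mirrors the QUnit 1.16+ behaviour described above).
function runPromiseTest( callback ) {
    return Promise.resolve( callback() ).then(
        function () { return 'passed'; },
        function ( err ) { return 'failed: ' + err.message; }
    );
}

// A rejected promise fails the test instead of hanging until a timeout:
runPromiseTest( function () {
    return Promise.reject( new Error( 'boom' ) );
} ).then( function ( result ) {
    console.log( result ); // 'failed: boom'
} );
```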
The copy used for command-line usage (grunt-karma) and Jenkins will be
upgraded by https://gerrit.wikimedia.org/r/367838
Please take a look at the new browser report with more detailed desktop
site data (all Wikimedia projects aggregated):
* Data is very stable over the last year
* Chrome in the lead with 45% of traffic, closely followed by IE (18%) and
* The bulk of IE traffic is IE11 and IE7
* Edge shows up with 4% slowly catching up to Safari (5%)
* This data is still subject to fluctuations due to bot traffic not
identified as such. We will be working on this next year.
How to read this post?
* For those without time to read lengthy technical emails,
read the TL;DR section.
* For those who don't care about all the details but want to
help with this project, you can read sections 1 and 2 about Tidy,
and then skip to section 7.
* For those who like all their details, read the post in its entirety,
and follow the links.
Please ask follow-up questions on wiki *on the FAQ's talk page*. If you
find a bug, please report it *on Phabricator or on the page mentioned
The Parsing team wants to replace Tidy with a RemexHTML-based solution on the
Wikimedia cluster by June 2018. This will require editors to fix pages and
templates to address wikitext patterns that behave differently with
RemexHTML. Please see the 'What editors will need to do' section on the Tidy
replacement FAQ.
1. What is Tidy?
Tidy is a library currently used by MediaWiki to fix some HTML errors
found in wiki pages.
Badly formed markup is common on wiki pages when editors use HTML tags in
templates and on the page itself. (Ex: unclosed HTML tags, such as a <small>
without a </small>, are common). In some cases, MediaWiki can generate
erroneous HTML by itself. If we didn't fix these before sending it to
browsers, some would display things in a broken way to readers.
But Tidy also does other "cleanup" on its own that is not required for
correctness. Ex: it removes empty elements and adds whitespace between HTML
tags, which can sometimes change rendering.
2. Why replace it?
Since Tidy is based on HTML4 semantics and the Web has moved to HTML5, it
also makes some incorrect changes to 'fix' things that did not work in
HTML4; for example, Tidy will unexpectedly move a bullet list out of a table
caption even though that is allowed in HTML5. HTML4 Tidy is no longer
maintained or packaged, and a number of bug reports have been filed against
it. And since Parsoid is based on HTML5 semantics, there are differences
between Parsoid's rendering of a page and the current read view, which is
based on Tidy.
3. Project status
Given all these considerations, the Parsing team started work to replace
Tidy around mid-2015. Tim Starling started this work and, after a survey of
existing options, decided to write a wrapper over a Java-based HTML5 parser.
At the time we started the project, we thought we could probably have Tidy
replaced by mid-2016. Alas!
4. What is replacing Tidy?
Tidy will be replaced by a RemexHTML-based solution that uses the
RemexHTML library along with some Tidy-compatibility shims to ensure
better parity with the current rendering. RemexHTML is a PHP library,
written with C.Scott's input, that implements the HTML5 parsing spec.
5. Testing and followup
We knew that some pages would be affected and need fixing due to this
change. In order to more precisely identify what those would be, we wanted
to do some thorough testing. So, we built some new tools and overhauled and
upgraded other test infrastructure to let us evaluate the impacts of
replacing Tidy (among other such things in the future), which could be the
subject of a post all on its own.
You can find the details of our testing on the wiki, but we found
that a large number of pages had rendering differences. We analyzed the
results and categorized the sources of differences. Based on that, to ease
the process of replacement, we added a bunch of compatibility shims to mimic
what Tidy does. I am skipping the details in this post. Even after that,
newer testing showed that we are nevertheless still left with a few patterns
that need fixing that we cannot / don't want to work around automatically.
6. Tools to assist editors: Linter & ParserMigration
In October 2016, at the parsing team offsite, Kunal ([[User:Legoktm (WMF)]])
dusted off the stalled wikitext linting project and (with help from
a bunch of people in the Parsoid, db/security/code review areas) built the
Linter extension that surfaces wikitext errors that Parsoid knows about to
let editors fix them.
Earlier this year, we decided to use Linter in service of Tidy replacement.
Based on our earlier testing results, we have added a set of high-priority
linter categories that identify specific wikitext markup patterns on wiki
pages that need to be fixed.
Separately, Tim built the ParserMigration extension to let editors evaluate
their fixes to pages. To try it while editing, replace '&action=edit' in
your URL bar with '&action=parsermigration-edit'.
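The URL tweak amounts to a plain string substitution; a minimal sketch (the helper name and example URL are illustrative):

```javascript
// Swap the edit action for the ParserMigration edit action in a page URL.
function toParserMigrationUrl( url ) {
    return url.replace( '&action=edit', '&action=parsermigration-edit' );
}

toParserMigrationUrl( 'https://example.org/w/index.php?title=Foo&action=edit' );
// → 'https://example.org/w/index.php?title=Foo&action=parsermigration-edit'
```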
7. What editors have to do
The part that you have all been waiting for!
Please see the 'What editors will need to do' section on the Tidy
replacement FAQ. We have added simplified instructions, so that even community members
who do not consider themselves "techies" can still learn about ways to fix
pages. We'll keep that section up to date based on feedback and questions.
But since it is a wiki, please also edit and tweak as required to make the
text useful for yourselves! This is a first call for fixes and it is about
the problems defined as "high priority". We'll issue other calls in the
future for any other necessary Tidy fixups.
* As noted on that page, the linter categories don't cover all the possible
sources of rendering differences. For example, there is still T157418
left to address. For those who have an opinion about this, please comment
on that task. We are still evaluating the best solution for this without
adding more cruft to wikitext behavior or kicking the cleanup can down the
road.
* As the issues in the identified linter categories are fixed, we might be
better able to isolate other issues that need addressing.
8. So, when will Tidy actually be replaced?
We really would like to get Tidy removed from the cluster by June 2018 at
the latest (or sooner if possible), and your assistance and prompt attention to these
markup issues would be very helpful. We will do this in a phased manner on
different wikis rather than all at once on all wikis.
We really want to do this as smoothly as possible, without disrupting the
work of editors or affecting the rendering of the large corpus of pages on the
various wikis. As you might have gathered from the text above, we have built
and leveraged a wide variety of tools to assist with this.
9. Monitoring progress
In order to monitor progress, we plan to do a weekly (or similarly
periodic) test run that compares the rendering of pages with Tidy and with
RemexHTML on a large sample of pages (in the 50K range) from a large subset
of Wikimedia wikis (~50 or so). This will give us a pulse of how fixups are
going, and when we might be able to flip the switch on different wikis.
Subramanya (Subbu) Sastry
For patrolling work, ORES usually has two levels of support:
- For basic support we usually provide a model called 'reverted',
which has lower accuracy. It also risks perpetuating editor biases, due to
the lack of differentiation between the reasons a change may have been
reverted.
- For advanced support, we require manual labeling of a large sample of
edits, but then we can provide two more models: 'damaging' and 'goodfaith'.
The ORES review tool can only be enabled on wikis where advanced support is
available, and most other tools prefer the 'damaging' model over the basic
'reverted' model as well.
So, for performance and capacity reasons, we think it makes sense to remove
the 'reverted' models from the ORES service once the 'damaging' model is
made available. However, we also want to be careful to make sure this
change doesn't disrupt the work of tool developers who make use of the
'reverted' model. If you do, please voice your concerns now. If there is no
objection within the next two weeks, we'll begin the process of removing
the 'reverted' model for wikis that have the 'damaging' model available.
Related phabricator card: https://phabricator.wikimedia.org/T171059
Amir Sarabadani Tafreshi
Software Engineer (contractor)
Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/681/51985.
I'd like to welcome you to join us at the CREDIT showcase next week,
Wednesday, 2-August-2017 at 1800 UTC / 1100 Pacific Time. We'd like to see
your demos, whether they're rough works in progress or polished production
material, or even just a telling of something you've been studying
recently. For more information on the upcoming event, as well as recordings
of previous events, please visit the following page:
And if you'd like to share the news about the upcoming CREDIT showcase,
here's some suggested verbiage. Thanks!
*I hope all is well with you! I wanted to let you know about CREDIT, a
monthly demo series that we’re running to showcase open source tech projects
from Wikimedia’s Community, Reading, Editing, Discovery, Infrastructure and
Technology teams. *
*CREDIT is open to the public, and we welcome questions and discussion. The
next CREDIT will be held on August 2nd at 11am PT / 2pm ET / 18:00 UTC. *
*There’s more info on MediaWiki, and on Etherpad, which is where we take
notes and ask questions. You can also ask questions on IRC in the Freenode
chatroom #wikimedia-office (web-based access here). Links to video will
become available at these locations shortly before the event.*
*Please feel free to pass this information along to any interested folks.
Our projects tend to focus on areas that might be of interest to folks
working across the open source tech community: language detection,
numerical sort, large data visualizations, maps, and all sorts of other
things.*
*If you have any questions, please let me know! Thanks, and I hope to see
you at CREDIT.*
Project Assistant, Engineering Admin