Summary
-------
Parsoid/PHP, the PHP port of Parsoid, is now live everywhere, for all
products on all wikis. Parsoid/JS is still deployed on the Wikimedia
cluster but doesn't receive any traffic and will be decommissioned
in the new year.
Context: Making Parsoid the default MediaWiki wikitext engine
-------------------------------------------------------------
In 2018, we completed replacing HTML4 Tidy in MediaWiki with
RemexHTML, an HTML5 parser.
Porting Parsoid to PHP is the next step toward integrating Parsoid
and the MediaWiki wikitext parser into a single wikitext engine.
That unified engine will let us bring the benefits of Parsoid's approach
to a wider set of products. It will also let us start improving templates,
wikitext, and other features without having to implement each change in
two wikitext engines with different processing models.
See the Feb 27 tech talk for the full context [1]. A future blog post will
provide more details about the porting project and process.
Performance
-----------
Parsoid/PHP on the Wikimedia cluster seems to be about 2x faster than
Parsoid/JS for the wikitext -> HTML and HTML -> wikitext endpoints.
This performance bump is a pleasant surprise, given that going in, we
had anticipated some performance penalty. While we have various
theories about the factors contributing to this, we haven't yet had the
opportunity to investigate them fully. The load on the MediaWiki Action API
endpoint was also significantly reduced on Dec 13, when we stopped
processing mirrored traffic on Parsoid/JS (Parsoid/JS fetches its data
through the Action API, whereas Parsoid/PHP accesses the database directly).
Timeline
--------
We started porting in earnest in Feb, after some preparatory work over
the previous few months. We deployed Parsoid/PHP as a passive mirror
of the full volume of wikitext -> HTML requests in October and November
to discover and fix problems early. By Dec 2, we’d enabled Parsoid/PHP
for the majority of Parsoid clients on all wikis. On Dec 18, we switched
everything over to Parsoid/PHP.
We reached this milestone about 2 months later than our earlier
estimate of 9 months.
Thanks to all the testing we had in place, the switch from Parsoid/JS to
Parsoid/PHP went fairly smoothly in the end, with only some minor glitches.
Credits
-------
This project to port Parsoid from JavaScript (Node.js) to PHP was a
multi-team collaboration. The Parsing, Core Platform, Product
Infrastructure, Service Ops, and Security teams at the Wikimedia
Foundation were all involved. We also benefited from some contractor
help from Wikiteq. Thanks to everyone involved!
[1] https://www.mediawiki.org/wiki/Wikimedia_Technical_Talks#Episode_1:_The_lon…