[---- Long mail - but only relevant to extension developers ----]
Greetings!
As some of you might know, on the Parsing Team [0], we are aspiring to replace the core wikitext parser with Parsoid [1] on Wikimedia wikis late next year and start to put to rest the two-parser ghost that has haunted us for many years. In recent years, we achieved two major milestones along the way: replacing HTML4 Tidy with HTML5 Remex [2], and porting Parsoid from JavaScript to PHP [3].
Given that context, if you (help) maintain an extension that:
* uses a "parser hook" and/or
* uses the "parser API" (i.e. uses public properties / methods in Parser.php, ParserOutput.php, ParserOptions.php, etc.)
please read on. If you don't fit that description, you can stop reading now!
Parsoid models and processes wikitext quite differently from the core parser. All that Parsoid guarantees is that the rendering is largely identical, not that the process of generating the rendering is the same. This means that extensions that extend the behavior of the parser will need to be adapted to work with Parsoid in order to provide similar functionality. With that in mind, we have been working to more clearly specify how extensions need to adapt to the Parsoid regime.
PARSOID & EXTENSIONS:
At a high level, here are the questions we needed to answer, along with some highly simplified answers:
1. How do extensions "hook" into Parsoid?
A. Extensions need to think in terms of transformations (convert this to that) instead of parser pipeline events (at this point in the pipeline, call this listener). An additional detail here is that extensions cannot maintain global ordered state within extension code, since Parsoid doesn't guarantee that handlers will be invoked in the same order in which they appear in the page source. See the wiki [4] for more details.
As for the mechanics of registration, Parsoid uses existing mechanisms based on the extension.json file.
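To make the "no global ordered state" point concrete, here is a minimal sketch in Python. It is not Parsoid code and does not use the Parsoid Extension API: `Fragment`, `render_note`, `number_in_document_order`, and `process` are all hypothetical names invented for illustration. The idea it shows is that each handler call depends only on its own input, and anything order-dependent (such as numbering) is computed afterwards from source position rather than from a counter incremented at call time.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Fragment:
    """Output of one handler call; `position` is where the tag sat in the source."""
    position: int
    text: str
    number: Optional[int] = None


def render_note(position: int, body: str) -> Fragment:
    """Transformation-style handler: the result depends only on its own input."""
    return Fragment(position=position, text=body.strip())


def number_in_document_order(fragments: List[Fragment]) -> List[Fragment]:
    """Post-pass over all fragments: assign numbers by source position."""
    ordered = sorted(fragments, key=lambda f: f.position)
    for n, frag in enumerate(ordered, start=1):
        frag.number = n
    return ordered


def process(tags: List[Tuple[int, str]], seed: int) -> List[str]:
    """Hypothetical host that invokes handlers in an arbitrary order."""
    shuffled = list(tags)
    random.Random(seed).shuffle(shuffled)  # stand-in for "order not guaranteed"
    fragments = [render_note(pos, body) for pos, body in shuffled]
    return [f"[{f.number}] {f.text}" for f in number_in_document_order(fragments)]
```

Because numbering happens in the post-pass, `process` returns the same list whichever order the handlers were called in. A handler that incremented a module-level counter on each call would not have that property.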
2. When the registered hook listeners are invoked by Parsoid, how do they process any wikitext they need to process?
A. Parsoid provides all registered listeners with an API object to interact with it. Direct use of Parsoid's internal code is strongly discouraged, and this restriction will be enforced in various ways, including code review.
3. How is the extension's output assimilated into the page output?
A. The output is treated as a "fully-processed" page/DOM fragment (with some caveats which will be clarified on wiki). It is appropriately decorated with additional markup, and slotted into place in the page. Extensions need not make any special efforts (aka strip state) to protect it from the parsing pipeline.
Slides 8-12 of the August 12, 2020 Tech Talk [7] go over the differences. Check the wiki [4] for more details of Parsoid's Extension API. It also maps core parser hooks to Parsoid's extension functionality.
CURRENT STATUS:
We consider the current proposal to be in late draft stage. That said, as we discover unsupported functionality, we will augment the set of hooks and the Parsoid Extension API as needed.
While there is a wide variety of extensions in the MediaWiki universe with varied use cases, our initial goal for the next year is just Wikimedia wikis and hence extensions that are deployed on the Wikimedia wikis. Once we are done with that, we will turn our attention to supporting extension use cases in the wider MediaWiki universe. But now is a good time for all extension developers to study and review this API and give us feedback.
Since the beginning of this year, we've refactored all of the extensions we've written Parsoid versions of (Cite, Gallery, Poem, Pre, JSON) to now strictly use the Parsoid Extension API without cheating by virtue of being in the Parsoid codebase. So, this proposal is actually backed by an implementation that is in production for Wikimedia wikis.
FEEDBACK:
Here is where you come in.
* If you maintain / develop an extension, please review the document to see if your extension's use case is covered.
Ideally, leave your feedback on the Parsoid Extension API talk page [5] since it helps keep it all in one place. Alternatively, you can also leave questions / concerns / other feedback on the Phabricator task we've filed for TechCom's RFC process [6].
* If you feel bold, start the process of updating your extensions *now*. Note that your extension will need to operate with both the existing core parser and Parsoid until we deprecate and stop using the core parser.
There are known functionality gaps related to exposing the ParserOutput object and providing setFunctionHook functionality. If your extension needs those, you should probably wait for us to fill those gaps.
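One way to think about supporting both parsers during the transition is to keep the rendering logic parser-independent and put a thin adapter in front of it for each parser. The Python sketch below only illustrates that shape under stated assumptions: `render_note`, `legacy_hook`, and `transform_handler` are hypothetical names, and the adapter signatures are invented rather than taken from the core parser or the Parsoid Extension API.

```python
import html
from typing import Any, Dict


def render_note(body: str) -> str:
    """Parser-independent core: turn raw tag content into an HTML fragment."""
    lines = [html.escape(line) for line in body.strip().splitlines()]
    return '<div class="note">' + "<br>".join(lines) + "</div>"


def legacy_hook(body: str, attrs: Dict[str, str]) -> str:
    """Hypothetical adapter for a pipeline-style parser hook."""
    return render_note(body)


def transform_handler(api: Any, body: str, attrs: Dict[str, str]) -> str:
    """Hypothetical adapter for a transformation-style handler.

    `api` stands in for the API object a host would pass to the handler;
    it is unused here because this example needs nothing from the host.
    """
    return render_note(body)
```

With this split, both entry points produce the same fragment for the same input, and only the adapters need to change when one parser is retired.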
DOCS / MORE INFO / CONTACT:
* Check the wiki page [4] for docs and discuss on the talk page [5]
* Check the August 12, 2020 Tech Talk [7]
* Look at Parsoid code for extensions [8]
* Look at Parsoid docs for the Ext/ namespace [9]
* Talk to us on IRC in the #mediawiki-parsoid channel
* Email us at parsing-team@wikimedia.org
Thanks!
Subbu (on behalf of the Parsing Team).
-------------------------------------------------------------------------
0. https://www.mediawiki.org/wiki/Parsing
1. https://www.mediawiki.org/wiki/Parsing/Parser_Unification
2. https://blog.wikimedia.org/2018/07/09/tidy-html5-replacement/
3. https://techblog.wikimedia.org/2020/02/12/parsoid-in-php-or-there-and-back-a...
4. https://www.mediawiki.org/wiki/Parsoid/Extension_API
5. https://www.mediawiki.org/wiki/Parsoid/Talk:Extension_API
6. https://phabricator.wikimedia.org/T260714
7. Slides: https://commons.wikimedia.org/wiki/File:Parsoid_%26_Extensions_August_2020_T...
   Video: https://www.youtube.com/watch?v=lS1xPkERWCM
8. https://github.com/wikimedia/parsoid/tree/master/src/Ext
9. https://doc.wikimedia.org/Parsoid-PHP/master/