Thank you both for the detailed replies. They were very helpful and I feel like I have a better understanding now. I'm still trying to wrap my head around Parsoid, its implementation, and how it fits in with the larger future of MediaWiki development.
Subramanya Sastry wrote:
> The core parser has the following components:
> - preprocessing that expands transclusions, extensions (including
>   Scribunto), parser functions, include directives, etc. to wikitext
> - wikitext parser that converts wikitext to html
> - Tidy that runs on the html produced by wikitext parser and fixes up
>   malformed html
>
> Parsoid right now replaces the last two of the three components, but in a way that enables all of the functionality stated earlier.
Are you saying Parsoid can act as a replacement for HTMLTidy? That seems like a pretty huge win. Replacing Tidy has been a longstanding goal: https://phabricator.wikimedia.org/T4542.
> But, there are several directions this can go from here (including implementing a preprocessor in Parsoid, for example). However, note that this discussion is not entirely about Parsoid but also about shared hosting support, mediawiki packaging, pure PHP mediawiki install, HTML-only wikis, etc. All those other decisions inform what Parsoid should focus on and how it evolves.
I think this is very well put. There's definitely a lot to think about.
MZMcBride