Given what I've seen so far, it might be best to aim for a gradual reimplementation of Parsoid features to make most of them work without a need for Parsoid, with the eventual goal of severing the need for Parsoid completely if possible. At any rate, the less the parser has to outsource, the less complicated things will be, correct?
Date: Tue, 20 Jan 2015 11:02:10 -0500
From: cananian@wikimedia.org
To: wikitech-l@lists.wikimedia.org
Subject: Re: [Wikitech-l] Parsoid's progress
I believe Subbu will follow up with a more complete response, but I'll note that:
- no plan survives first encounter with the enemy. Parsoid was going to be simpler than the PHP parser; Parsoid was going to be written in PHP, then C, then prototyped in JS for a later implementation in C, etc. The plan has varied over time as we learned more about the problem. It is currently written in node.js and is probably of at least the same order of complexity as the existing PHP parser. It is, however, built on slightly more solid foundations, so its behavior is more regular than the PHP parser in many places -- although I've been submitting patches to the core parser where necessary to try to bring them closer together. (cf. https://gerrit.wikimedia.org/r/180982 for the most recent of these.) And, of course, Parsoid emits well-formed HTML which can be round-tripped.
In many cases Parsoid could be greatly simplified if we didn't have to maintain compatibility with various strange corner cases in the PHP parser.
- Parsoid contains a partial implementation of the PHP expandtemplates module. It was decided (I think wisely) that we didn't really gain anything by trying to reimplement this fully on the Parsoid side, though, and it was better to use the existing PHP code via api.php. The alternative would be to basically reimplement quite a lot of MediaWiki (Lua embedding, the various parser function extensions, etc.) in node.js. This *could* be done -- there is no technical reason why it cannot -- but nobody thinks it's a good idea to spend time on right now.
But the expandtemplates stuff basically works. As I said, it doesn't contain all the crazy extensions that we use on the main WMF sites, but it would be reasonable to turn it on for a smaller stock MediaWiki instance. In that sense it *could* be a full replacement for the Parser.
But note that even as a full parser replacement Parsoid depends on the PHP API in a large number of ways: imageinfo, siteinfo, language information, localized keywords for images, etc. The idea of "independence" is somewhat vague.
--scott
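[Editorial illustration] To make the kinds of API dependencies listed above concrete, here is a minimal sketch of how siteinfo and imageinfo lookups can be expressed as api.php query URLs. The endpoint https://example.org/w/api.php, the file title, and the helper names are hypothetical, and the particular siprop/iiprop values are assumptions chosen for illustration; this is not a record of the requests Parsoid actually issues.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint, used only for illustration.
API_BASE = "https://example.org/w/api.php"


def build_query_url(api_base: str, **params: str) -> str:
    """Build an action=query URL for a MediaWiki api.php endpoint."""
    query = {"action": "query", "format": "json", **params}
    return f"{api_base}?{urlencode(query)}"


def siteinfo_url(api_base: str) -> str:
    # meta=siteinfo returns wiki-wide configuration; the siprop values
    # requested here are an assumption about what a parser might need.
    return build_query_url(
        api_base, meta="siteinfo", siprop="general|namespaces|magicwords"
    )


def imageinfo_url(api_base: str, file_title: str) -> str:
    # prop=imageinfo returns per-file metadata such as URL and dimensions.
    return build_query_url(
        api_base, prop="imageinfo", titles=file_title, iiprop="url|size"
    )


site_url = siteinfo_url(API_BASE)
image_url = imageinfo_url(API_BASE, "File:Example.jpg")

# Decode the query strings again to show which parameters were sent.
print(parse_qs(urlsplit(site_url).query))
print(parse_qs(urlsplit(image_url).query))
```

Nothing here touches the network; the sketch only shows that each item in the list corresponds to a separate round trip to the PHP side.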
On Mon, Jan 19, 2015 at 11:58 PM, MZMcBride <z@mzmcbride.com> wrote:
Matthew Flaschen wrote:
On 01/19/2015 08:15 AM, MZMcBride wrote:
And from this question flows another: why is Parsoid calling MediaWiki's api.php so regularly?
I think it uses it for some aspects of templates and hooks. I'm sure the Parsoid team could explain further.
I've been discussing Parsoid a bit and there's apparently an important distinction between the preprocessor(s) and the parser. Though in practice I think "parser" is used pretty generically. Further notes follow.
I'm told in Parsoid, <ref> and {{!}} are special-cased, while most other parser functions require using the expandtemplates module of MediaWiki's api.php. As I understand it, calling out to api.php is intended to be a permanent solution (I thought it might be a temporary shim).
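[Editorial illustration] The call-out to the expandtemplates module described above can be sketched as a plain HTTP query. This is a minimal sketch under stated assumptions: the endpoint https://example.org/w/api.php and the helper name build_expandtemplates_url are hypothetical, and the code only builds the request URL rather than reproducing how Parsoid itself talks to api.php.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_expandtemplates_url(api_base: str, wikitext: str) -> str:
    """Build a URL asking api.php to expand templates and parser functions."""
    params = {
        "action": "expandtemplates",  # the API module named in the thread
        "format": "json",
        "prop": "wikitext",           # request the expanded wikitext
        "text": wikitext,             # wikitext to be preprocessed
    }
    return f"{api_base}?{urlencode(params)}"


# Hypothetical endpoint and input, for illustration only.
url = build_expandtemplates_url("https://example.org/w/api.php", "{{lc:HELLO}}")

# Decode the query string again to confirm the wikitext survives URL encoding.
decoded = parse_qs(urlsplit(url).query)
print(decoded["text"][0])
```

The point of the sketch is the division of labour: the caller hands raw wikitext to the PHP preprocessor and gets expanded wikitext back, so anything implemented as a PHP parser function or extension does not need a second implementation on the caller's side.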
If the goal was just to add more verbose markup to parser output, couldn't we have done that in PHP? From what I now understand, Node.js was chosen over PHP because of performance concerns.
The view that Parsoid is going to replace the PHP parser seems to be overly simplistic and goes back to the distinction between the parser and preprocessor. Full wikitext transformation seems to require a preprocessor.
MZMcBride
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
-- (http://cscott.net)