C. Scott Ananian wrote:
- no plan survives first encounter with the enemy. Parsoid was going to be simpler than the PHP parser, Parsoid was going to be written in PHP, then C, then prototyped in JS for a later implementation in C, etc. It has varied over time as we learned more about the problem. It is currently written in node.js and probably is at least the same order of complexity as the existing PHP parser.
Hrm.
In many cases Parsoid could be greatly simplified if we didn't have to maintain compatibility with various strange corner cases in the PHP parser.
I guess this is the part that I'm still struggling with. If the PHP parser is/was already doing the job of converting wikitext to HTML, why would that need to be rewritten in Node.js? Wouldn't it have been simpler to make the HTML output more verbose in the PHP parser so that it could cleanly round-trip? I'm still not clear where Node.js (or C or JavaScript) came into this. I heard there were performance concerns with the PHP parser. Was that the case?
I'm mostly just curious... you can't un-milk the cow, as they say.
But note that even as a full parser replacement Parsoid depends on the PHP API in a large number of ways: imageinfo, siteinfo, language information, localized keywords for images, etc. The idea of "independence" is somewhat vague.
Hrm.
MZMcBride