On 9/7/20 10:15 AM, Roy Smith wrote:
> Joaquin,
> Thanks for your reply.
> Regarding the data-parsoid route, I can't reproduce the trouble I was
> having. I suspect I was just getting the /revision/tid part wrong.
> Taking a step back, I think part of the problem was I apparently had
> an incorrect mental model of how parsoid works. I was envisioning
> something that took wikitext, parsed it into a semantic parse tree
> (kind of like mwparserfromhell does), and then took that parse tree
> and converted it to html. What I was trying to get at was the
> intermediate parse tree. Looking at
> https://www.mediawiki.org/wiki/Parsoid/API, this appeared to be the
> pagebundle format, and I was groping around trying to find the API
> which exposed that. I looked at the /html routes and thought to
> myself, "No, that's not what I want. That's the HTML. I want the
> parse tree".
Parsoid doesn't produce any intermediate parse tree. Parsoid's output
(HTML / DOM) is the canonical representation that captures wikitext
information for you, and you can reliably get most of the information you
want by inspecting that HTML based on the HTML spec Parsoid adheres to
(see https://www.mediawiki.org/wiki/Specs/HTML ). There are caveats in
that Parsoid doesn't give you detailed information about nested
templates when templates are parsed, but most use cases don't need that.
So, if you parse Parsoid's HTML into a DOM, you get the "parse tree" that
you want. You can then modify the HTML appropriately and, as long as your
output conforms to Parsoid's HTML spec, you can post that HTML to
Parsoid and have it converted to wikitext.
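To make the "HTML is the parse tree" idea concrete, here is a minimal sketch using only the Python standard library. It assumes the convention from the HTML spec page above that template output is marked with typeof="mw:Transclusion" and carries the template name and arguments as JSON in a data-mw attribute. The SAMPLE string is hand-written to mimic that shape, not real Parsoid output, and TransclusionCollector is a hypothetical helper name; in practice you would feed it HTML fetched from one of the /html routes.

```python
import json
from html.parser import HTMLParser


class TransclusionCollector(HTMLParser):
    """Hypothetical helper: collect template invocations from Parsoid-style HTML.

    Assumes template output is marked with typeof="mw:Transclusion" and that
    the template name and arguments are stored as JSON in a data-mw attribute.
    """

    def __init__(self):
        super().__init__()
        # Each entry is (template target wikitext, {param name: param wikitext}).
        self.templates = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # typeof may hold several space-separated values.
        typeof = (attrs.get("typeof") or "").split()
        if "mw:Transclusion" not in typeof or "data-mw" not in attrs:
            return
        data_mw = json.loads(attrs["data-mw"])
        for part in data_mw.get("parts", []):
            # A part is either a plain wikitext string or a {"template": ...} object.
            if isinstance(part, dict) and "template" in part:
                tpl = part["template"]
                name = tpl["target"]["wt"].strip()
                params = {k: v.get("wt", "") for k, v in tpl.get("params", {}).items()}
                self.templates.append((name, params))


# Hand-written sample that mimics the assumed data-mw shape.
SAMPLE = (
    '<p>See <span about="#mwt1" typeof="mw:Transclusion" '
    'data-mw=\'{"parts":[{"template":{"target":{"wt":"cite web",'
    '"href":"./Template:Cite_web"},"params":{"url":{"wt":"https://example.org"},'
    '"title":{"wt":"Example"}},"i":0}}]}\'>Example</span>.</p>'
)

collector = TransclusionCollector()
collector.feed(SAMPLE)
collector.close()
print(collector.templates)
# [('cite web', {'url': 'https://example.org', 'title': 'Example'})]
```

Editing works the same way in reverse: change the JSON in data-mw (or the surrounding markup) with a real DOM library, keep the result within the spec, and post it back for conversion to wikitext.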
For example, https://github.com/wikimedia/parsoid-jsapi is a library
(now defunct, since Parsoid/JS is not going to be maintained) that uses
Parsoid's DOM as the wikitext parse tree and replicates mwparserfromhell
functionality.
We haven't built anything equivalent for the PHP version of Parsoid yet.
However, Kunal (@legoktm) has built a Rust version of this. See
https://docs.rs/parsoid/0.2.0/parsoid/ ... So, if Rust is your thing,
you can use that library to manipulate wikitext similar to
mwparserfromhell. But if not, for now, you will still have to work with
a DOM to replicate mwparserfromhell functionality.
Eventually, hopefully, implementations in other languages will show up,
and we expect much of the functionality provided by mwparserfromhell will
be available. But mwparserfromhell is usable on dumps, which you currently
cannot use Parsoid for. You could, if you really wanted to, by doing a
whole bunch of additional work, but for all practical purposes it is
non-trivial. So that use case is still not something we have targeted
for now.
> I think the biggest thing that could be done to improve the
> documentation is to update https://www.mediawiki.org/wiki/Parsoid/API.
> That's the page you get to most directly when searching for parsoid
> documentation.
As I indicated in my previous response, the information on that page is
accurate. Given the responses in this thread, what would be most helpful
wrt updating that page to eliminate some of the confusion around Parsoid
vs. RESTBase? Feel free to edit the page directly or email me privately
or respond on this thread and we'll tweak it appropriately.
Thanks,
Subbu.