Thanks for your reply.
Regarding the data-parsoid route, I can't reproduce the trouble I was having. I suspect I was just getting the /revision/tid part wrong.
Taking a step back, I think part of the problem was that I had an incorrect mental model of how Parsoid works. I was envisioning something that took wikitext, parsed it into a semantic parse tree (kind of like mwparserfromhell does), and then converted that parse tree to HTML. What I was trying to get at was the intermediate parse tree. Looking at https://www.mediawiki.org/wiki/Parsoid/API
, this appeared to be the pagebundle format, and I was groping around trying to find the API which exposed it. I looked at the /html routes and thought to myself, "No, that's not what I want. That's the HTML. I want the parse tree." So I was trying things like:
with :format set to "pagebundle". For example, I tried
I think the biggest thing that could be done to improve the documentation is to update https://www.mediawiki.org/wiki/Parsoid/API
. That's the page you reach most directly when searching for Parsoid documentation.
Some responses inline:
I know there's been a ton of work done on Parsoid lately. This is great, and the amount of effort that's gone into this functionality is really appreciated. It's clear that Parsoid is the way of the future, but the documentation of how you get a Parsoid parse tree via an API call is kind of confusing.
I found https://www.mediawiki.org/wiki/Parsoid/API
, which looks like it's long out of date. The last edit was almost 2 years ago. As far as I can tell, most of what it says is obsolete, and refers to a series of /v3 routes which don't actually exist.
This definitely looks outdated, I'll forward your email to the maintainers so maybe they can have a look and update it.
Maybe you can share exactly how you are querying the API and the responses you get, since this does seem to work fine for me (examples below). I think these APIs are the ones VisualEditor uses so they should work appropriately.
I tried querying https://en.wikipedia.org/api/rest_v1/page/html/Banana
first, and got back the response. On it, you can get the revision and "tid" from the ETag header, like it says in the Swagger docs:

    "ETag header indicating the revision and render timeuuid separated by a slash: "701384379/154d7bca-c264-11e5-8c2f-1b51b33b59fc". This ETag can be passed to the HTML save end point (as the base_etag POST parameter), and can also be used to retrieve the exact corresponding data-parsoid metadata, by requesting the specific revision and tid indicated by the ETag."
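To make that concrete, here is a small sketch of how the ETag maps onto a data-parsoid request. The URL shape is an assumption based on the REST API docs (the function name and the ETag-normalization details are mine, not from the docs):

```python
# Sketch (assumptions noted above): split the ETag returned by
# /api/rest_v1/page/html/{title} into revision and tid, then build the
# corresponding data-parsoid URL.

def data_parsoid_url(title: str, etag: str) -> str:
    """Build a data-parsoid URL from a page title and an ETag header value.

    ETags look like "701384379/154d7bca-c264-11e5-8c2f-1b51b33b59fc",
    sometimes wrapped in quotes or prefixed with W/ (a weak ETag).
    """
    etag = etag.strip().removeprefix('W/').strip('"')
    revision, tid = etag.split('/', 1)
    return ("https://en.wikipedia.org/api/rest_v1/page/data-parsoid/"
            f"{title}/{revision}/{tid}")

print(data_parsoid_url(
    "Banana", 'W/"701384379/154d7bca-c264-11e5-8c2f-1b51b33b59fc"'))
```

Fetching that URL (with the revision/tid taken from a fresh /html response) is what finally worked for me once I stopped getting the /revision/tid part wrong.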
Eventually, I discovered (see this thread
) that the way to get a Parsoid parse tree is via the https://en.wikipedia.org/api/rest_v1/page/html/
route, digging the embedded JSON out of the data-mw fragments scattered throughout the HTML. This seems counter-intuitive, and kind of awkward, since it's not even a full parse tree; it's just little snippets of parse trees, which I guess correspond to each template expansion?
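For what it's worth, here is the kind of digging I ended up doing, sketched with only the standard library. The sample markup is a made-up stand-in for a Parsoid template transclusion (real pages would come from the /page/html route), so treat the structure as illustrative rather than authoritative:

```python
# Collect the JSON payloads of every data-mw attribute in a Parsoid HTML
# document, using only the standard library.
import json
from html.parser import HTMLParser

class DataMwExtractor(HTMLParser):
    """Accumulate parsed data-mw fragments as the document streams by."""
    def __init__(self):
        super().__init__()
        self.fragments = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-mw" and value:
                self.fragments.append(json.loads(value))

# Tiny stand-in for a Parsoid-rendered template transclusion.
sample = ('<span about="#mwt1" typeof="mw:Transclusion" '
          'data-mw=\'{"parts":[{"template":{"target":{"wt":"echo"},'
          '"params":{"1":{"wt":"hi"}},"i":0}}]}\'>hi</span>')

parser = DataMwExtractor()
parser.feed(sample)
print(parser.fragments[0]["parts"][0]["template"]["target"]["wt"])  # echo
```

Each fragment is one of those per-transclusion snippets, which is why you never get the whole page as a single tree this way.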
I looked around and found https://www.mediawiki.org/wiki/Specs/HTML/2.1.0
linked on the Parsoid page, which has extensive documentation on how wikitext <-> HTML is translated. It seems to be more actively maintained. Hopefully this can give you some insight on how the responses relate to the wikitext and how to find what you want.
You may be interested in the #Template_markup
section of that spec, given your problem statement.
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly firstname.lastname@example.org)