Hi Roy,
Some responses inline:
On Fri, Sep 4, 2020 at 6:41 PM Roy Smith <roy@panix.com> wrote:
I know there's been a ton of work done on Parsoid lately. This is great,
and the amount of effort that's gone into this functionality is really
appreciated. It's clear that Parsoid is the way of the future, but the
documentation of how you get a Parsoid parse tree via an API call is kind
of confusing.
I found
https://www.mediawiki.org/wiki/Parsoid/API, which looks like it's
long out of date. The last edit was almost 2 years ago. As far as I can
tell, most of what it says is obsolete, and refers to a series of /v3
routes which don't actually exist.
This definitely looks outdated; I'll forward your email to the maintainers
so they can have a look and update it.
I also found
https://en.wikipedia.org/api/rest_v1/#/Page%20content, which
seems more in line with the current reality. But, the call I was most
interested in, /page/data-parsoid/{title}/{revision}/{tid}, doesn't
actually respond (at least not on
en.wikipedia.org).
Maybe you can share exactly how you are querying the API and the responses
you get, since this does seem to work fine for me (examples below). I think
these APIs are the ones VisualEditor uses, so they should behave as expected.
I tried querying
https://en.wikipedia.org/api/rest_v1/page/html/Banana first,
and got back the rendered HTML. From that response, you can get the revision
and "tid" from the ETag header, as the swagger docs say:
*ETag header indicating the revision and render timeuuid separated by a
slash: "701384379/154d7bca-c264-11e5-8c2f-1b51b33b59fc" This ETag can be
passed to the HTML save end point (as base_etag POST parameter), and can
also be used to retrieve the exact corresponding data-parsoid metadata, by
requesting the specific revision and tid indicated by the ETag.*
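For instance, here's a minimal browser-side sketch (run in an async
context; this assumes the REST API exposes the ETag header cross-origin,
which I believe it does, and "Banana" is just an example title):

    // Fetch the Parsoid HTML and read the revision/tid from the ETag header.
    const resp = await fetch('https://en.wikipedia.org/api/rest_v1/page/html/Banana');
    const html = await resp.text();
    // The ETag looks like "701384379/154d7bca-c264-11e5-8c2f-1b51b33b59fc";
    // strip any weak marker (W/) and the quotes before splitting it.
    const etag = resp.headers.get('etag').replace(/^W\//, '').replace(/"/g, '');
    const [revision, tid] = etag.split('/');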
With that information, you can then compose the new API call URL:
https://en.wikipedia.org/api/rest_v1/page/data-parsoid/Banana/975959204/7e3…
which should respond with the metadata.
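Continuing the sketch above, composing and fetching that URL would look
something like:

    // Build the data-parsoid URL from the revision and tid parsed earlier.
    const dpUrl = 'https://en.wikipedia.org/api/rest_v1/page/data-parsoid/Banana/'
        + revision + '/' + tid;
    const dataParsoid = await (await fetch(dpUrl)).json();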
I'm not 100% clear on the difference between the data-mw information in the
/page/html response vs. what's in the /page/data-parsoid response, but
either way you should be able to use both endpoints as needed.
Eventually, I discovered (see this thread
<https://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(technical)&oldid=976731421#Parsing_wikitext_in_javascript?>),
that the way to get a Parsoid parse tree is via the
https://en.wikipedia.org/api/rest_v1/page/html/ route, and digging the
embedded JSON out of data-mw fragments scattered throughout the HTML. This
seems counter-intuitive. And kind of awkward, since it's not even a full
parse tree; it's just little snippets of parse trees, which I guess
correspond to each template expansion?
I looked around and found
https://www.mediawiki.org/wiki/Specs/HTML/2.1.0 linked
on the Parsoid page, which has extensive documentation on how wikitext <->
HTML is translated. It seems to be more actively maintained. Hopefully this
can give you some insight on how the responses relate to the wikitext and
how to find what you want.
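As a rough illustration (continuing the sketch above; in a browser,
DOMParser can turn the response into a DOM you can query):

    // Parse the Parsoid HTML and collect the nodes that carry template
    // expansions; per the spec they are marked with typeof="mw:Transclusion".
    const doc = new DOMParser().parseFromString(html, 'text/html');
    const transclusions = doc.querySelectorAll('[typeof~="mw:Transclusion"]');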
So, taking a step backwards, my ultimate goal is to be able to parse the
wikitext of a page and discover the template calls, with their arguments.
On the server side, I'm doing this in Python with mwparserfromhell, which
is fine. But now I need to do it on the client side, in browser-executed
javascript. I've looked at a few client-side libraries, but if Parsoid
really is ready for prime time, it seems silly not to use it, and it's just
a question of finding the right API calls.
Given your problem statement, you may be interested in the #Template_markup
<https://www.mediawiki.org/wiki/Specs/HTML/2.1.0#Template_markup> section
of that spec.
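Building on the snippets above, reading those out is roughly (the exact
shape of data-mw is described in the spec; note that "parts" entries can
also be plain strings for interleaved content, hence the check below):

    // Each transclusion node carries a data-mw attribute with JSON like:
    //   {"parts":[{"template":{"target":{"wt":"ping","href":"./Template:Ping"},
    //                          "params":{"1":{"wt":"Example"}},"i":0}}]}
    for (const node of transclusions) {
        const dataMw = JSON.parse(node.getAttribute('data-mw'));
        for (const part of dataMw.parts) {
            if (part.template) {
                console.log(part.template.target.wt, part.template.params);
            }
        }
    }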