Dear all,
is there an API for extracting the previous section title from a wiki page? My situation is the following. I have a wiki page that looks like this:

  <page>
  intro
  ==section1==
  text <math id=1>...</math>
  text <math id=2>...</math>
  ===section 2===
  text <math id=3>...</math>
  </page>

I want to know the previous section title for each math object on that page:

  1 -> section1
  2 -> section1
  3 -> section 2
It's certainly doable to write a program that extracts that information from the wiki page, but I suspect that rare special cases would cause a lot of long-tail trouble.
So is there an API that could be used for that? Either Parsoid or the old regular parser would work for me.
Best,
Physikerwelt
Moritz,
you can certainly do this in HTML, using either the PHP parser output or Parsoid. Parsoid output makes it easier to identify math extension output. If you need the wikitext for the heading, then Parsoid can also give you the source offsets of the heading in data-parsoid (see the dsr property in there; it encodes startOffset, endOffset, startTagWidth, endTagWidth).
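As a rough illustration of the HTML approach, here is a minimal Python sketch that walks the output once and remembers the most recent heading when it reaches a math element. Several things in it are assumptions rather than confirmed behaviour: that math output carries typeof="mw:Extension/math", that data-parsoid is available inline as a JSON attribute on the heading (this may not hold for every Parsoid version or output format), and that the dsr offsets index characters of the wikitext (non-ASCII text may need an adjustment). The sample HTML, sample wikitext, ids, and all function and class names are made up for the example.

```python
import json
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class MathSectionMapper(HTMLParser):
    """Record, for each math element, the nearest preceding heading."""

    def __init__(self):
        super().__init__()
        self.current_heading = None  # stays None while still in the intro
        self.current_dsr = None
        self._in_heading = False
        self._heading_text = []
        self._pending_dsr = None
        self.results = []  # (math id, heading text, heading dsr)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in HEADING_TAGS:
            self._in_heading = True
            self._heading_text = []
            # Assumption: data-parsoid is an inline JSON attribute.
            raw = attrs.get("data-parsoid")
            self._pending_dsr = json.loads(raw).get("dsr") if raw else None
        elif "mw:Extension/math" in (attrs.get("typeof") or "").split():
            # Assumption: math output is marked with this typeof value.
            self.results.append(
                (attrs.get("id"), self.current_heading, self.current_dsr)
            )

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._in_heading:
            self._in_heading = False
            self.current_heading = "".join(self._heading_text).strip()
            self.current_dsr = self._pending_dsr

    def handle_data(self, data):
        if self._in_heading:
            self._heading_text.append(data)


def map_math_to_sections(html):
    """Return (math id, heading text, heading dsr) for every math element."""
    parser = MathSectionMapper()
    parser.feed(html)
    parser.close()
    return parser.results


def heading_wikitext(wikitext, dsr):
    """Slice the heading source out of the wikitext using a dsr quadruple."""
    start, end, open_width, close_width = dsr
    return wikitext[start + open_width:end - close_width].strip()


# Hand-written stand-ins for real parser input and output (hypothetical).
SAMPLE_WIKITEXT = (
    "intro\n==section1==\n"
    "text <math>a</math> text <math>b</math>\n"
    "===section 2===\ntext <math>c</math>\n"
)
SAMPLE_HTML = """
<p>intro</p>
<h2 data-parsoid='{"dsr":[6,18,2,2]}'>section1</h2>
<p>text <span typeof="mw:Extension/math" id="1">...</span>
text <span typeof="mw:Extension/math" id="2">...</span></p>
<h3 data-parsoid='{"dsr":[59,74,3,3]}'>section 2</h3>
<p>text <span typeof="mw:Extension/math" id="3">...</span></p>
"""

if __name__ == "__main__":
    for math_id, heading, dsr in map_math_to_sections(SAMPLE_HTML):
        print(math_id, "->", heading)
```

A math element that appears before the first heading gets None as its heading, which is one of the special cases worth deciding on up front. Headings inside templates or other generated content may also lack usable dsr values, so the wikitext slice is best treated as optional.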
Gabriel
wikitech-l@lists.wikimedia.org