Gabriel Wicke, 10/06/2014 02:30:
If you haven't heard of it, then https://www.mediawiki.org/wiki/Parsoid might be useful. It lets you work on HTML instead of wikitext, and can convert that HTML back to wikitext.
I'm also curious how this work will interact with https://www.mediawiki.org/wiki/Content_translation, which is also based on Parsoid.
There is no interaction because PageMigration doesn't need to manipulate HTML. :)
The question might have been unclear: what would be interesting (if easily available) is the ability to input some wikitext and get as output *only* the wikitext "markup", i.e. everything except the "linguistic" plain text (with some approximation). So for the example at https://www.mediawiki.org/wiki/API:Parsing_wikitext#Example_2:
[[foo]] [[API:Query|bar]] [http://www.example.com/ baz] -> [[]] [[API:Query|]] [http://www.example.com/ ]
or something like that.
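To make the idea concrete, here is a rough Python sketch of the transformation for the link case only. It is not an existing MediaWiki or Parsoid feature; the function name strip_link_text and both regular expressions are made up for illustration, and they deliberately ignore templates, nested links, files and everything else a real wikitext parser would have to handle.

```python
import re

# Hypothetical patterns, good enough for simple cases only.
# Internal links: [[target]] or [[target|label]]
INTERNAL_LINK = re.compile(r"\[\[([^\[\]|]*)(?:\|([^\[\]]*))?\]\]")
# External links: [url] or [url label], with a single pair of brackets
EXTERNAL_LINK = re.compile(
    r"(?<!\[)\[((?:https?:)?//[^\s\]]+)(?:\s+([^\]]*))?\](?!\])"
)


def strip_link_text(wikitext):
    """Return wikitext with the human-readable part of links removed."""

    def internal(match):
        target, label = match.group(1), match.group(2)
        if label is None:
            # [[foo]]: the target doubles as the displayed text, so drop it.
            return "[[]]"
        # [[API:Query|bar]]: keep the target, drop the label.
        return "[[" + target + "|]]"

    def external(match):
        url, label = match.group(1), match.group(2)
        if label is None:
            # [http://example.com/]: no label to remove.
            return match.group(0)
        # [http://example.com/ baz]: keep the URL, drop the label.
        return "[" + url + " ]"

    result = INTERNAL_LINK.sub(internal, wikitext)
    return EXTERNAL_LINK.sub(external, result)


if __name__ == "__main__":
    print(strip_link_text(
        "[[foo]] [[API:Query|bar]] [http://www.example.com/ baz]"
    ))
```

Plain text outside of links is left untouched here, so this only covers the example above; removing all running text would still need one of the existing plain-text extractors or a proper parser.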
AFAIK there are solutions to get the plain text, e.g. people often want to look up the text of a Wiktionary entry from the API (with varying degrees of success). I'm not sure, however, whether something is available to do the opposite, or whether one would need to build it on top of those existing tools, by "subtraction".
Nemo