On Feb 28, 2014 12:52 PM, "Gabriel Wicke" <gwicke(a)wikimedia.org> wrote:
The Parsoid rendering (e.g. [1]) has pretty much all semantic
information in the DOM. There might still be Wiktionary-specific issues
that we don't know about yet, but tasks like extracting template
parameters or the rendering of specific templates (IPA, ...) are already
straightforward. Also see the DOM spec [2] for background.
Gabriel
Last time I tried doing anything like this was before Parsoid existed, and
I'll admit my approach was probably the worst possible. However, the issue
was that each language formatted its pages differently, and some
languages did not format things consistently. I think there is a limit to
how much Parsoid (or anything that's not AI) can help with that situation.
-bawolff
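
For anyone following along, the template-parameter extraction mentioned above could look roughly like the sketch below. It assumes that Parsoid marks template output with typeof="mw:Transclusion" and stores the template name and arguments as JSON in a data-mw attribute, as the DOM spec describes; the exact layout should be checked against the spec. The names TransclusionCollector, extract_templates and SAMPLE are made up for illustration, and the sample markup is hand-written rather than real Parsoid output. It only recovers what was passed to each template, so it does nothing about pages that are formatted inconsistently across languages.

```python
import html
import json
from html.parser import HTMLParser


class TransclusionCollector(HTMLParser):
    """Collect (template name, parameters) pairs from Parsoid-style HTML."""

    def __init__(self):
        super().__init__()
        self.templates = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # typeof may hold several space-separated values.
        if "mw:Transclusion" not in (attr.get("typeof") or "").split():
            return
        # Assumed layout: {"parts": [{"template": {"target": ..., "params": ...}}]}
        data_mw = json.loads(attr.get("data-mw") or "{}")
        for part in data_mw.get("parts", []):
            # Parts can also be plain wikitext strings; skip those.
            if not isinstance(part, dict) or "template" not in part:
                continue
            tpl = part["template"]
            name = tpl.get("target", {}).get("wt", "").strip()
            params = {
                key: value.get("wt", "")
                for key, value in tpl.get("params", {}).items()
            }
            self.templates.append((name, params))


def extract_templates(markup):
    """Return a list of (name, params) for every transclusion in markup."""
    collector = TransclusionCollector()
    collector.feed(markup)
    return collector.templates


# Hand-written sample in the assumed shape, not real Parsoid output.
_DATA_MW = {
    "parts": [
        {
            "template": {
                "target": {"wt": "IPA"},
                "params": {"1": {"wt": "/foo/"}, "lang": {"wt": "en"}},
                "i": 0,
            }
        }
    ]
}
SAMPLE = '<span typeof="mw:Transclusion" data-mw="{}">/foo/</span>'.format(
    html.escape(json.dumps(_DATA_MW), quote=True)
)

if __name__ == "__main__":
    for name, params in extract_templates(SAMPLE):
        print(name, params)
```

This uses only the standard library; a real tool would more likely use a proper HTML5 parser and fetch the page HTML from a Parsoid endpoint.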