On Feb 28, 2014 12:52 PM, "Gabriel Wicke" gwicke@wikimedia.org wrote:
The Parsoid rendering (e.g. [1]) has pretty much all semantic information in the DOM. There might still be Wiktionary-specific issues that we don't know about yet, but tasks like extracting template parameters or the rendering of specific templates (IPA, etc.) are already straightforward. Also see the DOM spec [2] for background.
Gabriel
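To illustrate the template-parameter extraction mentioned above, here is a minimal sketch in Python. It assumes the layout described in the Parsoid DOM spec, where a transclusion is marked with typeof="mw:Transclusion" and carries its template name and parameters as JSON in a data-mw attribute. The sample markup, the /kæt/ value and the helper names (TransclusionCollector, extract_templates) are made up for illustration and are not taken from a real Wiktionary page.

```python
import html
import json
from html.parser import HTMLParser


class TransclusionCollector(HTMLParser):
    """Collect (template name, params) pairs from Parsoid-style HTML."""

    def __init__(self):
        super().__init__()
        self.templates = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # typeof may hold several space-separated values.
        if "mw:Transclusion" not in (attr.get("typeof") or "").split():
            return
        data_mw = json.loads(attr.get("data-mw") or "{}")
        for part in data_mw.get("parts", []):
            # Parts may also be plain strings (wikitext between templates).
            if isinstance(part, dict) and "template" in part:
                tpl = part["template"]
                name = tpl["target"]["wt"].strip()
                params = {k: v["wt"] for k, v in tpl.get("params", {}).items()}
                self.templates.append((name, params))


def extract_templates(markup):
    """Return a list of (template name, {param: wikitext}) tuples."""
    collector = TransclusionCollector()
    collector.feed(markup)
    collector.close()
    return collector.templates


# Hypothetical sample: an {{IPA|/kæt/|lang=en}} transclusion (assumed shape).
_data_mw = {
    "parts": [
        {
            "template": {
                "target": {"wt": "IPA", "href": "./Template:IPA"},
                "params": {"1": {"wt": "/kæt/"}, "lang": {"wt": "en"}},
                "i": 0,
            }
        }
    ]
}
SAMPLE = '<span typeof="mw:Transclusion" about="#mwt1" data-mw="{}">/kæt/</span>'.format(
    html.escape(json.dumps(_data_mw), quote=True)
)

if __name__ == "__main__":
    for name, params in extract_templates(SAMPLE):
        print(name, params)
```

This only recovers what the editor put into the template call. It does nothing about pages that skip templates or lay out sections differently per language.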
The last time I tried doing anything like this was before Parsoid existed, and I'll admit my approach was probably the worst possible. However, the issue was that each language formatted its pages differently, and some languages did not format things consistently. I think there is a limit to how much Parsoid (or anything that's not AI) can help with that situation.
-bawolff