I am so glad that someone has re-re-resurrected this topic :-)
On Fri, Oct 23, 2009 at 1:27 PM, Andrew Dunbar hippytrail@gmail.com wrote:
I've been spending hours on the parsing now and don't find it simple at all due to the fact that templates can be nested. Just extracting the Infobox as one big lump is hard due to the need to match nested {{ and }}
Not perfect, but try http://toolserver.org/~magnus/wiki2xml/w2x.php
1. Uncheck "Use API" and choose "Do not use templates"
2. Enter the article name(s)
3. Get the XML
4. Parse the XML, then re-submit the wiki text inside the templates to process the next level of templates
I should really offer #4 in this...
Caveat: Will break on things like HTML attributes that are filled by templates etc.
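If you only want the Infobox as one lump, the nested {{ and }} can be matched by counting depth instead of using a regex. This is a minimal sketch, not what wiki2xml does internally; the function name extract_template and the "Infobox" prefix are just assumptions for illustration. It does not handle {{{parameter}}} triple braces, <nowiki> sections, HTML comments, or a lowercase "{{infobox".

```python
def extract_template(wikitext, name="Infobox"):
    """Return the first {{name ...}} template as one string, or None.

    Hypothetical helper: walks the text and tracks how many {{ are
    currently open, so nested templates stay inside the result.
    """
    start = wikitext.find("{{" + name)
    if start == -1:
        return None  # no such template in the text

    depth = 0
    i = start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1  # one more template opened
            i += 2
        elif pair == "}}":
            depth -= 1  # innermost open template closed
            i += 2
            if depth == 0:
                # the closing braces of the outer template
                return wikitext[start:i]
        else:
            i += 1
    return None  # ran out of text: braces are unbalanced


sample = "Intro {{Infobox person | name = {{lang|en|Ada}} | born = 1815}} rest"
infobox = extract_template(sample)
```

With the sample above, infobox holds everything from "{{Infobox" through the final "}}", including the nested {{lang|...}} call.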
Cheers, Magnus