On 04/07/2014 05:35 AM, Hannes Röst wrote:
== Template parser ==
https://github.com/hroest/pywikibot-compat/tree/feature/template_parser
For one bot project on the German Wikipedia I had to parse rather complex
templates and replace specific fields. The templates would contain nested
templates, math formulas and references inside. I thus wrote a template parser
that parses these templates and returns them as key-value pairs, which
makes it easy to query specific keys and replace their values. The code
worked well on several thousand templates of the German chemistry project and
should be rather straightforward to use. This is library code, so there is no
bot associated with it; see templateparser.py and tests/test_templateparser.py.
In order to correctly handle nesting and properly differentiate equal signs
belonging to key-value pairs from those in mathematical formulas etc., I also
had to write a partial MediaWiki syntax parser which recognizes such
syntax in wikitext. This code is in textrange_parser.py and allows extracting
specific parts of a text (e.g. wikitables, templates, wikilinks, weblinks);
tests are in tests/test_textrange_parser.py.
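
To illustrate the nesting problem described above, here is a minimal sketch of
splitting a template on top-level "|" and "=" only. It is not the code from
templateparser.py or textrange_parser.py; the names TOKEN, split_top_level and
parse_template are hypothetical, the sample template is made up, and the list
of recognized tags is deliberately incomplete (triple-brace parameters and HTML
comments are not handled).

```python
import re

# Markup that opens or closes a nested region in which "|" and "=" are
# literal text. Hypothetical and incomplete; chosen only for illustration.
TOKEN = re.compile(r"\{\{|\}\}|\[\[|\]\]|<(/?)(math|ref|nowiki)\b[^>]*>",
                   re.IGNORECASE)


def split_top_level(text, sep):
    """Split text on sep, ignoring separators inside nested markup."""
    parts, depth, start, i = [], 0, 0, 0
    while i < len(text):
        m = TOKEN.match(text, i)
        if m:
            tok = m.group(0)
            if tok in ("{{", "[["):
                depth += 1
            elif tok in ("}}", "]]"):
                depth = max(depth - 1, 0)
            elif tok.endswith("/>"):
                pass  # self-closing tag such as <ref name="x" />
            elif m.group(1):
                depth = max(depth - 1, 0)  # closing tag such as </math>
            else:
                depth += 1  # opening tag such as <math>
            i = m.end()  # skip the whole token, including tag attributes
            continue
        if depth == 0 and text[i] == sep:
            parts.append(text[start:i])
            start = i + 1
        i += 1
    parts.append(text[start:])
    return parts


def parse_template(wikitext):
    """Return (name, params) for a single {{...}} template."""
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("expected a single {{...}} template")
    fields = split_top_level(text[2:-2], "|")
    name = fields[0].strip()
    params, positional = {}, 0
    for field in fields[1:]:
        key_value = split_top_level(field, "=")
        if len(key_value) == 1:
            # No top-level "=": treat as a positional parameter.
            positional += 1
            params[str(positional)] = field.strip()
        else:
            # Only the first top-level "=" separates key from value.
            params[key_value[0].strip()] = "=".join(key_value[1:]).strip()
    return name, params


SAMPLE = ('{{Infobox Chemikalie|Name=Ethanol'
          '|Formel=<math>E = mc^2</math>'
          '|Dichte={{Wert|0,79|Einheit=g/cm3}}'
          '|Quelle=<ref name="a">x=y</ref>}}')
name, params = parse_template(SAMPLE)
```

With this approach the equal sign inside <math>...</math> and the pipes inside
the nested {{Wert|...}} template stay part of their values instead of being
treated as separators.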
Have you looked into using mwparserfromhell[1]? It's a true parser which
even has C speedups. Support for it is already in pywikibot, it's just
not turned on by default.
[1] https://github.com/earwig/mwparserfromhell
-- Legoktm