On 04/07/2014 05:35 AM, Hannes Röst wrote:
== Template parser ==
https://github.com/hroest/pywikibot-compat/tree/feature/template_parser
For a bot project on the German Wikipedia I had to parse rather complex templates and replace specific fields. The templates could contain nested templates, math formulas and references. I therefore wrote a template parser that returns each template as key-value pairs, which makes it easy to query specific keys and replace their values. The code worked well on several thousand templates of the German chemistry project and should be rather straightforward to use. This is library code, so there is no bot associated with it; see templateparser.py and tests/test_templateparser.py.
In order to handle nesting correctly and to distinguish equal signs that belong to key-value pairs from those in mathematical formulas etc., I also had to write a partial MediaWiki syntax parser that recognizes such constructs in wikitext. This code is in textrange_parser.py and makes it possible to extract specific parts of a text (e.g. wikitables, templates, wikilinks, weblinks); tests are in tests/test_textrange_parser.py.
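To illustrate the nesting problem described above, here is a minimal sketch of depth-aware splitting. It is not the code from templateparser.py or textrange_parser.py; the names split_top_level and parse_template are hypothetical, and it assumes lowercase <math>, <ref> and <nowiki> tags and well-formed {{ }} / [[ ]] pairs.

```python
import re

# Tags whose content is treated as opaque (assumption: lowercase tags only).
_OPEN_TAG = re.compile(r"<(math|ref|nowiki)\b[^>]*?(/?)>")


def split_top_level(text, sep, maxsplit=-1):
    """Split text on the single character sep, but only outside
    {{...}}, [[...]] and <math>/<ref>/<nowiki> spans."""
    parts = []
    depth = 0   # number of currently open {{ or [[ pairs
    start = 0   # start index of the part being collected
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            i += 2
            continue
        if pair in ("}}", "]]"):
            depth = max(depth - 1, 0)
            i += 2
            continue
        tag = _OPEN_TAG.match(text, i)
        if tag:
            if tag.group(2):  # self-closing, e.g. <ref name="x"/>
                i = tag.end()
            else:             # jump past the matching closing tag
                closing = "</%s>" % tag.group(1)
                close = text.find(closing, tag.end())
                i = len(text) if close == -1 else close + len(closing)
            continue
        if text[i] == sep and depth == 0 and maxsplit != 0:
            parts.append(text[start:i])
            start = i + 1
            maxsplit -= 1
        i += 1
    parts.append(text[start:])
    return parts


def parse_template(text):
    """Return (name, params) for one {{...}} template.
    Unnamed parameters get the keys "1", "2", ..."""
    text = text.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("expected text wrapped in {{ }}")
    name, *fields = split_top_level(text[2:-2], "|")
    params = {}
    positional = 0
    for field in fields:
        key_value = split_top_level(field, "=", maxsplit=1)
        if len(key_value) == 2:
            params[key_value[0].strip()] = key_value[1].strip()
        else:
            positional += 1
            params[str(positional)] = field.strip()
    return name.strip(), params


sample = (
    "{{Infobox Chemikalie | Name = Wasser "
    "| Formel = <math>E = mc^2</math> "
    "| Dichte = {{Wert|a=1|b=2}} "
    "| Link = [[Wasser|H2O]] }}"
)
name, params = parse_template(sample)
```

The equal sign inside <math> and the pipes inside the nested template and the wikilink are skipped because they are not at the top level, so each value comes back intact and can be parsed again with parse_template if needed.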
Have you looked into using mwparserfromhell[1]? It's a true parser which even has C speedups. Support for it is already in pywikibot, it's just not turned on by default.
[1] https://github.com/earwig/mwparserfromhell
-- Legoktm