While parsing wiki code without specific Python tools, I ran into a major problem with template code, since regexes cannot handle nested structures well. I worked around it with a layman's approach: a parseTemplate routine, in both Python and JavaScript, which converts a template into a simple object (a dictionary plus a list), coupled with another simple routine that rebuilds the template code from the original, or edited, object. The whole thing is, as I said, very rough and was written for personal use only; but if anyone is interested, please ask.
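For the curious, the core idea can be sketched in a few lines of plain Python (this is a minimal illustration of the approach described above, not the actual routine; the function names and the dictionary-plus-list shape are assumptions): split the template body on top-level pipes only, tracking brace depth so nested {{...}} survive intact, which is exactly the case a flat regex gets wrong.

```python
def parse_template(text):
    """Parse one {{template}} into (name, positional_list, named_dict).

    Splits on '|' only at brace depth 0, so nested {{...}} stay whole --
    the situation a plain regex cannot handle reliably.
    """
    assert text.startswith("{{") and text.endswith("}}")
    inner = text[2:-2]
    parts, depth, current = [], 0, []
    for ch in inner:
        if ch == "|" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
            current.append(ch)
    parts.append("".join(current))
    name, params = parts[0], parts[1:]
    positional, named = [], {}
    for p in params:
        if "=" in p:
            key, _, value = p.partition("=")
            named[key.strip()] = value.strip()
        else:
            positional.append(p)
    return name, positional, named


def build_template(name, positional, named):
    """Rebuild template wiki code from the parsed pieces."""
    parts = [name] + positional + [f"{k}={v}" for k, v in named.items()]
    return "{{" + "|".join(parts) + "}}"
```

Note that rebuilding normalizes parameter order (positional first, then named), so the round trip is semantically, not byte-for-byte, identical.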

Alex brollo

2014-06-08 23:47 GMT+02:00 Merlijn van Deen <valhallasw@arctus.nl>:
On 1 June 2014 01:57, Ricordisamoa <ricordisamoa@openmailbox.org> wrote:
Since gerrit:131263, it seems to me that the excellent mwpfh is going to be used more and more extensively within our framework.
Am I right? For example, the DuplicateReferences detection and fix in reflinks.py could be neatly refactored without regular expressions.
Or are we supposed to do the opposite conversion, where possible?

My preference is to depend on mwpfh where possible - their parser support is much better than ours, and it makes much more sense to concentrate efforts in one place. However, there's one blocker for this: the Windows support of mwpfh. It uses a C extension, and it's hard to build C extensions under Windows -- so we'd need to help Windows users along installing it in some way. I've updated the issue at https://github.com/earwig/mwparserfromhell/issues/68 with some notes for that.
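To make the DuplicateReferences idea concrete: once the <ref> tags have been extracted with a real parser (e.g. mwparserfromhell's filter_tags, rather than regexes), the remaining fix is simple grouping. The sketch below covers only that grouping step; the (name, content) input shape is an assumption, not reflinks.py's actual data structure.

```python
from collections import defaultdict


def find_duplicate_refs(refs):
    """Group <ref> bodies that occur more than once.

    `refs` is a list of (name, content) pairs, as one might extract
    from parsed wiki code. Returns {content: [names, ...]} for every
    content string used at least twice; a fixer could then keep one
    named <ref> and turn the rest into <ref name="..."/> reuses.
    """
    groups = defaultdict(list)
    for name, content in refs:
        groups[content.strip()].append(name)
    return {c: names for c, names in groups.items() if len(names) > 1}
```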


Pywikipedia-l mailing list