I am very much interested in tools that solve more problems than they cause. :-) Have you published it anywhere?
2014-06-09 10:49 GMT+02:00 Alex Brollo alex.brollo@gmail.com:
While parsing wiki code without specific Python tools, I ran into a major problem with template code, since regexes can't handle nested structures well. I solved this with a layman's approach: a parseTemplate routine, in both Python and JavaScript, which converts a template into a simple object (a dictionary + a list), coupled with another simple routine which rebuilds the template code from the original, or edited, object. The whole thing is, as I said, very rough and was written for personal use only; but if anyone is interested, please ask.
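[Editorial illustration, not Alex's actual code, which was not posted in this thread.] The approach described above can be sketched roughly as follows. The function names (split_top_level, parse_template, build_template) and the shape of the returned object (a "params" dictionary plus an "order" list) are assumptions made for this example. The sketch splits on pipes only at nesting depth zero, which is the part a plain regex struggles with; it does not handle triple-brace parameters, <nowiki>, or comments, and it strips whitespace around the template name and parameter names.

```python
def split_top_level(body):
    """Split on '|' characters that are not inside nested {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_template(text):
    """Convert one '{{...}}' template into a dictionary plus an ordered key list."""
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("expected a single {{...}} template")
    parts = split_top_level(text[2:-2])
    params, order, positional = {}, [], 0
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep and "{{" not in key and "[[" not in key:
            key = key.strip()          # named parameter
        else:
            positional += 1            # positional parameter: use an int key
            key, value = positional, part
        params[key] = value
        order.append(key)
    return {"name": parts[0].strip(), "params": params, "order": order}


def build_template(tpl):
    """Rebuild the wikitext from the (possibly edited) parsed object."""
    pieces = [tpl["name"]]
    for key in tpl["order"]:
        value = tpl["params"][key]
        pieces.append(value if isinstance(key, int) else f"{key}={value}")
    return "{{" + "|".join(pieces) + "}}"


code = "{{cite web|url=http://example.org|title={{lang|en|Some [[page|title]]}}|2014}}"
tpl = parse_template(code)
tpl["params"]["title"] = "New title"   # edit one parameter, then rebuild
print(build_template(tpl))
```

Keeping the key order in a separate list means the rebuilt template lists its parameters in the original order, so an unedited object rebuilds to the same text as long as the input has no whitespace around the template name or parameter names.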
Alex Brollo
2014-06-08 23:47 GMT+02:00 Merlijn van Deen valhallasw@arctus.nl:
On 1 June 2014 01:57, Ricordisamoa ricordisamoa@openmailbox.org wrote:
Since gerrit:131263 https://gerrit.wikimedia.org/r/131263/ , it seems to me that the excellent mwpfh is going to be used more and more extensively within our framework. Am I right? For example, the DuplicateReferences detection and fix in reflinks.py could be neatly refactored without regular expressions. Or are we supposed to do the opposite conversion, where possible?
My preference is to depend on mwpfh where possible - their parser support is much better than ours, and it makes much more sense to concentrate efforts in one place. However, there's one blocker for this: the Windows support of mwpfh. It uses a C extension, and it's hard to build C extensions under Windows -- so we'd need to help Windows users install it in some way. I've updated the issue at https://github.com/earwig/mwparserfromhell/issues/68 with some notes for that.
Merlijn
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l