Since gerrit:131263 https://gerrit.wikimedia.org/r/131263/, it seems to me that the excellent mwparserfromhell (mwpfh) is going to be used more and more extensively within our framework. Am I right? For example, the DuplicateReferences detection and fix in reflinks.py could be neatly refactored to work without regular expressions. Or are we supposed to convert in the opposite direction, where possible?
On 1 June 2014 01:57, Ricordisamoa ricordisamoa@openmailbox.org wrote:
Since gerrit:131263 https://gerrit.wikimedia.org/r/131263/, it seems to me that the excellent mwparserfromhell (mwpfh) is going to be used more and more extensively within our framework. Am I right? For example, the DuplicateReferences detection and fix in reflinks.py could be neatly refactored to work without regular expressions. Or are we supposed to convert in the opposite direction, where possible?
My preference is to depend on mwpfh where possible: their parser support is much better than ours, and it makes much more sense to concentrate efforts in one place. However, there is one blocker: the Windows support of mwpfh. It uses a C extension, and C extensions are hard to build under Windows, so we would need to help Windows users install it in some way. I've updated the issue at https://github.com/earwig/mwparserfromhell/issues/68 with some notes on that.
Merlijn
While parsing wiki code without specific Python tools, I ran into a major problem with template code, since regexes can't handle nested structures well. I solved it with a layman's approach: a parseTemplate routine, written in both Python and JavaScript, that converts a template into a simple object (a dictionary plus a list), coupled with another simple routine that rebuilds the template code from the original or edited object. The whole thing is, as I said, very rough and was written for personal use only, but if anyone is interested, please ask.
Alex Brollo
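For readers who want a concrete picture of the approach described above, here is a minimal sketch of a template-to-object converter and its inverse. It is not Alex's actual parseTemplate code, which is not shown in this thread; the names `split_top_level`, `parse_template` and `build_template`, and the exact shape of the returned object, are assumptions made for illustration. Positional parameters get integer keys and named parameters get string keys, so the rebuild step can tell them apart.

```python
def split_top_level(text, sep):
    """Split text on sep, ignoring separators nested in {{...}} or [[...]]."""
    parts, depth, start, i = [], 0, 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            i += 2
        elif text[i] == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
            i += 1
        else:
            i += 1
    parts.append(text[start:])
    return parts


def parse_template(code):
    """Convert '{{name|a|k=v}}' into a dict holding name, params and order."""
    code = code.strip()
    if not (code.startswith("{{") and code.endswith("}}")):
        raise ValueError("not a template call")
    name, *fields = split_top_level(code[2:-2], "|")
    params, order, position = {}, [], 0
    for field in fields:
        key_value = split_top_level(field, "=")
        if len(key_value) > 1:
            # Named parameter: only the first top-level '=' separates the key.
            key = key_value[0].strip()
            value = "=".join(key_value[1:]).strip()
        else:
            # Positional parameter: numbered 1, 2, ... with an int key.
            position += 1
            key, value = position, field
        params[key] = value
        order.append(key)
    return {"name": name.strip(), "params": params, "order": order}


def build_template(tpl):
    """Rebuild wiki code from the (possibly edited) object."""
    fields = [
        tpl["params"][key] if isinstance(key, int)
        else f"{key}={tpl['params'][key]}"
        for key in tpl["order"]
    ]
    return "{{" + "|".join([tpl["name"]] + fields) + "}}"


source = "{{cite|{{lang|it|Divina Commedia}}|author=Dante|url=http://example.org/?a=b}}"
tpl = parse_template(source)
tpl["params"]["author"] = "Dante Alighieri"  # edit one value, then rebuild
print(build_template(tpl))
```

The nested `{{lang|...}}` call stays intact as the value of positional parameter 1, which is exactly the case a single regular expression tends to get wrong. Whitespace around named keys and values is stripped, so the rebuild is not byte-for-byte identical for padded input, and the sketch ignores `<nowiki>`, comments and triple-brace arguments.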
2014-06-08 23:47 GMT+02:00 Merlijn van Deen valhallasw@arctus.nl:
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
Thank you, but I think we will keep our dependency on mwparserfromhell. Even though it has issues on Windows, it is far more powerful and reliable than any other wikicode parser in Python. It parses not only nested templates but also wikilinks, external links and HTML tags.
On 09/06/2014 10:49, Alex Brollo wrote:
Far from comparing my brief, layman's scripts with mwpfh, or encouraging anyone to switch away from mwpfh to my scripts, my aim was only to mention parseTemplate() and to encourage everyone to write the same routines in both Python and JavaScript, when possible.
Alex
2014-06-10 14:23 GMT+02:00 Ricordisamoa ricordisamoa@openmailbox.org:
I am very much interested in tools that solve more problems than they cause. :-) Have you published it anywhere?
The JavaScript version of parseTemplate() is currently "published" on it.wikisource pages, since it is part of our running tools library. The Python version is currently for personal use only; I can publish the code on an it.wikisource page. Keep in mind that both are not tools but only functions, to be used in simple tools. Thanks for your interest; it encourages me to share them. :-) As soon as I have published them decently, I'll send you the reference off-list; then feel free to do anything with them (laugh, use, share).
Alex
2014-06-12 6:12 GMT+02:00 Bináris wikiposta@gmail.com:
Thanks. I am interested in the Python version, as the regex template parsing is really incomplete and causes trouble in text replacements. I think I will be able to build a function into my copy of textlib.
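To illustrate the kind of trouble mentioned above, the following sketch compares a naive non-greedy regex with a simple brace-depth scanner, and then uses the scanner to restrict a text replacement to the parts of a page outside templates. It is an illustration only, not the code in textlib; `find_templates` and `replace_outside_templates` are hypothetical names, and the scanner ignores `<nowiki>`, comments and triple-brace arguments.

```python
import re


def find_templates(text):
    """Return (start, end) spans of top-level {{...}} calls in text."""
    spans, depth, start, i = [], 0, None, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                spans.append((start, i))
        else:
            i += 1
    return spans


def replace_outside_templates(text, old, new):
    """Replace old with new everywhere except inside templates."""
    out, pos = [], 0
    for start, end in find_templates(text):
        out.append(text[pos:start].replace(old, new))
        out.append(text[start:end])  # template code is copied untouched
        pos = end
    out.append(text[pos:].replace(old, new))
    return "".join(out)


text = "Intro {{outer|x={{inner|1}}|y=2}} middle {{other}} end"

# The regex stops at the first '}}', cutting the outer template short.
naive = re.findall(r"\{\{.*?\}\}", text)
# naive == ['{{outer|x={{inner|1}}', '{{other}}']

# Counting brace depth recovers the whole outer template.
balanced = [text[s:e] for s, e in find_templates(text)]
# balanced == ['{{outer|x={{inner|1}}|y=2}}', '{{other}}']
```

With the truncated regex match, the tail `|y=2}}` would be treated as ordinary page text and could be altered by a replacement; the depth counter keeps it inside the protected span.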
2014-06-12 8:28 GMT+02:00 Alex Brollo alex.brollo@gmail.com: