[Pywikipedia-l] mwparserfromhell


Ricordisamoa

1 Jun 2014

6:57 a.m.

Since gerrit:131263 (https://gerrit.wikimedia.org/r/131263/), it seems to me that the excellent mwpfh is going to be used more and more extensively within our framework. Am I right? For example, the DuplicateReferences detection and fix in reflinks.py could be neatly refactored without regular expressions. Or are we supposed to do the opposite conversion, where possible?


Merlijn van Deen

9 Jun 2014

4:47 a.m.


My preference is to depend on mwpfh where possible - their parser support is much better than ours, and it makes much more sense to concentrate efforts in one place. However, there's one blocker for this: the Windows support of mwpfh. It uses a C extension, and it's hard to build C extensions under Windows, so we'd need to help Windows users install it in some way. I've updated the issue at https://github.com/earwig/mwparserfromhell/issues/68 with some notes for that.

Merlijn

Alex Brollo

3:49 p.m.

While parsing wikicode without dedicated Python tools, I ran into a major problem with template code, since regexes cannot handle nested structures well. I solved this with a layman's approach: a parseTemplate routine, in both Python and JavaScript, which converts a template into a simple object (a dictionary plus a list), coupled with another simple routine that rebuilds the template code from the original, or edited, object. The whole thing is, as I said, very rough and was written for personal use only; but if anyone is interested, please ask.
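
A rough sketch of what such a pair of routines might look like in Python. This is not the parseTemplate code mentioned above, which was not posted to the list; the function names, the shape of the returned object (a dictionary of parameters plus a list recording their order) and the handling of positional parameters are all assumptions made for illustration. It tracks bracket depth instead of using a regex, so a "|" inside a nested template or wikilink is not treated as a parameter separator.

```python
def split_top_level(body, sep="|"):
    """Split body on sep, skipping separators nested in {{...}} or [[...]]."""
    parts, depth, start, i = [], 0, 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            i += 2
        elif body[i] == sep and depth == 0:
            parts.append(body[start:i])
            start = i + 1
            i += 1
        else:
            i += 1
    parts.append(body[start:])
    return parts


def parse_template(code):
    """Hypothetical parser: '{{name|a|k=v}}' -> name, params dict, order list."""
    if not (code.startswith("{{") and code.endswith("}}")):
        raise ValueError("not a template: %r" % code)
    name, *raw_params = split_top_level(code[2:-2])
    params, order, positional = {}, [], 0
    for raw in raw_params:
        key, eq, value = raw.partition("=")
        # An "=" that only occurs inside a nested template or link does not
        # make the parameter a named one.
        if eq and "{{" not in key and "[[" not in key:
            pass  # named parameter: keep key and value exactly as written
        else:
            positional += 1
            key, value = positional, raw
        params[key] = value
        order.append(key)
    return {"name": name, "params": params, "order": order}


def build_template(tpl):
    """Rebuild the template code from the (possibly edited) object."""
    parts = [tpl["name"]]
    for key in tpl["order"]:
        value = tpl["params"][key]
        parts.append(value if isinstance(key, int) else "%s=%s" % (key, value))
    return "{{" + "|".join(parts) + "}}"
```

For example, parsing "{{cite|first|url=http://x|note={{small|a|b}}}}", changing params["url"] and calling build_template() gives back the same template with only the URL replaced. The sketch ignores duplicate parameter names, triple-brace arguments, comments and nowiki sections, which a real parser such as mwparserfromhell has to deal with.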

Alex brollo

Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l

Ricordisamoa

10 Jun 2014

7:23 p.m.

Thank you, but I think we will keep our dependency on mwparserfromhell. Even though it has issues on Windows, it is way more powerful and reliable than any other wikicode parser in Python. And it does not only parse nested templates, but also wikilinks, external links and HTML tags.


Alex Brollo

10:05 p.m.

Far from comparing my brief, layman scripts with mwpfh, or encouraging anyone to switch away from mwpfh to my scripts, my aim was only to mention parseTemplate() and to encourage anyone to write the same routines in both Python and JavaScript, when possible.

Alex


Bináris

12 Jun 2014

11:12 a.m.

I am very much interested in tools that solve more problems than they cause. :-) Have you published it anywhere?

-- Bináris

Alex Brollo

1:28 p.m.

The JavaScript version of parseTemplate() is presently "published" on it.wikisource pages, since it's part of our running tools library. The Python version is presently for personal use; I can publish the code on an it.wikisource page. Keep in mind that both are not tools, only functions to be used inside simple tools. Thanks for your interest, it encourages me to share them. :-) As soon as I've published them decently, I'll send you the reference off-list; then feel free to do anything with them (laugh, use, share).

Alex


Bináris

1:36 p.m.

Thanks. I am interested in the Python version, as the regex template parsing is really incomplete and causes trouble in text replacements. I think I will be able to build a function into my copy of textlib.
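
To make the regex problem concrete, here is a small sketch comparing a naive non-greedy pattern with a brace-depth scanner. The find_templates() helper is hypothetical and is not part of pywikibot's textlib; it only illustrates why a replacement routine needs to know where a nested template really ends.

```python
import re


def find_templates(text):
    """Return (start, end) spans of the outermost {{...}} templates in text."""
    spans, depth, start, i = [], 0, None, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i  # remember where the outermost template opens
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append((start, i + 2))
            i += 2
        else:
            i += 1
    return spans


text = "Intro {{infobox|name={{lang|it|Roma}}|pop=1}} outro"

# The non-greedy regex stops at the first "}}", cutting the outer template short.
naive = re.findall(r"\{\{.*?\}\}", text)

# The scanner counts opening and closing braces, so it returns the whole template.
whole = [text[s:e] for s, e in find_templates(text)]
```

Here naive holds the truncated "{{infobox|name={{lang|it|Roma}}", while whole holds the complete "{{infobox|name={{lang|it|Roma}}|pop=1}}". A text replacement that must skip or rewrite templates would go wrong with the first result.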

-- Bináris

