Oversimplification of pywikipediabot - intended or work in progress? - Pywikipedia-l


Strainu

3 Sep 2013

10:39 p.m.

Hello,

I'm trying to convert a fairly large set of scripts from compat to core, and I found a significant loss of functionality in getting image and template info. While writing this, I've noticed that the latest version of core also has some of these problems. I will elaborate on this loss of functionality below, but I would like to know whether this simplification is intended or whether it is part of some work in progress.

For image parsing, linkedPages(withImageLinks=True) used to return the images that were not included through templates, while imageLinks returned all the images. In core, linkedPages no longer provides this capability, and I haven't found any replacement (I ported the old function into my code).

For template parsing, templatesWithParams from class Page used to return a pair containing the template name and a list of parameters, each as the full "key=value" string. Nowadays we get a dictionary instead of that list. Normally there would be nothing wrong with that, except that in Python 2 the dictionary is unordered, which means that:

* the order of the parameters is lost for good
* the original text cannot be reconstructed (because of the above and the missing whitespace information)

This means there is no easy way to identify and/or replace a particular instance of the template in a page with many identical templates. It used to be possible with simple find/replace operations; now it takes more work.
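A minimal sketch of the find/replace workflow that an ordered parameter list makes possible. The helper name rebuild_template, the sample page text, and the assumption that the template was written without extra whitespace are all illustrative; none of this is pywikipedia code.

```python
def rebuild_template(name, params):
    """Rebuild template wikitext from a name and an ordered list of
    "key=value" strings (the shape described above for compat)."""
    return "{{" + "|".join([name] + list(params)) + "}}"


# Hypothetical page that uses the same template twice.
page_text = "{{Infobox|name=A|year=1990}} text {{Infobox|name=B|year=2001}}"

# Hypothetical parse result in the old (name, [params]) shape.
templates = [
    ("Infobox", ["name=A", "year=1990"]),
    ("Infobox", ["name=B", "year=2001"]),
]

# Change only the second instance: because the parameter order is known,
# its exact source text can be rebuilt and found with a plain replace.
name, params = templates[1]
old = rebuild_template(name, params)
new = rebuild_template(name, ["name=B", "year=2002"])
updated_text = page_text.replace(old, new, 1)
```

With an unordered dictionary the rebuilt string may come out as {{Infobox|year=2001|name=B}}, which no longer matches the page text, so the replace silently does nothing.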

I personally would like to have the old behavior back; it would save me, and probably others, a lot of work.

Thanks, Strainu


Morten Wang

4 Sep 2013

9:25 p.m.


I'm no expert when it comes to template manipulation using bots, but for the little template work I do I have found mwparserfromhell (https://github.com/earwig/mwparserfromhell) to be a nice library that makes the job fairly easy, so you might want to look into it for your template needs.

Cheers, Morten


legoktm

5 Sep 2013

6:09 a.m.


Hi Strainu,

It's definitely a work in progress. There are very few features that have intentionally been removed, and all of those are explicitly marked with a "@deprecated" tag.

If there are features that you notice are missing that used to be available, filing a bug would be helpful so we can keep track of what still needs to be implemented.

Regarding templatesWithParams, using mwparserfromhell is probably the right way to go. I've started working on integrating that into the framework, but haven't gotten very far yet. We could probably use an OrderedDict[1] there too.

[1] http://docs.python.org/2/library/collections.html#collections.OrderedDict

-- Legoktm
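As a rough illustration of the OrderedDict idea, the sketch below keeps parameters in source order while still allowing lookup by name. The function name params_to_ordered_dict is hypothetical, and the sketch assumes every parameter is a named "key=value" string; it is not what the framework does.

```python
from collections import OrderedDict


def params_to_ordered_dict(params):
    """Turn ["key=value", ...] into an OrderedDict, keeping source order."""
    ordered = OrderedDict()
    for param in params:
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = param.partition("=")
        ordered[key.strip()] = value
    return ordered


params = params_to_ordered_dict(["year=1990", "name=A"])
keys_in_source_order = list(params.keys())

# Lookup by name still works, and iteration follows the source order.
rebuilt = "|".join("%s=%s" % item for item in params.items())
```

This addresses only the ordering problem; whitespace around the parameters is still not recorded, so the original text is recoverable only when the template was written without extra spacing.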


Strainu

1:34 p.m.


2013/9/5 legoktm legoktm.wikipedia@gmail.com:

> Hi Strainu,
>
> It's definitely a work in progress. There are very few features that have intentionally been removed, and all of those are explicitly marked with a "@deprecated" tag.

Glad to hear that!

> If there are features that you notice are missing that used to be available, filing a bug would be helpful so we can keep track of what still needs to be implemented.

What's the preferred bugtracker these days?

> Regarding templatesWithParams, using mwparserfromhell is probably the right way to go. I've started working on integrating that into the framework, but haven't gotten very far yet. We could probably use an OrderedDict[1] there too.

I thought templatesWithParams already uses mwparserfromhell by default?

Strainu


Alex Brollo

3:38 p.m.


About parsing templates: I wrote a Python function, parseTemplate(), which returns a name:value dictionary and a list; the list records the order of the parameters. There is also a rewrite function (which rebuilds the template code in a "beautified" style from the dict and the list) and a JavaScript version of both, using the same algorithm and the same idea.

Both are too rough to be included in the "canonical" collections of scripts.
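Alex's code is not shown in the thread, so the following is only a guess at the general shape of such a pair of functions. The names parse_template and rewrite_template, the output layout, and the restriction to flat templates (no nested templates, links, or pipes inside values) are all assumptions.

```python
def parse_template(text):
    """Parse "{{Name|k=v|...}}" into (name, values dict, key order list)."""
    inner = text.strip()[2:-2]  # drop the surrounding "{{" and "}}"
    parts = inner.split("|")
    name = parts[0].strip()
    values, order = {}, []
    positional = 0
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            key = key.strip()
        else:
            # Unnamed parameters are numbered 1, 2, ... as in MediaWiki.
            positional += 1
            key, value = str(positional), part
        values[key] = value.strip()
        order.append(key)
    return name, values, order


def rewrite_template(name, values, order):
    """Rebuild the template, one parameter per line, in recorded order."""
    lines = ["{{" + name]
    lines += [" | %s = %s" % (key, values[key]) for key in order]
    lines.append("}}")
    return "\n".join(lines)


name, values, order = parse_template("{{Infobox| year=1990 |name = A}}")
beautified = rewrite_template(name, values, order)
```

Keeping the order in a separate list means the dictionary can stay a plain dict, at the cost of having to keep the two structures in sync when parameters are added or removed.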

Alex


Strainu

18 Oct 2013

6:39 p.m.


2013/9/5 legoktm legoktm.wikipedia@gmail.com:

> Hi Strainu,
>
> It's definitely a work in progress. There are very few features that have intentionally been removed, and all of those are explicitly marked with a "@deprecated" tag.
>
> If there are features that you notice are missing that used to be available, filing a bug would be helpful so we can keep track of what still needs to be implemented.

I've submitted bugs 55881 (for image retrieval) and 55882 (for template parsing).

> Regarding templatesWithParams, using mwparserfromhell is probably the right way to go. I've started working on integrating that into the framework, but haven't gotten very far yet. We could probably use an OrderedDict[1] there too.

I've noted that in the bug. I might be able to provide a patch for that.

