Hello,
I'm trying to convert a fairly large set of scripts from compat to core and I found a significant loss of functionality in getting image and template info. While writing this, I've noticed that the latest version of core also has some of these problems. I will elaborate on this loss of functionality below, but I would like to know if this simplification is intended or if this is part of some work in progress.
For the image parsing, the function linkedPages(withImageLinks = True) used to provide images that were not included through templates, while imageLinks would provide all the images. In core, the linkedPages function no longer provides this capability, and I haven't found any replacement (I ported the old function into my code).
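A minimal sketch of one way to approximate "images not included through templates" from raw wikitext, assuming the page text is already available as a string. The helper name direct_image_titles is hypothetical and is not part of compat or core; it only recognizes the English File: and Image: prefixes, and it does not skip comments, nowiki sections, or file links that appear inside template arguments.

```python
import re

# Hypothetical helper, not a pywikibot API. Matches [[File:...]] and
# [[Image:...]] written directly in the wikitext. Links of the form
# [[:File:...]] (a link to the file page, not an embedded image) are skipped.
IMAGE_LINK = re.compile(
    r"\[\[\s*(?:File|Image)\s*:\s*([^\]|#]+)",
    re.IGNORECASE,
)


def direct_image_titles(wikitext):
    """Return image titles embedded directly in wikitext, in order, without repeats."""
    seen = []
    for match in IMAGE_LINK.finditer(wikitext):
        title = match.group(1).strip()
        if title not in seen:
            seen.append(title)
    return seen


sample = (
    "{{Infobox|image=Castle.jpg}}\n"
    "[[File:Tower.png|thumb|The tower]]\n"
    "Text with [[Image: Gate.jpg |left]] and [[:File:Tower.png]].\n"
)
print(direct_image_titles(sample))
```

Castle.jpg is passed as a template parameter, so it is not reported here. A full list of images would still have to come from the API.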
For template parsing, templatesWithParams from class Page used to provide a pair containing the template name and a list of parameters, with the full "key=value" string. Nowadays, we're getting a dictionary instead of that list. Normally there is nothing wrong with that, except that in Python 2 the dictionary is unordered, which means that:
- the order of the parameters is forever lost
- the original text cannot be reconstructed (because of the above and the missing whitespace information)
This means there is no easy way to identify and/or replace a particular instance of the template in a page with many identical templates. It used to be you could do it with simple find/replace operations, now it takes some more work.
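To illustrate the find/replace approach described above, here is a small sketch. It assumes the parameters are available as raw "key=value" strings, in page order and exactly as written in the page. The helper names rebuild_template and replace_nth are hypothetical, not compat or core functions.

```python
def rebuild_template(name, raw_params):
    """Rebuild the template call from its name and raw 'key=value' strings."""
    return "{{" + "|".join([name] + list(raw_params)) + "}}"


def replace_nth(text, old, new, n):
    """Replace the n-th (0-based) occurrence of old in text; unchanged if absent."""
    start = -1
    for _ in range(n + 1):
        start = text.find(old, start + 1)
        if start == -1:
            return text
    return text[:start] + new + text[start + len(old):]


page_text = "{{Coord|lat=1|lon=2}} and {{Coord|lat=3|lon=4}} and {{Coord|lat=1|lon=2}}"
old = rebuild_template("Coord", ["lat=1", "lon=2"])
new = rebuild_template("Coord", ["lat=1", "lon=2", "zoom=5"])

# Only the second identical {{Coord|lat=1|lon=2}} call is changed.
print(replace_nth(page_text, old, new, 1))
```

With an unordered dictionary the rebuilt string may come out as {{Coord|lon=2|lat=1}}, which no longer matches the page text, so the lookup fails.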
I personally would like to have the old behavior back, it would save me and probably others a lot of work.
Thanks, Strainu
I'm no expert when it comes to template manipulation using bots, but for the little template stuff I do I have found mwparserfromhell (https://github.com/earwig/mwparserfromhell) to be a nice library that makes the job fairly easy, so you might want to look into that for your template needs.
Cheers, Morten
On 3 September 2013 15:39, Strainu strainu10@gmail.com wrote:
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
Hi Strainu,
It's definitely a work in progress. There are very few features that have intentionally been removed, and all of those are explicitly marked with a "@deprecated" tag.
If there are features that you notice are missing that used to be available, filing a bug would be helpful so we can keep track of what still needs to be implemented.
Regarding templatesWithParams, using mwparserfromhell is probably the right way to go. I've started working on integrating that into the framework, but haven't gotten very far yet. We could probably use an OrderedDict[1] there too.
[1] http://docs.python.org/2/library/collections.html#collections.OrderedDict -- Legoktm
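A rough sketch of the OrderedDict idea, assuming the parameters arrive as raw "key=value" strings in page order. The function params_to_ordered_dict is hypothetical and is not the framework's implementation. Unnamed parameters are numbered from 1, and a positional value that itself contains "=" would be misread as a named parameter.

```python
from collections import OrderedDict


def params_to_ordered_dict(raw_params):
    """Map raw 'key=value' strings to an OrderedDict, numbering unnamed ones."""
    params = OrderedDict()
    positional = 0
    for raw in raw_params:
        key, sep, value = raw.partition("=")
        if sep:
            params[key.strip()] = value.strip()
        else:
            positional += 1
            params[str(positional)] = raw.strip()
    return params


params = params_to_ordered_dict(["name = Example", "first value", "zoom=5", "lat=1"])
print(list(params.items()))
```

This keeps the parameter order, but the whitespace around keys and values is still stripped, so it addresses only the first of the two concerns raised in the original message.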
2013/9/5 legoktm legoktm.wikipedia@gmail.com:
> Hi Strainu,
> It's definitely a work in progress. There are very few features that have intentionally been removed, and all of those are explicitly marked with a "@deprecated" tag.
Glad to hear that!
> If there are features that you notice are missing that used to be available, filing a bug would be helpful so we can keep track of what still needs to be implemented.
What's the preferred bugtracker these days?
> Regarding templatesWithParams, using mwparserfromhell is probably the right way to go. I've started working on integrating that into the framework, but haven't gotten very far yet. We could probably use an OrderedDict[1] there too.
I thought templatesWithParams already uses mwparserfromhell by default?
Strainu
About parsing templates: I wrote a Python function, parseTemplate(), that returns a name:value dictionary and a list, with the order of the parameters recorded in the latter. There is also a rewrite function (to rebuild the template code in a "beautified" style from the dict and list) and a JavaScript version of both, using the same algorithm and the same idea.
Both are too rough to be included in "canonical" collections of scripts.
Alex
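The dict-plus-list idea can be sketched roughly as follows. This is not Alex's code, which is not shown in the thread. The names split_top_level, parse_template and rewrite_template are hypothetical, and the sketch assumes its input is a single well-formed {{...}} call. It splits on "|" only outside nested {{...}} and [[...]], and it treats the first "=" in a part as the name separator, so a positional value containing "=" would be misread.

```python
def split_top_level(body):
    """Split a template body on '|' that is not inside nested {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_template(code):
    """Return (name, values, order) for a single '{{...}}' template call."""
    parts = split_top_level(code.strip()[2:-2])
    name = parts[0].strip()
    values, order, positional = {}, [], 0
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:
            key = key.strip()
        else:
            # Unnamed parameter: number it the way MediaWiki does (1, 2, ...).
            positional += 1
            key, value = str(positional), part
        if key not in values:
            order.append(key)
        values[key] = value.strip()
    return name, values, order


def rewrite_template(name, values, order):
    """Rebuild the template, one parameter per line, in the recorded order."""
    lines = ["{{" + name]
    lines += ["| %s = %s" % (key, values[key]) for key in order]
    lines.append("}}")
    return "\n".join(lines)


code = "{{Infobox|name=Castle|img=[[File:A.png|thumb]]|built = {{year|1500}}}}"
name, values, order = parse_template(code)
print(order)
print(rewrite_template(name, values, order))
```

Because values are stripped and the layout is regenerated, the rewrite step produces a normalized template rather than the original text.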
2013/9/5 legoktm legoktm.wikipedia@gmail.com:
> Hi Strainu,
> It's definitely a work in progress. There are very few features that have intentionally been removed, and all of those are explicitly marked with a "@deprecated" tag.
> If there are features that you notice are missing that used to be available, filing a bug would be helpful so we can keep track of what still needs to be implemented.
I've submitted bugs 55881 (for image retrieval) and 55882 (for template parsing).
> Regarding templatesWithParams, using mwparserfromhell is probably the right way to go. I've started working on integrating that into the framework, but haven't gotten very far yet. We could probably use an OrderedDict[1] there too.
I've noted that in the bug. I might be able to provide a patch for that.