Feature Requests item #2936228, was opened at 2010-01-21 12:51
Message generated for change (Settings changed) made by xqt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=2936228&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: xqt (xqt)
Summary: enable/disable cosmetic changes
Initial Comment:
Is it possible to enable/disable cosmetic_changes.py for a particular script? For example, enable it for replace.py and disable it for all other scripts, including interwiki.py? If not, please add this option.
----------------------------------------------------------------------
>Comment By: xqt (xqt)
Date: 2010-01-21 18:48
Message:
done in r7887; the -cc option overrides the default or user settings
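For later readers, a minimal sketch of how the pieces fit together; the variable name and the -cc behaviour below are my reading of config.py and the global option, so treat them as assumptions rather than the definitive interface:

# user-config.py -- illustrative sketch only
cosmetic_changes = False      # global default: no cosmetic changes for any script

# On the command line, -cc inverts whatever the configured default is for a
# single run (assumed behaviour), e.g.:
#   python replace.py -cc "foo" "bar"   # cosmetic changes enabled for this run
#   python interwiki.py                 # keeps the global default (disabled)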
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=2936228&group_…
Bugs item #2619054, was opened at 2009-02-20 08:04
Message generated for change (Comment added) made by sf-robot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2619054&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: rewrite
Group: None
>Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: NicDumZ Nicolas Dumazet (nicdumz)
Assigned to: Russell Blau (russblau)
Summary: clarify between limit, number, batch and step parameters
Initial Comment:
I ran into strange behavior with replace.py -weblink: that I couldn't quite diagnose: some pages were not processed.
First of all, those detailed logs are a great gift. They are a bit messy to understand at first, but thanks to those I found the bug and fixed it in r6386 ( http://svn.wikimedia.org/viewvc/pywikipedia?view=rev&revision=6386 ).
I believe that this parameter confusion is a very bad habit we have from the old framework (the only reason we have those bugs here is that we merged pagegenerators from trunk). We need to agree on common parameters for generators that have a global meaning, and stick to them.
I personally think that -limit might be a bit confusing (is it an API limit, a limit enforced by the local application on a huge fetched set, etc.?), while -number appears a bit clearer. But it's a personal opinion =)
What about -number for "number of items to retrieve", and -step or -maxstep for the maximum number of items to retrieve at once?
Actually, I don't mind about the names; we just need to agree on something meaningful enough, and document them in the file headers.
On a side note, replace.py -fix:yu-tld -weblink:*.yu is actually running on fr.wp. No issues sighted. =)
----------------------------------------------------------------------
>Comment By: SourceForge Robot (sf-robot)
Date: 2010-01-21 02:20
Message:
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).
----------------------------------------------------------------------
Comment By: Russell Blau (russblau)
Date: 2010-01-06 18:59
Message:
This was fixed a while back but I neglected to close the bug; please reopen
if any continuing problems exist.
----------------------------------------------------------------------
Comment By: NicDumZ Nicolas Dumazet (nicdumz)
Date: 2009-02-22 04:50
Message:
Well I think that one of the first steps here is to consider what is
currently done in the old pagegenerators =)
Here's a small summary of the "limits" enforced by our old
pagegenerators.
The overall internal naming consistency is quite low for now, not to
mention the surprising facts I found :s
For each generator I've considered the pagegenerators function and its
Site/Page/Image/Category counterpart: unless noted, the parameter names of
both functions are consistent.
* shortpages, new(pages|images), unusedfiles, withoutinterwiki,
uncategorized(images|categories|pages), unwatchedpages, ancientpages,
deadendpages, longpages, shortpages, search
They use "number" (meant as "batch"/"max") + boolean "repeat". Overall,
you can get either "number" items, or all.
* random(page|redirect) are good examples of inconsistencies:
they use number (batch/max) + repeat, but since Special:Random gives only
one page at a time, the actual "batch" parameter is always 1. (behavior is
"for _ in range(number), fetch one page")
And if repeat=True ... those functions never stop, if I'm right.
irrrk !!
* filelinks, imagelinks, interwiki
they scrape the article wikipage, and yield everything in one step from the
wikitext
* categorymembers, subcategories
they scrape category pages. No parameter is available, since the UI doesn't
let us customize the number of displayed links. Follows the (next) links on
the category page. Stops when all the items have been retrieved.
* allpages, prefixindex, getReferences
no function parameters. They use config.special_page_limit as "batch/max",
and all items are retrieved through repeated queries.
if special_page_limit > 999, getReferences sets it back to 999. (?!)
* linksearch
pagegenerators has a "step=500" parameter, the corresponding Site function
uses "limit=500". Meant as "batch/batch": all the links are retrieved
through repeated queries
* usercontribs
number=250, meant as "batch/max". All the contribs are retrieved through
repeated queries. if number>500, sets it back to 500
It seems that the most commonly used combination is number+repeat. But I
really don't think that it is the way to go, since you cannot accurately
describe the total number of items you want to retrieve: you get either
"number" items or all of them...
I think "batch" + "total" integer parameters could be more useful here
(the names are illustrative)
On the other hand, users should be able to say "I want to retrieve all the
items": looking into the code, I see that a "-1" convention is used now. If
I understand things correctly, it is used in a "batch" context: if we call
set_maximum_items(-1), in most of the cases, the API uses its default
xxlimit number. We could use such a convention for our "total" parameter
too. Be it -1, or None, whatever, but I think that with such a policy, we
should cover all the use cases.
Given what I found, I really don't think that backwards compatibility
should be a priority here. I would rather introduce a breaking change in
namings, so that people don't expect the new limits to work "as in the old
framework"... because in the old framework, limit behaviors were not even
internally consistent...
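To make the "batch" + "total" idea concrete, here is a small illustrative sketch; the function and parameter names are made up for this example and are not part of any existing pywikibot API:

from itertools import islice

def query_in_batches(fetch_batch, batch=50, total=None):
    """Yield items from repeated queries of at most `batch` items each;
    stop after `total` items overall (None or -1 meaning "no total limit")."""
    def _all_items():
        offset = 0
        while True:
            chunk = fetch_batch(offset, batch)   # one request to the backend
            if not chunk:
                return
            for item in chunk:
                yield item
            offset += len(chunk)
    items = _all_items()
    if total in (None, -1):
        return items
    return islice(items, total)

# e.g. query_in_batches(my_fetch, batch=500, total=1200) would issue three
# requests of 500 items and stop after yielding 1200 items overall.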
----------------------------------------------------------------------
Comment By: Russell Blau (russblau)
Date: 2009-02-20 15:00
Message:
A good point. A query can have two different types of limits: the limit on
the number of pages/links/whatever retrieved from the API in a single
request (defaults to "max"), and the limit on the total number of items to
be retrieved from a repeated query. We should do this in a way that is (a)
internally consistent among all generators, and (b) as much as possible,
backwards-compatible with the old pagegenerators module (but this is
secondary to getting something that works).
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2619054&group_…
Bugs item #2629586, was opened at 2009-02-23 08:41
Message generated for change (Comment added) made by sf-robot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2629586&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: rewrite
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: NicDumZ Nicolas Dumazet (nicdumz)
Assigned to: Russell Blau (russblau)
Summary: TerminalHandler.emit crashes when message is a string
Initial Comment:
Sometimes self.format(record) can return a string, and this case is currently not handled.
How to trigger this behavior?
1) modify Site.loadpageinfo() so that the error u"loadpageinfo: Query on %s returned data on '%s'" is raised every time.
2) run category.py move -from:nonasciititle -to:nonasciititle2
3) the loadpageinfo error will be triggered; category.py will catch this Error at top-level and will call pywikibot.logging.exception("Fatal error:")
Here, this gives :
pywikibot/scripts$ python category.py move -from:"Athlète du combiné nordique aux Jeux olympiques" -to:"Coureur du combiné nordique aux Jeux olympiques" -debug
Reading dump from category.dump.bz2
Found 1 wikipedia:fr processes running, including this one.
Traceback (most recent call last):
File ".../pywikibot/bot.py", line 95, in emit
"xmlcharrefreplace"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 671: ordinal not in range(128)
Dumping to category.dump.bz2, please wait...
I don't understand exactly how a string is returned by format(), and why a unicode message is expected, but it happens.
The stack trace here is particularly cryptic. True, I'm still not used to the logging system, but I had to manually place old-fashioned "print"s everywhere to track the issue and understand what CAUSED this. :/
I have patched emit() in r6423 so it doesn't crash on a string message. However, Russ, I think you might want to fix the source of the problem in the logging system itself, rather than just patching the symptom. Feel free to revert this :)
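For reference, a rough sketch (Python 2 style, and not the actual bot.py code) of an emit() that tolerates both str and unicode results from format():

import logging

class ForgivingTerminalHandler(logging.StreamHandler):
    # Sketch only: decode byte strings before re-encoding for the terminal,
    # so a str message no longer triggers the implicit ascii-decode crash.
    def emit(self, record):
        text = self.format(record)
        if isinstance(text, str):                 # bytes under Python 2
            text = text.decode('utf-8', 'replace')
        self.stream.write(text.encode('ascii', 'xmlcharrefreplace') + '\n')
        self.flush()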
----------------------------------------------------------------------
>Comment By: SourceForge Robot (sf-robot)
Date: 2010-01-21 02:20
Message:
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).
----------------------------------------------------------------------
Comment By: Russell Blau (russblau)
Date: 2010-01-06 19:04
Message:
I'm not sure if there is anything here that still needs to be fixed; fixing
bugs in the logging module is certainly outside the abilities of this
project! ;-)
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2629586&group_…
Bugs item #2935366, was opened at 2010-01-20 02:19
Message generated for change (Settings changed) made by xqt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2935366&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: General
Group: None
>Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Davide Bolsi (basilicofresco)
Assigned to: xqt (xqt)
Summary: Dump file: 'NoneType' object has no attribute 'strip'
Initial Comment:
Replace.py still halts on some pages with deleted usernames. The problem was solved for direct page loading (thanks), but is still present with dump file scanning.
C:\pywikipedia>replace.py -xml:enwiki-20091128-pages-articles.xml "a" "b" -xmlstart:"Successions of Philosophers" -lang:en
Reading XML dump...
Traceback (most recent call last):
File "C:\pywikipedia\pagegenerators.py", line 849, in __iter__
for page in self.wrapped_gen:
File "C:\pywikipedia\pagegenerators.py", line 781, in DuplicateFilterPageGenerator
for page in generator:
File "C:\pywikipedia\replace.py", line 218, in __iter__
for entry in self.parser:
File "C:\pywikipedia\xmlreader.py", line 301, in new_parse
for rev in self._parse(event, elem):
File "C:\pywikipedia\xmlreader.py", line 310, in _parse_only_latest
yield self._create_revision(revision)
File "C:\pywikipedia\xmlreader.py", line 347, in _create_revision
redirect=self.isredirect
File "C:\pywikipedia\xmlreader.py", line 64, in __init__
self.username = username.strip()
AttributeError: 'NoneType' object has no attribute 'strip'
'NoneType' object has no attribute 'strip'
C:\pywikipedia>version.py
Pywikipedia [http] trunk/pywikipedia (r7885, 2010/01/19, 07:31:56)
Python 2.6 (r26:66721, Oct 2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)]
----------------------------------------------------------------------
Comment By: Davide Bolsi (basilicofresco)
Date: 2010-01-20 09:47
Message:
Now it works, thanks!
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2010-01-20 07:33
Message:
done in r7886.
Please confirm whether it works.
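(For the record, the guard presumably amounts to something like the following; the helper name is invented here and this is not the actual r7886 diff:)

def normalize_username(username):
    # Deleted revisions carry no <username> element, so `username` can be
    # None here; treat that as an empty name instead of calling .strip() on it.
    return username.strip() if username is not None else u''

# in xmlreader.XmlEntry.__init__ this would replace the bare
#     self.username = username.strip()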
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2935366&group_…
Bugs item #2929809, was opened at 2010-01-11 14:24
Message generated for change (Comment added) made by xqt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2929809&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: General
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Davide Bolsi (basilicofresco)
>Assigned to: xqt (xqt)
Summary: 'NoneType' object has no attribute 'strip'
Initial Comment:
I was getting a strange error with replace.py and some large file dumps, so I did some testing...
Well, I discovered that replace.py halts while loading some pages, e.g. "Technical Architecture Group" on en.wikipedia, but I got the same error with a page on it.wikipedia as well.
It halts on the very same page both with a dump file and with direct page loading. This error is particularly annoying because, for example, I'm not able to do a full scan of the dump.
Examples:
1) direct page loading
C:\pywikipedia>replace.py -lang:en -page:"Technical Architecture Group" "a" "b"
Getting 1 pages from wikipedia:en...
Traceback (most recent call last):
File "C:\pywikipedia\pagegenerators.py", line 860, in __iter__
for loaded_page in self.preload(somePages):
File "C:\pywikipedia\pagegenerators.py", line 879, in preload
wikipedia.getall(site, pagesThisSite)
File "C:\pywikipedia\wikipedia.py", line 4159, in getall
_GetAll(site, pages, throttle, force).run()
File "C:\pywikipedia\wikipedia.py", line 3842, in run
xml.sax.parseString(data, handler)
File "C:\Python26\lib\xml\sax\__init__.py", line 49, in parseString
parser.parse(inpsrc)
File "C:\Python26\lib\xml\sax\expatreader.py", line 107, in parse
xmlreader.IncrementalParser.parse(self, source)
File "C:\Python26\lib\xml\sax\xmlreader.py", line 123, in parse
self.feed(buffer)
File "C:\Python26\lib\xml\sax\expatreader.py", line 207, in feed
self._parser.Parse(data, isFinal)
File "C:\Python26\lib\xml\sax\expatreader.py", line 304, in end_element
self._cont_handler.endElement(name)
File "C:\pywikipedia\xmlreader.py", line 182, in endElement
text, self.username,
AttributeError: MediaWikiXmlHandler instance has no attribute 'username'
MediaWikiXmlHandler instance has no attribute 'username'
2) dump file (on this dump "Successions of Philosophers" immediately precedes "Technical Architecture Group")
C:\pywikipedia>replace.py -xml:enwiki-20091128-pages-articles.xml -lang:en -xmlstart:"Successions of
Philosophers" "a" "b"
Reading XML dump...
Getting 1 pages from wikipedia:en...
>>> Successions of Philosophers <<<
[...cut......cut......cut...]
Do you want to accept these changes? ([y]es, [N]o, [e]dit, open in [b]rowser, [a]ll, [q]uit) n
Traceback (most recent call last):
File "C:\pywikipedia\pagegenerators.py", line 847, in __iter__
for page in self.wrapped_gen:
File "C:\pywikipedia\pagegenerators.py", line 779, in DuplicateFilterPageGenerator
for page in generator:
File "C:\pywikipedia\replace.py", line 218, in __iter__
for entry in self.parser:
File "C:\pywikipedia\xmlreader.py", line 295, in new_parse
for rev in self._parse(event, elem):
File "C:\pywikipedia\xmlreader.py", line 304, in _parse_only_latest
yield self._create_revision(revision)
File "C:\pywikipedia\xmlreader.py", line 341, in _create_revision
redirect=self.isredirect
File "C:\pywikipedia\xmlreader.py", line 64, in __init__
self.username = username.strip()
AttributeError: 'NoneType' object has no attribute 'strip'
'NoneType' object has no attribute 'strip'
Thanks in advance!
----------------------------------------------------------------------
>Comment By: xqt (xqt)
Date: 2010-01-19 08:37
Message:
done in r7885
Some usernames are deleted:
http://en.wikipedia.org/w/index.php?title=Technical_Architecture_Group&acti…
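A rough sketch of the kind of guard needed in the live-page path; the class name is invented here and this is not the actual r7885 change:

import xml.sax

class RevisionHandler(xml.sax.handler.ContentHandler):
    # Illustrative only: reset `username` at the start of every <revision>,
    # so a revision whose contributor was deleted (no <username> element)
    # still leaves the attribute defined (as None) instead of missing.
    def startElement(self, name, attrs):
        if name == 'revision':
            self.username = None
            self._in_username = False
        elif name == 'username':
            self._in_username = True
            self._buf = []

    def characters(self, data):
        if getattr(self, '_in_username', False):
            self._buf.append(data)

    def endElement(self, name):
        if name == 'username':
            self.username = ''.join(self._buf).strip()
            self._in_username = False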
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2929809&group_…