Bugs item #2619054, was opened at 2009-02-20 03:04
Message generated for change (Comment added) made by russblau
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2619054&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: rewrite
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: NicDumZ — Nicolas Dumazet (nicdumz)
Assigned to: Russell Blau (russblau)
Summary: clarify between limit, number, batch and step parameters
Initial Comment:
I saw strange behavior from replace.py -weblink: that I couldn't quite diagnose: some pages were not processed.
First of all, the detailed logs are a great gift. They are a bit messy to read at first, but thanks to them I found the bug and fixed it in r6386 ( http://svn.wikimedia.org/viewvc/pywikipedia?view=rev&revision=6386 ).
I believe that this parameter confusion is a very bad habit inherited from the old framework. (The only reason we have those bugs here is that we merged pagegenerators from trunk.) We need to agree on common generator parameters that have a global meaning, and stick to them.
I personally think that -limit might be a bit confusing (is it an API limit, a limit enforced by the local application on a huge fetched set, etc.?), while -number seems a bit clearer. But that's a personal opinion =)
What about -number for "number of items to retrieve", and -step or -maxstep for the maximum number of items to retrieve at once?
Actually, I don't mind what the names are; we just need to agree on something meaningful enough, and document it in the file headers.
On a side note, replace.py -fix:yu-tld -weblink:*.yu is currently running on fr.wp. No issues sighted. =)
----------------------------------------------------------------------
>Comment By: Russell Blau (russblau)
Date: 2009-02-20 10:00
Message:
A good point. A query can have two different types of limits: the limit on
the number of pages/links/whatever retrieved from the API in a single
request (defaults to "max"), and the limit on the total number of items to
be retrieved from a repeated query. We should do this in a way that is (a)
internally consistent among all generators, and (b) as much as possible,
backwards-compatible with the old pagegenerators module (but this is
secondary to getting something that works).
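
To make the two kinds of limit concrete, here is a minimal sketch, not the framework's actual code. The names `fetch_batch`, `query`, `step` and `total` are hypothetical, and the 1200-item data source only stands in for an API that is queried repeatedly.

```python
def fetch_batch(start, step):
    """Stand-in for ONE API request: return up to `step` items from `start`.

    Hypothetical data source holding 1200 numbered items.
    """
    end = min(start + step, 1200)
    return list(range(start, end))


def query(step=500, total=None):
    """Yield items, requesting `step` at a time, stopping after `total`.

    step  -- per-request limit (how many items a single request may return)
    total -- overall limit across all requests; None means "everything"
    """
    start = 0
    yielded = 0
    while True:
        batch = fetch_batch(start, step)
        if not batch:
            return  # data source exhausted
        for item in batch:
            if total is not None and yielded >= total:
                return  # overall limit reached, possibly mid-batch
            yield item
            yielded += 1
        start += len(batch)
```

With the two parameters kept apart, `query(step=50, total=3)` issues one small request and yields three items, while `query()` keeps requesting batches of 500 until the source is exhausted.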
----------------------------------------------------------------------
Bugs item #2618865, was opened at 2009-02-20 01:41
Message generated for change (Comment added) made by russblau
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2618865&group_…
Category: rewrite
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: NicDumZ — Nicolas Dumazet (nicdumz)
Assigned to: Russell Blau (russblau)
Summary: output() is broken
Initial Comment:
$python pywikibot/pagegenerators.py -cat:1918
No handlers could be found for logger "wiki.config2"
Found 1 wikipedia:fr processes running, including this one.
$
Nothing is ever printed =)
Turning on the debug parameter, however, correctly prints the page names:
$python pywikibot/pagegenerators.py -cat:1918 -debug
No handlers could be found for logger "wiki.config2"
Found 1 wikipedia:fr processes running, including this one.
Grippe de 1918
1918 en bande dessinée
Armistice de Moudros
Déclaration d'indépendance de la Lituanie
Guerre d'indépendance lettone
1918
[and so on...]
=)
----------------------------------------------------------------------
>Comment By: Russell Blau (russblau)
Date: 2009-02-20 09:56
Message:
Fixed in r6387 and r6388
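
For readers following along, a minimal sketch of the "initialize on first use" idea behind the fix. Python's logging module drops INFO-level records on a logger that has no handlers and still inherits the default WARNING threshold, which matches the silent run above. The explicit `_initialized` flag and the `stream` argument are assumptions made for this illustration; r6388 itself checks the root logger's level instead.

```python
import io
import logging

logger = logging.getLogger("output_demo")  # hypothetical logger name
logger.propagate = False  # keep the demo isolated from the root logger
_initialized = False      # assumption: explicit flag instead of a level check


def init_handlers(stream):
    """Attach a plain-message handler and let INFO records through."""
    global _initialized
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    _initialized = True


def output(text, stream):
    """Log `text`, setting up the handlers first if nobody has done so."""
    if not _initialized:
        init_handlers(stream)
    logger.info(text)
```

Calling `output()` twice attaches only one handler, so each message is written once even when no argument-handling step ran beforehand.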
----------------------------------------------------------------------
Revision: 6388
Author: russblau
Date: 2009-02-20 14:36:24 +0000 (Fri, 20 Feb 2009)
Log Message:
-----------
Initialize logging handlers if output() is called before handleArgs()
Modified Paths:
--------------
branches/rewrite/pywikibot/bot.py
Modified: branches/rewrite/pywikibot/bot.py
===================================================================
--- branches/rewrite/pywikibot/bot.py 2009-02-20 14:25:13 UTC (rev 6387)
+++ branches/rewrite/pywikibot/bot.py 2009-02-20 14:36:24 UTC (rev 6388)
@@ -130,6 +130,11 @@
implemented)
"""
+ # make sure logging system has been initialized
+ root = pywikibot.logging.getLogger()
+ if root.level == 30: # init_handlers sets this level
+ init_handlers()
+
if decoder:
text = unicode(text, decoder)
elif type(text) is not unicode:
@@ -186,6 +191,89 @@
return data
+def init_handlers():
+ """Initialize logging system for terminal-based bots"""
+
+ # All user output is routed through the logging module.
+ # Each type of output is handled by an appropriate handler object.
+ # This structure is used to permit eventual development of other
+ # user interfaces (GUIs) without modifying the core bot code.
+ # The following output levels are defined:
+ # DEBUG - only for file logging; debugging messages
+ # STDOUT - output that must be sent to sys.stdout (for bots that may
+ # have their output redirected to a file or other destination)
+ # VERBOSE - optional progress information for display to user
+ # INFO - normal (non-optional) progress information for display to user
+ # INPUT - prompts requiring user response
+ # WARN - user warning messages
+ # ERROR - user error messages
+ # CRITICAL - fatal error messages
+ # Accordingly, do ''not'' use print statements in bot code; instead,
+ # use pywikibot.output function.
+
+ moduleName = calledModuleName()
+ if not moduleName:
+ moduleName = "terminal-interface"
+
+ logging.addLevelName(VERBOSE, "VERBOSE")
+ # for messages to be displayed on terminal at "verbose" setting
+ # use INFO for messages to be displayed even on non-verbose setting
+ logging.addLevelName(STDOUT, "STDOUT")
+ # for messages to be displayed to stdout
+ logging.addLevelName(INPUT, "INPUT")
+ # for prompts requiring user response
+
+ root_logger = logging.getLogger()
+ root_logger.handlers = [] # get rid of default handler
+ root_logger.setLevel(DEBUG+1) # all records except DEBUG go to logger
+
+ # configure default handler for VERBOSE and INFO levels
+ default_handler = TerminalHandler(strm=sys.stderr)
+ if config.verbose_output:
+ default_handler.setLevel(VERBOSE)
+ else:
+ default_handler.setLevel(INFO)
+ default_handler.addFilter(MaxLevelFilter(INPUT))
+ default_handler.setFormatter(logging.Formatter(fmt="%(message)s"))
+ root_logger.addHandler(default_handler)
+
+ # if user has enabled file logging, configure file handler
+ if moduleName in config.log or '*' in config.log:
+ if config.logfilename:
+ logfile = config.datafilepath(config.logfilename)
+ else:
+ logfile = config.datafilepath("%s-bot.log" % moduleName)
+ file_handler = logging.handlers.RotatingFileHandler(
+ filename=logfile, maxBytes=2 << 20, backupCount=5)
+
+ file_handler.setLevel(DEBUG)
+ form = logging.Formatter(
+ fmt="%(asctime)s %(filename)18s, %(lineno)d: "
+ "%(levelname)-8s %(message)s",
+ datefmt="%Y-%m-%d %H:%M:%S"
+ )
+ file_handler.setFormatter(form)
+ root_logger.addHandler(file_handler)
+ for component in config.debug_log:
+ debuglogger = logging.getLogger(component)
+ debuglogger.setLevel(DEBUG)
+ debuglogger.addHandler(file_handler)
+
+ # handler for level STDOUT
+ output_handler = TerminalHandler(strm=sys.stdout)
+ output_handler.setLevel(STDOUT)
+ output_handler.addFilter(MaxLevelFilter(STDOUT))
+ output_handler.setFormatter(logging.Formatter(fmt="%(message)s"))
+ root_logger.addHandler(output_handler)
+
+ # handler for levels WARNING and higher
+ warning_handler = TerminalHandler(strm=sys.stderr)
+ warning_handler.setLevel(logging.WARNING)
+ warning_handler.setFormatter(
+ logging.Formatter(fmt="%(levelname)s: %(message)s"))
+ root_logger.addHandler(warning_handler)
+
+
# Command line parsing and help
def calledModuleName():
@@ -294,84 +382,9 @@
if username:
config.usernames[config.family][config.mylang] = username
- # initialize logging system for terminal-based bots
+ init_handlers()
- # All user output is routed through the logging module.
- # Each type of output is handled by an appropriate handler object.
- # This structure is used to permit eventual development of other
- # user interfaces (GUIs) without modifying the core bot code.
- # The following output levels are defined:
- # DEBUG - only for file logging; debugging messages
- # STDOUT - output that must be sent to sys.stdout (for bots that may
- # have their output redirected to a file or other destination)
- # VERBOSE - optional progress information for display to user
- # INFO - normal (non-optional) progress information for display to user
- # INPUT - prompts requiring user response
- # WARN - user warning messages
- # ERROR - user error messages
- # CRITICAL - fatal error messages
- # Accordingly, do ''not'' use print statements in bot code; instead,
- # use pywikibot.output function.
-
- logging.addLevelName(VERBOSE, "VERBOSE")
- # for messages to be displayed on terminal at "verbose" setting
- # use INFO for messages to be displayed even on non-verbose setting
- logging.addLevelName(STDOUT, "STDOUT")
- # for messages to be displayed to stdout
- logging.addLevelName(INPUT, "INPUT")
- # for prompts requiring user response
-
- root_logger = logging.getLogger()
- root_logger.handlers = [] # get rid of default handler
- root_logger.setLevel(DEBUG+1) # all records except DEBUG go to logger
-
- # configure default handler for VERBOSE and INFO levels
- default_handler = TerminalHandler(strm=sys.stderr)
if config.verbose_output:
- default_handler.setLevel(VERBOSE)
- else:
- default_handler.setLevel(INFO)
- default_handler.addFilter(MaxLevelFilter(INPUT))
- default_handler.setFormatter(logging.Formatter(fmt="%(message)s"))
- root_logger.addHandler(default_handler)
-
- # if user has enabled file logging, configure file handler
- if moduleName in config.log or '*' in config.log:
- if config.logfilename:
- logfile = config.datafilepath(config.logfilename)
- else:
- logfile = config.datafilepath("%s-bot.log" % moduleName)
- file_handler = logging.handlers.RotatingFileHandler(
- filename=logfile, maxBytes=2 << 20, backupCount=5)
-
- file_handler.setLevel(DEBUG)
- form = logging.Formatter(
- fmt="%(asctime)s %(filename)18s, %(lineno)d: "
- "%(levelname)-8s %(message)s",
- datefmt="%Y-%m-%d %H:%M:%S"
- )
- file_handler.setFormatter(form)
- root_logger.addHandler(file_handler)
- for component in config.debug_log:
- debuglogger = logging.getLogger(component)
- debuglogger.setLevel(DEBUG)
- debuglogger.addHandler(file_handler)
-
- # handler for level STDOUT
- output_handler = TerminalHandler(strm=sys.stdout)
- output_handler.setLevel(STDOUT)
- output_handler.addFilter(MaxLevelFilter(STDOUT))
- output_handler.setFormatter(logging.Formatter(fmt="%(message)s"))
- root_logger.addHandler(output_handler)
-
- # handler for levels WARNING and higher
- warning_handler = TerminalHandler(strm=sys.stderr)
- warning_handler.setLevel(logging.WARNING)
- warning_handler.setFormatter(
- logging.Formatter(fmt="%(levelname)s: %(message)s"))
- root_logger.addHandler(warning_handler)
-
- if config.verbose_output:
import re
ver = pywikibot.__version__ # probably can be improved on
m = re.search(r"\$Id: .* (\d+ \d+-\d+-\d+ \d+:\d+:\d+Z) .*\$", ver)
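
The handler layout that init_handlers() sets up can be tried in isolation with the standard library alone. The following is a sketch under assumptions: the numeric values of STDOUT, VERBOSE and INPUT are not shown in the diff, so the ones below are placeholders chosen to satisfy STDOUT < VERBOSE < INFO < INPUT < WARNING. The MaxLevelFilter here is a guess at the semantics its name suggests, and logging.StreamHandler stands in for TerminalHandler.

```python
import io
import logging

# Placeholder values; the real constants are not visible in the diff.
STDOUT = 16
VERBOSE = 18
INPUT = 25


class MaxLevelFilter(logging.Filter):
    """Pass only records at or below `level` (assumed semantics)."""

    def __init__(self, level):
        super().__init__()
        self.level = level

    def filter(self, record):
        return record.levelno <= self.level


def build_logger(out, err, verbose=False):
    """Route STDOUT records to `out` and everything else to `err`."""
    logging.addLevelName(STDOUT, "STDOUT")
    logging.addLevelName(VERBOSE, "VERBOSE")
    logging.addLevelName(INPUT, "INPUT")

    logger = logging.getLogger("routing_demo")  # hypothetical name
    logger.handlers = []
    logger.propagate = False
    logger.setLevel(logging.DEBUG + 1)  # everything except DEBUG

    # VERBOSE/INFO/INPUT go to err, capped so warnings are not doubled.
    default = logging.StreamHandler(err)
    default.setLevel(VERBOSE if verbose else logging.INFO)
    default.addFilter(MaxLevelFilter(INPUT))
    default.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(default)

    # Exactly the STDOUT level goes to out.
    output = logging.StreamHandler(out)
    output.setLevel(STDOUT)
    output.addFilter(MaxLevelFilter(STDOUT))
    output.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(output)

    # WARNING and above go to err with a level prefix.
    warning = logging.StreamHandler(err)
    warning.setLevel(logging.WARNING)
    warning.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(warning)
    return logger
```

Each handler's minimum level combined with a maximum-level filter gives it a closed band of levels, which is why a record reaches at most one terminal handler in this layout.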
Revision: 6386
Author: nicdumz
Date: 2009-02-20 07:53:28 +0000 (Fri, 20 Feb 2009)
Log Message:
-----------
Because of argument confusion, the default behavior of linksearch() was to yield only 500 pages. Changing the defaults so that it crawls all the pages.
Modified Paths:
--------------
branches/rewrite/pywikibot/pagegenerators.py
branches/rewrite/pywikibot/site.py
Modified: branches/rewrite/pywikibot/pagegenerators.py
===================================================================
--- branches/rewrite/pywikibot/pagegenerators.py 2009-02-20 07:06:03 UTC (rev 6385)
+++ branches/rewrite/pywikibot/pagegenerators.py 2009-02-20 07:53:28 UTC (rev 6386)
@@ -723,14 +723,14 @@
for page in site.shortpages(number=number, repeat=repeat):
yield page[0]
-def LinksearchPageGenerator(link, step=500, site=None):
+def LinksearchPageGenerator(link, limit=None, site=None):
"""Yields all pages that include a specified link, according to
[[Special:Linksearch]].
"""
if site is None:
site = pywikibot.Site()
- for page in site.linksearch(link, limit=step):
+ for page in site.linksearch(link, limit=limit):
yield page
def SearchPageGenerator(query, number = 100, namespaces = None, site = None):
Modified: branches/rewrite/pywikibot/site.py
===================================================================
--- branches/rewrite/pywikibot/site.py 2009-02-20 07:06:03 UTC (rev 6385)
+++ branches/rewrite/pywikibot/site.py 2009-02-20 07:53:28 UTC (rev 6386)
@@ -2452,7 +2452,7 @@
# TODO: implement patrol
- def linksearch(self, siteurl, limit=500):
+ def linksearch(self, siteurl, limit=None):
"""Backwards-compatible interface to exturlusage()"""
return self.exturlusage(siteurl, limit=limit)
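
The effect of this change can be reproduced with a toy model. This is only a sketch: `exturlusage` below is a hypothetical stand-in built on itertools.islice, not the Site method, and it assumes that `limit` caps the total number of results, with None meaning no cap.

```python
from itertools import islice


def exturlusage(pages, limit=None):
    """Stand-in: yield at most `limit` pages; None means no cap."""
    return islice(pages, limit)


def linksearch_old(pages, step=500):
    # Pre-r6386 shape: a batch size is forwarded as the total cap.
    return exturlusage(pages, limit=step)


def linksearch_new(pages, limit=None):
    # Post-r6386 shape: no cap unless the caller asks for one.
    return exturlusage(pages, limit=limit)
```

With 1200 candidate pages, the old shape silently stops at 500, which matches the "some pages were not treated" symptom from the bug report; the new shape returns all of them unless a limit is passed explicitly.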