Bugs item #1809802, was opened at 2007-10-08 14:34
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1809802...

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: weblinkchecker.py inefficiently respects max_external_links

Initial Comment:
I noticed while testing my new system that setting max_external_links to anything above 250 seems to be pointless, because the preload batch size of 250 page names is hardcoded:
gen = pagegenerators.PreloadingGenerator(gen, pageNumber = 250)
So if more than 250 threads were created, the extra ones would have nothing to do, because it seems a fresh batch of page names (one per thread) is only fetched once all of the previous 250 threads have finished (I could be wrong here). In that case, it would be better to have a statement like
gen = pagegenerators.PreloadingGenerator(gen, pageNumber = config.max_external_links)
so that at least as many page names are fetched as the current batch of threads needs (I figure the more page names are stored, the less often the threads have to wait for downloads), i.e. something like:
--- weblinkchecker.py 2007-10-08 17:15:09.000000000 -0400
+++ weblinkchecker.py.bak 2007-10-08 17:14:58.000000000 -0400
@@ -729,7 +729,7 @@
     if gen:
         if namespaces != []:
             gen = pagegenerators.NamespaceFilterPageGenerator(gen, namespaces)
-        gen = pagegenerators.PreloadingGenerator(gen, pageNumber = (config.max_external_links * 2))
+        gen = pagegenerators.PreloadingGenerator(gen, pageNumber = 260)
         gen = pagegenerators.RedirectFilterPageGenerator(gen)
         bot = WeblinkCheckerRobot(gen)
         try:
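
To make the reasoning concrete, here is a small standalone sketch of the effect described above. It does not use the framework's actual classes: preload_in_batches, max_busy_threads and preload_size are hypothetical helpers made up for illustration, and the sketch assumes (as guessed above, possibly wrongly) that a new batch is only fetched once the previous one has been used up.

```python
from itertools import islice


def preload_in_batches(titles, batch_size):
    """Yield titles one at a time, pulling them from the source in batches.

    Hypothetical stand-in for a preloading generator; a real bot would
    download the page texts for the whole batch at the marked spot.
    """
    iterator = iter(titles)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        # (a real implementation would fetch the whole batch here)
        for title in batch:
            yield title


def max_busy_threads(batch_size, max_threads):
    """Upper bound on threads that can get work out of a single batch,
    assuming one page per thread and no refill until the batch is done."""
    return min(batch_size, max_threads)


def preload_size(max_threads, factor=2):
    """Batch size tied to the thread limit, mirroring the
    config.max_external_links * 2 idea from the diff above."""
    return max_threads * factor


if __name__ == "__main__":
    # With a hardcoded batch of 250, a limit of 400 threads is never reached.
    print(max_busy_threads(250, 400))
    # With the batch size derived from the limit, all 400 threads can get work.
    print(max_busy_threads(preload_size(400), 400))
```

Under these assumptions the hardcoded value caps the useful thread count at 250, while a batch size derived from max_external_links scales with whatever limit is configured.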
----------------------------------------------------------------------