[Pywikipedia-l] PreloadingGenerator memory leak

simone malacarne s.malacarne at gmail.com
Mon Oct 13 23:03:53 UTC 2008


I'm using the PreloadingGenerator function to read a very large category
(more than 80,000 articles). My code is:
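import wikipedia, catlib, pagegenerators

site = wikipedia.getSite()  # assumed: the default site from user-config.py
cat = catlib.Category(site, 'Categoria:BioBot')
gen = pagegenerators.CategorizedPageGenerator(cat, recurse=False, start=u'')
preloadingGen = pagegenerators.PreloadingGenerator(gen, pageNumber=60)
for page in preloadingGen:
    pass  # do something with each page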

Everything works correctly, but memory and CPU usage grow progressively
(to about 1.7 gigabytes by the end), and the read speed keeps slowing
down over time.
I traced the problem to the getall function called by PreloadingGenerator.
Inside _GetAll I found this call:
get_throttle(requestsize = len(self.pages))
I think this is responsible for the slowdown. Somehow self.pages is not
reset over time (hence the memory leak), so it grows bigger and bigger
and the throttle time increases with it.
I tried changing it to:
get_throttle(requestsize = 10)
and now the read rate stays constant over time instead of slowing down
(naturally, the memory leak is still there).
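
To illustrate the mechanism I suspect, here is a small standalone sketch. It does not use pywikipedia at all: throttle_delay_ms, BatchFetcher and simulate are hypothetical names, and the linear delay formula is only an assumption standing in for whatever get_throttle really computes. It just shows that if a page list is never emptied between batches, both the number of pages held and the delay requested per batch keep growing.

```python
# Toy model only: none of these names come from pywikipedia.

def throttle_delay_ms(requestsize, base_ms=1000, per_page_ms=100):
    """Assumed throttle rule: the wait grows linearly with the request size."""
    return base_ms + per_page_ms * requestsize


class BatchFetcher(object):
    """Collects pages batch by batch and asks the throttle for a delay."""

    def __init__(self, reset_between_batches):
        self.reset_between_batches = reset_between_batches
        self.pages = []

    def fetch(self, batch):
        if self.reset_between_batches:
            self.pages = []           # start every batch from an empty list
        self.pages.extend(batch)      # without the reset, old pages pile up
        return throttle_delay_ms(requestsize=len(self.pages))


def simulate(reset_between_batches, batches=5, batch_size=60):
    """Run a few batches and return (delays per batch, pages still held)."""
    fetcher = BatchFetcher(reset_between_batches)
    delays = []
    for i in range(batches):
        first = i * batch_size
        batch = ["Page %d" % n for n in range(first, first + batch_size)]
        delays.append(fetcher.fetch(batch))
    return delays, len(fetcher.pages)


leaky_delays, leaky_held = simulate(reset_between_batches=False)
clean_delays, clean_held = simulate(reset_between_batches=True)

print(leaky_delays, leaky_held)   # delay and held pages grow with every batch
print(clean_delays, clean_held)   # delay and held pages stay flat
```

In this model, hard-coding the request size (as in my change above) flattens the delay but leaves the list growing, which matches what I see: constant read rate, memory still leaking.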

Hope this helps
Bye
Simone


