I'm using the PreloadingGenerator function to read a very big category (> 80,000 articles). My code is this:

    cat = catlib.Category(site, 'Categoria:BioBot')
    gen = pagegenerators.CategorizedPageGenerator(cat, recurse=False, start=u'')
    preloadingGen = pagegenerators.PreloadingGenerator(gen, pageNumber=60)
    for page in preloadingGen:
        do something
Everything works correctly. The problem is that memory and CPU usage grow steadily (about 1.7 gigabytes by the end), and the read speed keeps slowing down over time. I tracked the problem to the getall function called by PreloadingGenerator. Inside _GetAll I found this call:

    get_throttle(requestsize = len(self.pages))

I think this is responsible for the slowdown. Somehow self.pages is not reset over time, hence the memory leak; the list then grows bigger and bigger, and so the throttle time increases. I tried changing the call to:

    get_throttle(requestsize = 10)

and now the read rate is constant over time and no longer slows down (naturally, the memory leak is still there).
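To make the suspected mechanism concrete, here is a minimal, self-contained sketch. It does not use the real pywikipedia code: throttle_delay, LeakyBatchLoader and FreshBatchLoader are hypothetical names, and the logarithmic delay formula is only an assumption about how a throttle might scale with requestsize. It only shows that if a page list is never reset, both its size and any delay derived from it keep growing from batch to batch.

```python
import math


def throttle_delay(requestsize, base_delay=1.0):
    """Hypothetical delay model (an assumption, not the real throttle):
    the wait grows with the logarithm of the request size."""
    return base_delay * math.log(1 + requestsize, 2)


class LeakyBatchLoader:
    """Keeps every page it was ever given, as suspected for self.pages."""

    def __init__(self):
        self.pages = []

    def load(self, batch):
        self.pages.extend(batch)  # the list is never cleared
        return throttle_delay(len(self.pages))


class FreshBatchLoader:
    """Uses a new list for each batch, so the request size stays constant."""

    def load(self, batch):
        pages = list(batch)
        return throttle_delay(len(pages))


def simulate(loader, total_pages=600, batch_size=60):
    """Feed the loader batches of page ids and collect the delay per batch."""
    delays = []
    for start in range(0, total_pages, batch_size):
        batch = range(start, min(start + batch_size, total_pages))
        delays.append(loader.load(batch))
    return delays


leaky = simulate(LeakyBatchLoader())
fresh = simulate(FreshBatchLoader())

if __name__ == "__main__":
    for i, (a, b) in enumerate(zip(leaky, fresh)):
        print("batch %2d  leaky delay %.2f  fresh delay %.2f" % (i, a, b))
```

In this model the "leaky" delay rises with every batch while the "fresh" delay stays flat, which matches what I saw before and after hard-coding requestsize. Whether the real _GetAll actually keeps its pages between batches would still need to be checked in the source.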
Hope this helps.
Bye,
Simone