Bugs item #1973804, was opened at 2008-05-27 02:53
Message generated for change (Comment added) made by melancholie
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1973804...
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: General
Group: None
Status: Closed
Resolution: Fixed
Priority: 8
Private: No
Submitted By: Melancholie (melancholie)
Assigned to: NicDumZ — Nicolas Dumazet (nicdumz)
Summary: Huge memory consumption during changing process
Initial Comment: As soon as the changing process (putting/saving of pages) starts, interwiki.py (r5440) consumes more than 100 MB of memory (RAM+Swap) if the bot is working on many wikis. Memory usage grows during the changing process. When the changing process finishes, the memory is suddenly freed. Memory usage is then normal again, but only until the next 'putting-pages process' begins ;-)
----------------------------------------------------------------------
Comment By: Melancholie (melancholie)
Date: 2008-06-06 10:54
Message: Logged In: YES user_id=2089773 Originator: YES
BTW: Is there any array that does not get properly flushed/deleted (or not as early as possible)? The longer a bot runs, the more RAM is taken up (if working on all wikis), until a relatively high (?) maximum is reached. Even with diskcache enabled.
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz) Date: 2008-05-30 07:57
Message: Logged In: YES user_id=1963242 Originator: NO
Thanks to russblau and Bryan, this drawback should be greatly reduced by now, provided you run Python 2.5 (I add this because the caveats of earlier versions are well known).
russblau made sure that every Site object is cached, which avoids creating a new Site object every time a new page is found. This should help a lot with our current interwiki issue.
Bryan introduced the diskcache feature, which saves MediaWiki messages on disk to try to reduce RAM usage (set use_diskcache = True in user-config.py if you need it).
I'm closing this bug since the overhead on put/save of pages for interwiki.py is fixed. :)
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz) Date: 2008-05-29 13:24
Message: Logged In: YES user_id=1963242 Originator: NO
Okay, this has been partially fixed by r5461.
However, the fact that it is slow at _EACH_ put means that MediaWiki messages are retrieved at _EACH_ put. And since no Site object ever retrieves its messages more than once, that might mean that the creation of Site objects in interwiki.py is suboptimal.
A nice thing to check would be: are we sure that only a single Site object is created per site in an interwiki.py run?
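One way such a check could be done is to count constructions per site and look for any count above one. The sketch below (modern Python) only shows the idea: `CountingSite`, `creation_counts` and `duplicated_sites` are hypothetical names, not part of pywikipedia, and a real check would have to instrument the framework's own Site constructor instead.

```python
from collections import Counter

# Number of times a Site-like object was constructed, per (family, code).
creation_counts = Counter()


class CountingSite:
    """Hypothetical stand-in for a Site class that records each construction."""

    def __init__(self, code, family="wikipedia"):
        self.code = code
        self.family = family
        creation_counts[(family, code)] += 1


def duplicated_sites(counts):
    """Return the (family, code) keys that were constructed more than once."""
    return sorted(key for key, n in counts.items() if n > 1)
```

If `duplicated_sites(creation_counts)` is non-empty at the end of a run, more than one object was created for at least one site, and each extra object would retrieve its messages again.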
----------------------------------------------------------------------
Comment By: Melancholie (melancholie) Date: 2008-05-29 09:09
Message: Logged In: YES user_id=2089773 Originator: YES
This bug is definitely because of that change: http://svn.wikimedia.org/viewvc/pywikipedia/trunk/pywikipedia/wikipedia.py?r...
----------------------------------------------------------------------
Comment By: Melancholie (melancholie) Date: 2008-05-28 07:21
Message: Logged In: YES user_id=2089773 Originator: YES
On low-memory systems this even leads to: Inconsistency detected by ld.so: dl-minimal.c: 84: __libc_memalign: Assertion `page != ((void *) -1)' failed!
Does that have to do with BeautifulSoup.py? The revision that used (c)ElementTree did not cause that kind of bug!
----------------------------------------------------------------------
pywikipedia-l@lists.wikimedia.org