[Pywikipedia-l] [ pywikipediabot-Bugs-1973804 ] Huge memory consumption during changing process

SourceForge.net noreply at sourceforge.net
Fri May 30 05:57:47 UTC 2008


Bugs item #1973804, was opened at 2008-05-27 02:53
Message generated for change (Comment added) made by nicdumz
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1973804&group_id=93107

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
>Category: General
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 8
Private: No
Submitted By: Melancholie (melancholie)
>Assigned to: NicDumZ — Nicolas Dumazet (nicdumz)
Summary: Huge memory consumption during changing process

Initial Comment:
As soon as the changing process (putting/saving of pages) starts, interwiki.py (r5440) consumes more than 100 MB of memory (RAM+Swap) if the bot is working on many wikis. Memory usage keeps growing during the changing process. When the changing process finishes, the memory is suddenly released and usage is back to normal, but only until the next 'putting-pages process' begins ;-)

----------------------------------------------------------------------

>Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2008-05-30 07:57

Message:
Logged In: YES 
user_id=1963242
Originator: NO

Thanks to russblau and Bryan, this drawback should be greatly reduced by
now, provided you run Python 2.5 (I add this because the caveats of
previous versions are well known).

russblau made sure every Site object is cached, which avoids recreating a
new Site object every time a new page is found. This should help a lot with
our current interwiki issue.

Bryan introduced the diskcache feature to save MediaWiki messages on disk
to try to reduce RAM usage (set use_diskcache = True in user-config.py if
you need it).
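
To illustrate the general idea of keeping messages on disk rather than in
RAM, here is a rough sketch using the standard shelve module. This is an
assumption about how such a cache could look, not Bryan's implementation:
DiskMessageCache and the file name are made up for the example.

```python
# Illustrative sketch only: DiskMessageCache is a hypothetical class,
# not the diskcache code that ships with pywikipedia.
import os
import shelve
import tempfile


class DiskMessageCache:
    """Keep messages in an on-disk shelf and read them one key at a time."""

    def __init__(self, path):
        self._path = path

    def store(self, messages):
        # Write all messages to disk; nothing is kept in memory afterwards.
        with shelve.open(self._path) as db:
            db.update(messages)

    def get(self, key, default=None):
        # Open the shelf only for the duration of a single lookup.
        with shelve.open(self._path) as db:
            return db.get(key, default)


cache_dir = tempfile.mkdtemp()
cache = DiskMessageCache(os.path.join(cache_dir, "mediawiki_messages"))
cache.store({"edit": "Edit", "history": "History"})
print(cache.get("edit"))  # prints "Edit"
```

The trade-off is a disk access per lookup in exchange for not holding every
site's messages in memory at once.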

I'm closing this bug since the overhead on put/save of pages for
interwiki.py is fixed. :)

----------------------------------------------------------------------

Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2008-05-29 13:24

Message:
Logged In: YES 
user_id=1963242
Originator: NO

Okay, this has been partially fixed by r5461.

However, the fact that it is slow at _EACH_ put means that MediaWiki
messages are retrieved at _EACH_ put. And since a Site object never
retrieves its messages more than once, that might mean that the
creation of Site objects in interwiki.py is suboptimal.

A nice thing to check would be: are we sure that only a single Site
object is created per site in an interwiki.py run?
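
One way to check this could be to count constructor calls per site. The
sketch below is only an illustration: CountingSite and duplicated_sites()
are hypothetical names, and the real Site class would need equivalent
instrumentation.

```python
# Illustrative sketch only: CountingSite is a hypothetical stand-in for
# the real Site class, instrumented to count how often each site is built.
from collections import Counter


class CountingSite:
    # Number of instances created so far, keyed by (family, code).
    created = Counter()

    def __init__(self, code, family="wikipedia"):
        self.code = code
        self.family = family
        CountingSite.created[(family, code)] += 1


def duplicated_sites():
    """Return the sites that were instantiated more than once."""
    return {key: n for key, n in CountingSite.created.items() if n > 1}
```

If duplicated_sites() is non-empty at the end of a run, some site was
constructed repeatedly and its messages were probably fetched repeatedly
as well.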

----------------------------------------------------------------------

Comment By: Melancholie (melancholie)
Date: 2008-05-29 09:09

Message:
Logged In: YES 
user_id=2089773
Originator: YES

This bug is definitely because of that change:
http://svn.wikimedia.org/viewvc/pywikipedia/trunk/pywikipedia/wikipedia.py?r1=5437&r2=5438

----------------------------------------------------------------------

Comment By: Melancholie (melancholie)
Date: 2008-05-28 07:21

Message:
Logged In: YES 
user_id=2089773
Originator: YES

On low-memory systems this even leads to:
Inconsistency detected by ld.so: dl-minimal.c: 84: __libc_memalign:
Assertion `page != ((void *) -1)' failed!

Does that have to do with BeautifulSoup.py?
The revision that used (c)ElementTree did not cause that kind of bug!

----------------------------------------------------------------------
