Revision: 5463
Author: rotem
Date: 2008-05-29 14:04:25 +0000 (Thu, 29 May 2008)
Log Message:
-----------
I guess this is what it means; clarifying comment in Site.linkto.
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2008-05-29 13:57:41 UTC (rev 5462)
+++ trunk/pywikipedia/wikipedia.py 2008-05-29 14:04:25 UTC (rev 5463)
@@ -3358,7 +3358,7 @@
                 link = links[site].aslink(forceInterwiki=True)
                 s.append(link)
             except AttributeError:
-                s.append(insite.linkto(links[site], othersite=insite))
+                s.append(getSite(site).linkto(links[site], othersite=insite))
         if insite.lang in insite.family.interwiki_on_one_line:
             sep = u' '
         else:
@@ -5096,7 +5096,8 @@
     def linkto(self, title, othersite = None):
         """Return unicode string in the form of a wikilink to 'title'
-        Use optional Site argument 'othersite' to generate an interwiki link.
+        Use optional Site argument 'othersite' to generate an interwiki link
+        from the other site to the current site.
         """
         if othersite and othersite.lang != self.lang:
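The direction described in the new docstring can be illustrated with a minimal sketch. The Site class below is a hypothetical stand-in that models only the 'lang' attribute, and the two return values are an assumption about what follows the 'if' line shown in the hunk; this is not the framework's actual implementation.

```python
class Site(object):
    """Hypothetical stand-in for the framework's Site class (only 'lang')."""

    def __init__(self, lang):
        self.lang = lang

    def linkto(self, title, othersite=None):
        """Return a wikilink to 'title', a page that lives on this site.

        When 'othersite' has a different language code, the link is written
        as it would appear on 'othersite', so it carries this site's prefix.
        The return formats are assumed, not taken from the diff.
        """
        if othersite and othersite.lang != self.lang:
            return u'[[%s:%s]]' % (self.lang, title)
        return u'[[%s]]' % title


fr = Site('fr')
en = Site('en')

# Link placed on the English wiki, pointing at the French page.
interwiki = fr.linkto(u'Paris', othersite=en)
# Same-site link: no language prefix needed.
local = fr.linkto(u'Paris')
```

Under these assumptions, the first hunk of r5463 makes sense: the method has to be called on the site that hosts the target page, with 'insite' passed as 'othersite'. The old call used 'insite' for both roles, so the language comparison could never differ.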
Bugs item #1973804, was opened at 2008-05-27 02:53
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1973804&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: interwiki
Group: None
Status: Open
Resolution: None
Priority: 8
Private: No
Submitted By: Melancholie (melancholie)
Assigned to: Nobody/Anonymous (nobody)
Summary: Huge memory consumption during changing process
Initial Comment:
As soon as the changing process (putting/saving of pages) starts, interwiki.py (r5440) consumes more than 100 MB of memory (RAM+Swap) if the bot is working on many wikis. Memory usage grows during the changing process. When the changing process is finished, the memory is suddenly released. Memory usage is then normal again, but only until the next 'putting-pages process' begins ;-)
----------------------------------------------------------------------
>Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2008-05-29 13:24
Message:
Logged In: YES
user_id=1963242
Originator: NO
Okay, this has been partially fixed by r5461.
However, the fact that it is slow at _EACH_ put means that mediawiki
messages are retrieved at _EACH_ put. Since no Site object ever retrieves
its messages more than once, this suggests that interwiki.py creates more
Site objects than necessary.
One thing worth checking: are we sure that only a single Site
object is created per site in an interwiki.py run?
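One way to guarantee a single Site object per site is to hand out instances through a caching factory. The sketch below is only an illustration of that idea: the names get_site, _site_cache and creation_log are hypothetical, and the Site class is a stand-in, not the framework's own getSite or Site.

```python
# Hypothetical module-level cache, keyed by (language code, family name).
_site_cache = {}

# Records every real construction so duplicates would be easy to spot.
creation_log = []


class Site(object):
    """Stand-in for an object that is expensive to set up."""

    def __init__(self, code, family):
        self.code = code
        self.family = family
        self._mediawiki_messages = {}  # would be filled once per object
        creation_log.append((code, family))


def get_site(code, family='wikipedia'):
    """Return the cached Site for (code, family), creating it only once."""
    key = (code, family)
    if key not in _site_cache:
        _site_cache[key] = Site(code, family)
    return _site_cache[key]


first = get_site('fr')
second = get_site('fr')   # served from the cache, no new object
other = get_site('de')
```

With such a factory, the messages attached to a Site would be fetched at most once per site and run, provided no caller constructs Site directly. Logging constructions, as creation_log does here, is a cheap way to check whether that holds.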
----------------------------------------------------------------------
Comment By: Melancholie (melancholie)
Date: 2008-05-29 09:09
Message:
Logged In: YES
user_id=2089773
Originator: YES
This bug is definitely caused by this change:
http://svn.wikimedia.org/viewvc/pywikipedia/trunk/pywikipedia/wikipedia.py?…
----------------------------------------------------------------------
Comment By: Melancholie (melancholie)
Date: 2008-05-28 07:21
Message:
Logged In: YES
user_id=2089773
Originator: YES
On low-memory systems this even leads to:
Inconsistency detected by ld.so: dl-minimal.c: 84: __libc_memalign:
Assertion `page != ((void *) -1)' failed!
Does that have to do with BeautifulSoup.py?
The revision that used (c)ElementTree did not cause that kind of bug!
----------------------------------------------------------------------
Bugs item #1977421, was opened at 2008-05-29 09:24
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1977421&group_…
Category: General
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Melancholie (melancholie)
>Assigned to: NicDumZ — Nicolas Dumazet (nicdumz)
Summary: Use cElementTree instead of BeautifulSoup, if installed
Initial Comment:
Use cElementTree instead of BeautifulSoup, if available!
cElementTree is much much faster, and uses less memory. See:
http://www.oluyede.org/blog/2007/08/25/sgml-python-parsers-benchmark/
----------------------------------------------------------------------
>Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2008-05-29 13:21
Message:
Logged In: YES
user_id=1963242
Originator: NO
Yes, cElementTree is definitely faster. r5461 uses (c)ElementTree when
available, and BeautifulSoup when it's not.
Thanks for the diagnosis :)
----------------------------------------------------------------------
Revision: 5461
Author: nicdumz
Date: 2008-05-29 11:19:36 +0000 (Thu, 29 May 2008)
Log Message:
-----------
Making BeautifulSoup only a fallback solution. Message parsing with BS is really slow, per bug #1973804
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2008-05-29 06:20:42 UTC (rev 5460)
+++ trunk/pywikipedia/wikipedia.py 2008-05-29 11:19:36 UTC (rev 5461)
@@ -4458,23 +4458,43 @@
     def mediawiki_message(self, key):
         """Return the MediaWiki message text for key "key" """
-        # Allmessages is retrieved once for all in a session
+        # Allmessages is retrieved once for all per created Site object
         if not self._mediawiki_messages:
             if verbose:
                 output(
                   u"Retrieving mediawiki messages from Special:Allmessages")
+            elementtree = True
+            try:
+                try:
+                    from xml.etree.cElementTree import XML # 2.5
+                except ImportError:
+                    try:
+                        from cElementTree import XML
+                    except ImportError:
+                        from elementtree.ElementTree import XML
+            except ImportError:
+                if verbose:
+                    output(u'Elementtree was not found, using BeautifulSoup instead')
+                elementtree = False
+
             retry_idle_time = 1
             while True:
                 get_throttle()
                 xml = self.getUrl(self.get_address("Special:Allmessages")
                                     + "&ot=xml")
-                tree = BeautifulStoneSoup(xml)
                 # xml structure is :
                 # <messages lang="fr">
                 #    <message name="about">À propos</message>
                 #    ...
                 # </messages>
-                self._mediawiki_messages = dict([(tag.get('name').lower(), tag.string)
+                if elementtree:
+                    decode = xml.encode(self.encoding())
+                    tree = XML(decode)
+                    self._mediawiki_messages = dict([(tag.get('name').lower(), tag.text)
+                            for tag in tree.getiterator('message')])
+                else:
+                    tree = BeautifulStoneSoup(xml)
+                    self._mediawiki_messages = dict([(tag.get('name').lower(), tag.string)
                         for tag in tree.findAll('message')])
                 if not self._mediawiki_messages:
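The ElementTree branch of this change can be tried in isolation. The following is a minimal sketch for a current Python, not the code from the commit: it uses xml.etree.ElementTree and iter(), because the cElementTree module and the getiterator() method were removed in Python 3.9. The function name parse_messages and the sample document (including the "edit" entry) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample in the structure documented in the diff's comment; the second
# message is invented for the example.
SAMPLE = u'''<messages lang="fr">
  <message name="About">\u00c0 propos</message>
  <message name="edit">Modifier</message>
</messages>'''


def parse_messages(xml_text, encoding='utf-8'):
    """Map lower-cased message names to their text.

    The text is encoded to bytes before parsing, mirroring the
    xml.encode(self.encoding()) step in the diff.
    """
    tree = ET.XML(xml_text.encode(encoding))
    return dict((tag.get('name').lower(), tag.text)
                for tag in tree.iter('message'))


messages = parse_messages(SAMPLE)
```

Keys are lower-cased as in the commit, so a lookup for 'about' succeeds even though the sample spells the name "About".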
Bugs item #1977421, was opened at 2008-05-29 09:24
Message generated for change (Settings changed) made by melancholie
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1977421&group_…
Category: General
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Melancholie (melancholie)
Assigned to: Nobody/Anonymous (nobody)
>Summary: Use cElementTree instead of BeautifulSoup, if installed
Initial Comment:
Use cElementTree instead of BeautifulSoup, if available!
cElementTree is much much faster, and uses less memory. See:
http://www.oluyede.org/blog/2007/08/25/sgml-python-parsers-benchmark/
----------------------------------------------------------------------
Bugs item #1977421, was opened at 2008-05-29 09:24
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1977421&group_…
Category: General
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Melancholie (melancholie)
Assigned to: Nobody/Anonymous (nobody)
Summary: Use cElementTree instead of BeautifulSoup, if available
Initial Comment:
Use cElementTree instead of BeautifulSoup, if available!
cElementTree is much much faster, and uses less memory. See:
http://www.oluyede.org/blog/2007/08/25/sgml-python-parsers-benchmark/
----------------------------------------------------------------------
Bugs item #1973804, was opened at 2008-05-27 02:53
Message generated for change (Comment added) made by melancholie
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1973804&group_…
Category: interwiki
Group: None
Status: Open
Resolution: None
Priority: 8
Private: No
Submitted By: Melancholie (melancholie)
Assigned to: Nobody/Anonymous (nobody)
Summary: Huge memory consumption during changing process
Initial Comment:
As soon as the changing process (putting/saving of pages) starts, interwiki.py (r5440) consumes more than 100 MB of memory (RAM+Swap) if the bot is working on many wikis. Memory usage grows during the changing process. When the changing process is finished, the memory is suddenly released. Memory usage is then normal again, but only until the next 'putting-pages process' begins ;-)
----------------------------------------------------------------------
>Comment By: Melancholie (melancholie)
Date: 2008-05-29 09:09
Message:
Logged In: YES
user_id=2089773
Originator: YES
This bug is definitely caused by this change:
http://svn.wikimedia.org/viewvc/pywikipedia/trunk/pywikipedia/wikipedia.py?…
----------------------------------------------------------------------
Comment By: Melancholie (melancholie)
Date: 2008-05-28 07:21
Message:
Logged In: YES
user_id=2089773
Originator: YES
On low-memory systems this even leads to:
Inconsistency detected by ld.so: dl-minimal.c: 84: __libc_memalign:
Assertion `page != ((void *) -1)' failed!
Does that have to do with BeautifulSoup.py?
The revision that used (c)ElementTree did not cause that kind of bug!
----------------------------------------------------------------------