Bugs item #1914786, was opened at 2008-03-15 13:13
Message generated for change (Comment added) made by lusum
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1914786&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 8
Private: No
Submitted By: Filnik (filnik)
Assigned to: Nobody/Anonymous (nobody)
Summary: Great Waste of CPU and/or RAM on interwiki.py and replace.py
Initial Comment:
Hello, I have a lot of regexes to run with replace.py; I've grouped them in the fixes.py file and run the bot over the whole Italian Wikipedia. But after 1-2 days of running, the process was consuming this amount of resources:
filnik 32709 8.9 11.1 1074936 903376 pts/113 Sl+ Mar12 357:09 python2.5 pynik <etc...>
Restarting the bot (from the page it had reached), I get:
filnik 31372 2.8 0.2 193564 21992 pts/113 Sl+ 11:58 0:17 python2.5 pynik <etc...>
That's a LOT less than before! So what is happening? Why doesn't the bot release the resources it uses once they aren't needed any more?
The same happens with interwiki.py. I have tried using Python 2.5 instead of 2.4, but without any result; I've also tried using "del" in Python, but that didn't help either.
So, any idea? Filnik
P.S. Fixing this bug would improve the use of interwiki.py on the toolserver and of replace.py, which I think are the most used scripts, so I've set the priority to 8.
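As background for the "del" attempts mentioned above, here is a minimal sketch (hypothetical class name) of what `del` actually does in CPython: it unbinds one name, and the object is only freed once no reference to it remains anywhere. Even then, the allocator may keep the freed pages, so the process size reported by ps does not necessarily shrink.

```python
import gc
import weakref

class Payload(object):
    """Stands in for a large data structure held by the bot."""

payload = Payload()
alias = payload                  # a second reference to the same object
probe = weakref.ref(payload)     # lets us observe whether the object is alive

del payload                      # unbinds one name only; `alias` keeps it alive
gc.collect()
print(probe() is not None)       # True: `del` did not free the object

del alias                        # last strong reference is gone
gc.collect()
print(probe() is None)           # True: now CPython has reclaimed it
```

So if anything (a cache, a list, a closure) still references the pages the bot has processed, `del` on one name cannot release them.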
----------------------------------------------------------------------
Comment By: lusum (lusum)
Date: 2008-03-23 21:06
Message:
Logged In: YES
user_id=642982
Originator: NO
I have done some tests: the problem seems to be caused by some del statements in the code, mostly the del instruction in def replaceLinks(self,
page, newPages, bot).
I have tried to fix it by using new.remove instead of del new..., but without
success.
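For reference, a minimal illustration (with hypothetical list contents) of the difference between the two calls tried above:

```python
pages = ['A', 'B', 'B', 'C']

del pages[1]            # removes the element at index 1
print(pages)            # ['A', 'B', 'C']

pages.remove('B')       # removes the first element equal to 'B'
print(pages)            # ['A', 'C']
```

Both only drop the list's reference to the element; the removed objects are reclaimed only once nothing else refers to them, which would explain why swapping one call for the other does not change the memory footprint.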
----------------------------------------------------------------------
Revision: 5155
Author: russblau
Date: 2008-03-23 20:01:02 +0000 (Sun, 23 Mar 2008)
Log Message:
-----------
Improve title parsing (.strip() without arguments may remove some Unicode chars that are valid in wiki page titles).
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2008-03-23 14:45:18 UTC (rev 5154)
+++ trunk/pywikipedia/wikipedia.py 2008-03-23 20:01:02 UTC (rev 5155)
@@ -331,7 +331,7 @@
while u" " in t:
t = t.replace(u" ", u" ")
# Strip spaces at both ends
- t = t.strip()
+ t = t.strip(u" ")
# Remove left-to-right and right-to-left markers.
t = t.replace(u'\u200e', '').replace(u'\u200f', '')
# leading colon implies main namespace instead of the default
@@ -403,21 +403,20 @@
sectionStart = t.find(u'#')
if sectionStart >= 0:
- self._section = t[sectionStart+1 : ].strip()
+ self._section = t[sectionStart+1 : ].lstrip(" ")
self._section = sectionencode(self._section,
self.site().encoding())
if not self._section:
self._section = None
- t = t[ : sectionStart].strip()
+ t = t[ : sectionStart].rstrip(" ")
else:
self._section = None
if t:
if not self.site().nocapitalize:
- t = t[0].upper() + t[1:]
+ t = t[:1].upper() + t[1:]
# reassemble the title from its parts
-
if self._namespace != 0:
t = self.site().namespace(self._namespace) + u':' + t
if self._section:
@@ -1518,7 +1517,8 @@
for match in Rlink.finditer(thistxt):
title = match.group('title')
- if title.strip().startswith("#"):
+ title = title.replace("_", " ").strip(" ")
+ if title.startswith("#"):
# this is an internal section link
continue
if not self.site().isInterwikiLink(title):
@@ -4892,12 +4892,12 @@
of the link refers to this site's own family and/or language.
"""
- s = s.strip().lstrip(":")
+ s = s.replace("_", " ").strip(" ").lstrip(":")
if not ':' in s:
return False
first, rest = s.split(':',1)
# interwiki codes are case-insensitive
- first = first.lower().strip()
+ first = first.lower().strip(" ")
# commons: forwards interlanguage links to wikipedia:, etc.
if self.family.interwiki_forward:
interlangTargetFamily = Family(self.family.interwiki_forward)
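The motivation for r5155 can be seen with any character that Unicode classifies as whitespace but that an ASCII-only strip should preserve; U+1680 (OGHAM SPACE MARK) serves as a hypothetical example of such a character appearing in a title:

```python
title = u" \u1680Example "        # U+1680 is Unicode whitespace

print(title.strip() == u"Example")            # True: bare strip() removes it too
print(title.strip(u" ") == u"\u1680Example")  # True: only ASCII spaces removed
```

This is why the commit replaces `t.strip()` with `t.strip(u" ")`: the argument restricts stripping to the plain space character.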
Revision: 5154
Author: btongminh
Date: 2008-03-23 14:45:18 +0000 (Sun, 23 Mar 2008)
Log Message:
-----------
Extension of r5151: Fix a regression introduced in r5037: Fetching an edit token when we already had an edit token did not work, even when the edit token has changed.
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2008-03-23 12:37:07 UTC (rev 5153)
+++ trunk/pywikipedia/wikipedia.py 2008-03-23 14:45:18 UTC (rev 5154)
@@ -4165,7 +4165,7 @@
return text
- def _getUserData(self, text, sysop = False):
+ def _getUserData(self, text, sysop = False, force = True):
"""
Get the user data from a wiki page data.
@@ -4181,7 +4181,7 @@
# Check for blocks - but only if version is 1.11 (userinfo is available)
# and the user data was not yet loaded
- if self.versionnumber() >= 11 and not self._userData[index]:
+ if self.versionnumber() >= 11 and (not self._userData[index] or force):
blocked = self.isBlocked(sysop = sysop)
if blocked and not self._isBlocked[index]:
# Write a warning if not shown earlier
@@ -4205,7 +4205,7 @@
self._messages[index] = False
# Don't perform other checks if the data was already loaded
- if self._userData[index]:
+ if self._userData[index] and not force:
return
# Search for the the user page link at the top.
@@ -4363,7 +4363,7 @@
text = self.getUrl(url, sysop = sysop)
# Parse data
- self._getUserData(text, sysop = sysop)
+ self._getUserData(text, sysop = sysop, force = force)
def search(self, query, number = 10, namespaces = None):
"""Yield search results (using Special:Search page) for query."""
Revision: 5151
Author: btongminh
Date: 2008-03-22 09:46:36 +0000 (Sat, 22 Mar 2008)
Log Message:
-----------
Fix a regression introduced in r5037: Fetching an edit token when we already had an edit token did not work, even when the edit token has changed.
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2008-03-21 21:06:51 UTC (rev 5150)
+++ trunk/pywikipedia/wikipedia.py 2008-03-22 09:46:36 UTC (rev 5151)
@@ -4342,7 +4342,7 @@
except KeyError:
return False
- def _load(self, sysop = False):
+ def _load(self, sysop = False, force = False):
"""
Loads user data.
This is only done if we didn't do get any page yet and the information
@@ -4352,7 +4352,7 @@
* sysop - Get sysop user data?
"""
index = self._userIndex(sysop)
- if self._userData[index]:
+ if self._userData[index] and not force:
return
if verbose:
@@ -5374,7 +5374,7 @@
index = self._userIndex(sysop)
if getagain or (getalways and self._token[index] is None):
output(u'Getting a token.')
- self._load(sysop = sysop)
+ self._load(sysop = sysop, force = True)
if self._token[index] is not None:
return self._token[index]
else:
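Taken together, r5151 and r5154 restore a force-refresh escape hatch in a load-once cache. A minimal sketch of the pattern (hypothetical class and method names, not the pywikipedia API):

```python
class TokenCache(object):
    """Load-once cache with a force-refresh escape hatch (hypothetical names)."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that performs the expensive request
        self._data = None

    def load(self, force=False):
        # Without `force`, a cached (possibly stale) edit token
        # could never be replaced -- the regression fixed in r5151.
        if self._data is not None and not force:
            return self._data
        self._data = self._fetch()
        return self._data

calls = []
cache = TokenCache(lambda: calls.append(1) or ("token-%d" % len(calls)))
print(cache.load())              # token-1: first call fetches
print(cache.load())              # token-1: cached, no new fetch
print(cache.load(force=True))    # token-2: forced refresh re-fetches
```

r5154 then threads the same `force` flag one level deeper, into the user-data parsing.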
Bugs item #1909559, was opened at 2008-03-07 05:59
Message generated for change (Comment added) made by sf-robot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1909559&group_…
Category: category
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: shizhao (wikishizhao)
Assigned to: Nobody/Anonymous (nobody)
Summary: wikipedia.getCategoryLinks NOT work
Initial Comment:
source code:
pg = wikipedia.Page(site, title)
text = pg.get()
plist = wikipedia.getCategoryLinks(text, site)
print plist
Return:
Checked for running processes. 1 processes currently running, including the current process.
[]
----------------------------------------------------------------------
>Comment By: SourceForge Robot (sf-robot)
Date: 2008-03-21 19:20
Message:
Logged In: YES
user_id=1312539
Originator: NO
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).
----------------------------------------------------------------------
Comment By: Rotem Liss (rotemliss)
Date: 2008-03-07 06:11
Message:
Logged In: YES
user_id=1327030
Originator: NO
Works for me on several pages. Which site and page did you use?
----------------------------------------------------------------------
Revision: 5150
Author: filnik
Date: 2008-03-21 21:06:51 +0000 (Fri, 21 Mar 2008)
Log Message:
-----------
Patch by harriv - adding functionality
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2008-03-21 20:06:36 UTC (rev 5149)
+++ trunk/pywikipedia/wikipedia.py 2008-03-21 21:06:51 UTC (rev 5150)
@@ -1572,17 +1572,20 @@
"""
return [template for (template, param) in self.templatesWithParams()]
- def templatesWithParams(self):
+ def templatesWithParams(self, thistxt=None):
"""Return a list of templates used on this Page.
Return value is a list of tuples. There is one tuple for each use of
a template in the page, with the template title as the first entry
and a list of parameters as the second entry.
+
+ If thistxt is set, it is used instead of current page content.
"""
- try:
- thistxt = self.get()
- except (IsRedirectPage, NoPage):
- return []
+ if not thistxt:
+ try:
+ thistxt = self.get()
+ except (IsRedirectPage, NoPage):
+ return []
# remove commented-out stuff etc.
thistxt = removeDisabledParts(thistxt)
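The r5150 change above follows a common "optional pre-fetched text" pattern: use the caller's text if supplied, otherwise fetch the page. A rough, hypothetical sketch (not the real pywikipedia parser; `get_text` stands in for Page.get(), and `IOError` for IsRedirectPage/NoPage):

```python
import re

def templates_with_params(get_text, thistxt=None):
    """Return (template, params) tuples; use `thistxt` if given, else fetch."""
    if not thistxt:
        try:
            thistxt = get_text()
        except IOError:          # stands in for IsRedirectPage / NoPage
            return []
    results = []
    # Very rough template scan, for illustration only (no nesting support).
    for m in re.finditer(r"\{\{([^{}|]+)(?:\|([^{}]*))?\}\}", thistxt):
        name = m.group(1).strip()
        params = m.group(2).split("|") if m.group(2) else []
        results.append((name, params))
    return results

print(templates_with_params(lambda: "{{Infobox|name=X|year=2008}}"))
```

Passing `thistxt` skips the network fetch entirely, which is the functionality harriv's patch adds.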