Revision: 6604
Author: nicdumz
Date: 2009-04-14 16:05:46 +0000 (Tue, 14 Apr 2009)
Log Message:
-----------
[1783487] "Interwiki crash on deleted page":
fetch the page text up front to avoid raising exceptions.
If (race condition) the page gets deleted in between,
an EditConflict error will be raised later on save.
Modified Paths:
--------------
trunk/pywikipedia/interwiki.py
Modified: trunk/pywikipedia/interwiki.py
===================================================================
--- trunk/pywikipedia/interwiki.py 2009-04-14 06:00:54 UTC (rev 6603)
+++ trunk/pywikipedia/interwiki.py 2009-04-14 16:05:46 UTC (rev 6604)
@@ -1174,8 +1174,9 @@
# This is not a page, but a subpage. Do not edit it.
wikipedia.output(u"Not editing %s: not doing interwiki on subpages" % page.aslink(True))
raise SaveError
-
- if not page.exists():
+ try:
+ pagetext = page.get()
+ except wikipedia.NoPage:
wikipedia.output(u"Not editing %s: page does not exist" % page.aslink(True))
raise SaveError
@@ -1186,7 +1187,7 @@
new = dict(newPages)
# remove interwiki links to ignore
- for iw in re.finditer('<!-- *\[\[(.*?:.*?)\]\] *-->', page.get()):
+ for iw in re.finditer('<!-- *\[\[(.*?:.*?)\]\] *-->', pagetext):
try:
ignorepage = wikipedia.Page(page.site(), iw.groups()[0])
except (wikipedia.NoSuchSite, wikipedia.InvalidTitle):
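The regex in the hunk above implements a small convention: an interwiki link wrapped in an HTML comment marks a link interwiki.py should ignore. A self-contained sketch of how that pattern matches (the page text here is hypothetical; the pattern is copied from the diff):

```python
import re

# Hypothetical page text; interwiki links wrapped in HTML comments mark
# links to ignore (pattern taken verbatim from the diff above).
pagetext = u"Some article text.\n<!-- [[fr:Exemple]] -->\n<!-- [[de:Beispiel]] -->"
ignored = [m.group(1) for m in re.finditer(r'<!-- *\[\[(.*?:.*?)\]\] *-->', pagetext)]
print(ignored)  # → ['fr:Exemple', 'de:Beispiel']
```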
Bugs item #2114223, was opened at 2008-09-16 15:46
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2114223&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: interwiki
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: André Malafaya Baptista (malafaya)
Assigned to: Nobody/Anonymous (nobody)
Summary: Socket timeout breaks out
Initial Comment:
VERSION.PY
==========
Pywikipedia [svn+ssh] wikimedia/svnroot/pywikipedia/trunk/pywikipedia (r5898, Sep 16 2008, 11:50:17)
Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)]
DESCRIPTION
===========
In the past few days, a socket timeout has repeatedly interrupted the bot. I believe the stack trace below is self-explanatory.
I used the command line:
interwiki.py -family:wiktionary -autonomous -start:Category:! -lang:io
OUTPUT
======
NOTE: The first unfinished subject is [[io:Kategorio:Albaniana vorti]]
NOTE: Number of pages queued is 59, trying to add 60 more.
Sleeping for 4.1 seconds, 2008-09-16 14:31:06
Dump io (wiktionary) saved
Traceback (most recent call last):
File "D:\Work\pywikipediabot-HEAD\pywikipedia\interwiki.py", line 1735, in <module>
bot.run()
File "D:\Work\pywikipediabot-HEAD\pywikipedia\interwiki.py", line 1486, in run
self.queryStep()
File "D:\Work\pywikipediabot-HEAD\pywikipedia\interwiki.py", line 1460, in queryStep
self.oneQuery()
File "D:\Work\pywikipediabot-HEAD\pywikipedia\interwiki.py", line 1428, in oneQuery
site = self.selectQuerySite()
File "D:\Work\pywikipediabot-HEAD\pywikipedia\interwiki.py", line 1402, in selectQuerySite
self.generateMore(globalvar.maxquerysize - mycount)
File "D:\Work\pywikipediabot-HEAD\pywikipedia\interwiki.py", line 1336, in generateMore
page = self.pageGenerator.next()
File "D:\Work\pywikipediabot-HEAD\pywikipedia\pagegenerators.py", line 688, in DuplicateFilterPageGenerator
for page in generator:
File "D:\Work\pywikipediabot-HEAD\pywikipedia\pagegenerators.py", line 239, in AllpagesPageGenerator
for page in site.allpages(start = start, namespace = namespace, includeredirects = includeredirects):
File "D:\Work\pywikipediabot-HEAD\pywikipedia\wikipedia.py", line 5166, in allpages
text = self.getUrl(api_url)
File "D:\Work\pywikipediabot-HEAD\pywikipedia\wikipedia.py", line 4485, in getUrl
text = f.read()
File "D:\Program Files\Python\lib\socket.py", line 291, in read
data = self._sock.recv(recv_size)
socket.timeout: timed out
----------------------------------------------------------------------
>Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-04-14 17:58
Message:
This has been fixed in r6586.
----------------------------------------------------------------------
Comment By: Mikko Silvonen (silvonen)
Date: 2008-11-24 20:31
Message:
My autonomous run was interrupted twice today because of a socket timeout.
I think the problem is server-related, as I have a 110 Mbps / 5 Mbps
connection.
Traceback (most recent call last):
File "interwiki.py", line 1769, in <module>
bot.run()
File "interwiki.py", line 1518, in run
self.queryStep()
File "interwiki.py", line 1492, in queryStep
self.oneQuery()
File "interwiki.py", line 1488, in oneQuery
subject.workDone(self)
File "interwiki.py", line 792, in workDone
iw = page.interwiki()
File "c:\svn\pywikipedia\wikipedia.py", line 1691, in interwiki
ll = getLanguageLinks(self.get(), insite=self.site(),
File "c:\svn\pywikipedia\wikipedia.py", line 668, in get
self._contents = self._getEditPage(get_redirect = get_redirect,
throttle = throttle, sysop = sysop)
File "c:\svn\pywikipedia\wikipedia.py", line 712, in _getEditPage
text = self.site().getUrl(path, sysop = sysop)
File "c:\svn\pywikipedia\wikipedia.py", line 4589, in getUrl
text = f.read()
File "C:\Python25\lib\socket.py", line 291, in read
data = self._sock.recv(recv_size)
socket.timeout: timed out
C:\svn\pywikipedia>python version.py
Pywikipedia [http] trunk/pywikipedia (r6114, Nov 23 2008, 12:41:02)
Python 2.5.1 (r251:54863, May 1 2007, 17:47:05) [MSC v.1310 32 bit
(Intel)]
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2008-09-20 03:41
Message:
The pagegenerator, even with the new API implementation, seems to be
working; I'm currently listing the pages of eo.wikt without any timeout.
Might your connection just be slower than usual? Or does it time out when
the WM websites are under heavy load?
You can tweak the socket timeout in user-config.py by setting socket_timeout
to the number of seconds to wait (the default is 120 seconds, quite long...).
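That tweak is a single assignment in user-config.py (a sketch; 60 is an arbitrary example value, the default being 120 seconds):

```python
# user-config.py (sketch) -- lower the socket timeout from its 120 s default.
# The value 60 here is an arbitrary example, not a recommendation.
socket_timeout = 60
```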
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2114223&group_…
Bugs item #2664941, was opened at 2009-03-05 13:16
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2664941&group_…
Category: interwiki
Group: None
>Status: Closed
>Resolution: Invalid
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Automatic ignoring
Initial Comment:
Hello,
there is a problem with some "stupid editors" (meaning software) which change special characters to ?
Unfortunately, in interwiki links this sometimes turns e.g. [[zh:水]] into [[zh:?]], which is a redirect to the article about the question mark:
http://lmo.wikipedia.org/w/index.php?title=Aqua&diff=342464&oldid=342114
Maybe implement a hard-coded "-ignore:zh:? -ignore:ja:? -ignore:ko:?"
This case is the most frequent one.
----------------------------------------------------------------------
>Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-04-14 17:56
Message:
invalid
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2009-03-05 13:55
Message:
so, his problem is that non-ASCII interwikis are occasionally lost on a page
(here, because of a bad copy-paste:
http://lmo.wikipedia.org/w/index.php?title=Aqua&diff=342113&oldid=13741 )
and he wants any interwiki of the form [[xx:?]], [[xx:????]] etc. to be
ignored, because they cause the page not to be processed by interwiki.py in
-autonomous mode: the script thinks there is an interwiki conflict between
the normal article's interwikis and the interwikis of the [[?]] articles
that get mixed in.
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2009-03-05 13:55
Message:
Yes, the bot corrected it, because it was running with -ignore:zh:?
But it can be a long time before someone happens to run a bot on an
article like this.
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-03-05 13:42
Message:
I don't get it =)
From what I see in the diff you provided, the bot behaves correctly.
The question marks were introduced by a user, here:
http://lmo.wikipedia.org/w/index.php?title=Aqua&diff=prev&oldid=342113
Why would you ignore some wikis when working on interwikis? Here, the bot
does its job perfectly: it fixes the errors introduced by the user. !???!!!
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2664941&group_…
Bugs item #1790473, was opened at 2007-09-08 00:50
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1790473&group_…
Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 7
Private: No
Submitted By: Daniel Herding (wikipedian)
Assigned to: Nobody/Anonymous (nobody)
Summary: Interwiki bot overwrites changes, no edit conflict
Initial Comment:
This has recently happened:
http://de.wikipedia.org/w/index.php?title=Wiki&diff=36448315&oldid=36447898
The only reason I can think of is some obscure error with starttime/edittime/tokens/stuff like that.
Maybe this assumption in GetAll doesn't always work as expected?
# There's no possibility to read the wpStarttime argument from the XML.
# It is this time that the MediaWiki software uses to check for edit
# conflicts. We take the earliest time later than the last edit, which
# seems to be the safest possible time.
page2._startTime = str(int(timestamp)+1)
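The workaround quoted above amounts to one line of arithmetic on MediaWiki's 14-digit timestamps (a sketch; the timestamp value is hypothetical):

```python
# MediaWiki timestamps are 14-digit strings (YYYYMMDDHHMMSS). Since the XML
# dump carries no wpStarttime, GetAll fakes one as "one second after the
# last edit" -- the naive integer increment shown in the quoted snippet.
timestamp = "20070908005000"  # hypothetical last-edit timestamp
startTime = str(int(timestamp) + 1)
print(startTime)  # → 20070908005001
```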
----------------------------------------------------------------------
>Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-04-14 17:54
Message:
Fixed recently, see
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2119685&group_…
----------------------------------------------------------------------
Comment By: pipep (pipep)
Date: 2007-09-13 23:01
Message:
Same problem here:
http://de.wikipedia.org/w/index.php?title=Vereinigte_Staaten&diff=prev&oldi…
Interwiki bot in autonomous mode reverted text to older version.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1790473&group_…
Bugs item #2164505, was opened at 2008-10-13 22:55
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2164505&group_…
Category: interwiki
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Purodha B Blissenbach (purodha)
Assigned to: Nobody/Anonymous (nobody)
Summary: interwiki.py deletes comment lines
Initial Comment:
interwiki.py removes comment lines which should remain where they are, see http://nn.wikipedia.org/w/index.php?title=Kategori%3AAfghanistan&diff=75132…
AndersL of the Nynorsk Wikipedia says that there were multiple such instances, see
http://ksh.wikipedia.org/w/index.php?title=Metmaacher_Klaaf%3APurodha&diff=…
The program version of the above sample was from that day, or the day before. I update the program for Purbo T daily from svn.
----------------------------------------------------------------------
>Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-04-14 17:49
Message:
Purodha, have you seen this bug recently?
If not, given my failed test, I'll close this as "Can't reproduce".
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2008-10-20 05:41
Message:
I have tried to track down the bug; attached is my attempt to reproduce it,
on the exact same page.
Somehow I cannot reproduce it... Am I missing something here?
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2164505&group_…
Revision: 6568
Author: nicdumz
Date: 2009-04-01 11:18:59 +0000 (Wed, 01 Apr 2009)
Log Message:
-----------
Cleaning up the previous commit:
* Cleaning scrubxml() implementation
* Applying scrubxml AFTER decoding the string to unicode
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2009-04-01 10:45:17 UTC (rev 6567)
+++ trunk/pywikipedia/wikipedia.py 2009-04-01 11:18:59 UTC (rev 6568)
@@ -4893,23 +4893,15 @@
# Token not found
output(u'WARNING: Token not found on %s. You will not be able to edit any page.' % self)
- def scrubxml(self,xml):
+ def scrubxml(self, xml):
"""scrub the start of xml input, to make things work, even
- when crap is inserted ahead of the actual xml data. (such as when php reports strict
- warnings)"""
- xml2=""
- start=False
- warn=False
- for line in xml.split("\n"):
- if line.startswith("<?xml"):
- start=True
- else:
- warn=True
- if start:
- xml2+=line+"\n"
- if warn==True:
- pass #TODO: we could issue a warning for broken xml
- return xml2
+ when crap is inserted ahead of the actual xml data.
+ (such as when php reports strict warnings)"""
+ start = xml.find('<?xml')
+ if start < 0:
+ # '<?xml' not found ? Should not happen.
+ return ""
+ return xml[start:]
def mediawiki_message(self, key):
"""Return the MediaWiki message text for key "key" """
@@ -4957,7 +4949,6 @@
else:
xml = self.getUrl(self.get_address("Special:Allmessages")
+ "&ot=xml")
- xml=self.scrubxml(xml)
# xml structure is :
# <messages lang="fr">
# <message name="about">À propos</message>
@@ -4965,7 +4956,8 @@
# </messages>
if elementtree:
decode = xml.encode(self.encoding())
- tree = XML(decode)
+ clean = self.scrubxml(decode)
+ tree = XML(clean)
self._mediawiki_messages = _dict([(tag.get('name').lower(), tag.text)
for tag in tree.getiterator('message')])
else:
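Extracted from the diff above as a standalone function, the new scrubxml boils down to a single string find (a self-contained sketch of the code in the hunk):

```python
def scrubxml(xml):
    """Return the input starting at the '<?xml' prolog, dropping any crap
    inserted ahead of the actual XML data (e.g. PHP strict warnings)."""
    start = xml.find('<?xml')
    if start < 0:
        # no prolog at all -- should not happen for Special:Allmessages output
        return ""
    return xml[start:]

print(scrubxml("Warning: strict notice\n<?xml version='1.0'?><messages/>"))
# → <?xml version='1.0'?><messages/>
```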
Bugs item #2709338, was opened at 2009-03-24 13:19
Message generated for change (Comment added) made by cosoleto
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2709338&group_…
Category: None
Group: None
>Status: Closed
>Resolution: Duplicate
Priority: 6
Private: No
Submitted By: Purodha B Blissenbach (purodha)
Assigned to: Nobody/Anonymous (nobody)
Summary: Weblinkchecker not working
Initial Comment:
Here is the error output:
python /.../pywikipedia/weblinkchecker.py -start:! -v
Checked for running processes. 1 processes currently running, including the current process.
Pywikipediabot (r6439 (wikipedia.py), Feb 24 2009, 21:48:26)
Python 2.5.2 (r252:60911, Jan 4 2009, 21:59:32)
[GCC 4.3.2]
Traceback (most recent call last):
File "/home/.../pywikipedia/pagegenerators.py", line 787, in __iter__
for page in self.wrapped_gen:
File "/home/.../pywikipedia/pagegenerators.py", line 709, in DuplicateFilterPageGenerator
for page in generator:
File "/home/.../pywikipedia/pagegenerators.py", line 248, in AllpagesPageGenerator
for page in site.allpages(start = start, namespace = namespace, includeredirects = includeredirects):
File "/home/.../pywikipedia/wikipedia.py", line 5502, in allpages
for p in soup.api.query.allpages:
AttributeError: 'NoneType' object has no attribute 'allpages'
'NoneType' object has no attribute 'allpages'
Saving history...
----------------------------------------------------------------------
>Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-13 16:37
Message:
Duplicate of bug #2693183
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2709338&group_…