Bugs item #2770568, was opened at 2009-04-17 14:40
Message generated for change (Tracker Item Submitted) made by tabazzz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2770568&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: interwiki
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: TaBaZzz (tabazzz)
Assigned to: Nobody/Anonymous (nobody)
Summary: interwiki.py Error messages & more
Initial Comment:
Hello.
Running the following command on the Hebrew Wikipedia:
python interwiki.py -confirm תקשורת_נתונים
produces an error message and prints a lot of garbage (probably the encoded contents of the article). Here is the error message:
Getting 1 pages from wikipedia:he...
Traceback (most recent call last):
  File "/home/tal/pywikipedia/pagegenerators.py", line 790, in __iter__
    for loaded_page in self.preload(somePages):
  File "/home/tal/pywikipedia/pagegenerators.py", line 809, in preload
    wikipedia.getall(site, pagesThisSite)
  File "/home/tal/pywikipedia/wikipedia.py", line 3141, in getall
    _GetAll(site, pages, throttle, force).run()
  File "/home/tal/pywikipedia/wikipedia.py", line 2952, in run
    data = self.getData()
  File "/home/tal/pywikipedia/wikipedia.py", line 3124, in getData
    response, data = self.site.postForm(address, predata)
  File "/home/tal/pywikipedia/wikipedia.py", line 4580, in postForm
    cookies=self.cookies(sysop = sysop))
  File "/home/tal/pywikipedia/wikipedia.py", line 4638, in postData
    data = data.decode(self.encoding())
  File "/usr/lib64/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd7 in position 3131: unexpected end of data
'utf8' codec can't decode byte 0xd7 in position 3131: unexpected end of data
version:
Pywikipedia nightly:pywikipedia (r6611, Apr 16 2009, 15:41:15)
Python 2.5.1 (r251:54863, Jun 15 2008, 18:24:56)
[GCC 4.3.0 20080428 (Red Hat 4.3.0-8)]
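Note for triage: "unexpected end of data" generally means that the byte string stops in the middle of a multi-byte UTF-8 sequence, and 0xd7 is the lead byte of the two-byte sequences used for Hebrew letters. The following is a minimal sketch in current Python 3 syntax, not the pywikipedia code. It reproduces the same error from a body that was cut one byte short and shows that an incremental decoder buffers the incomplete tail instead of raising. The sample text and all variable names are assumptions made for illustration.

```python
import codecs

# Hebrew sample text (assumption: any text with multi-byte characters works).
text = "תקשורת"
raw = text.encode("utf-8")

# Simulate a response body that was cut off in the middle of the last character.
truncated = raw[:-1]

try:
    truncated.decode("utf-8")
    error_reason = None
except UnicodeDecodeError as exc:
    # The strict decoder refuses the dangling lead byte.
    error_reason = exc.reason

# An incremental decoder keeps the incomplete trailing byte buffered
# instead of raising, so the complete characters are still recovered.
decoder = codecs.getincrementaldecoder("utf-8")()
partial = decoder.decode(truncated)          # final=False by default
rest = decoder.decode(raw[-1:], final=True)  # the missing byte arrives later
```

Whether the data is truncated by the server or by the reading code cannot be told from the traceback alone; the sketch only illustrates the failure mode.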
----------------------------------------------------------------------
Patches item #2769314, was opened at 2009-04-16 21:49
Message generated for change (Tracker Item Submitted) made by mstmst
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2769314&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: masti (mstmst)
Assigned to: Nobody/Anonymous (nobody)
Summary: noreferences.py pl update
Initial Comment:
updates for pl
----------------------------------------------------------------------
Patches item #2762697, was opened at 2009-04-14 20:14
Message generated for change (Comment added) made by drtrigon
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2762697&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Dr. Trigon (drtrigon)
Assigned to: Nobody/Anonymous (nobody)
Summary: Stability in wikipedia.py
Initial Comment:
Recently I have had some problems with the stability of 'wikipedia.put(...)': sometimes this method failed, and that killed my bot.
I was able to track the problem down to 'wikipedia._getEditPage(...)' and the following code (in 'wikipedia.py' around line 725):
********************************
while not textareaFound:
    text = self.site().getUrl(path, sysop = sysop)
    if text.find("<title>Wiki does not exist</title>") != -1:
        raise NoSuchSite(u'Wiki %s does not exist yet' % self.site())
********************************
and I changed it as follows to keep my bot/script running:
********************************
while not textareaFound:
    try:
        text = self.site().getUrl(path, sysop = sysop)
    except:
        time.sleep(1)
        continue
    if text.find("<title>Wiki does not exist</title>") != -1:
        raise NoSuchSite(u'Wiki %s does not exist yet' % self.site())
********************************
I am "pretty" sure that this solved my problem. :)
Perhaps you are interested in this solution as well? I assume that the 1-second delay is neither critical nor needed.
Greetings
DrTrigon
----------------------------------------------------------------------
Comment By: Dr. Trigon (drtrigon)
Date: 2009-04-16 20:46
Message:
here is the error:
(<class 'socket.error'>, error(104, 'Connection reset by peer'),
<traceback object at 0x3595248>)
hope this helps you...?!
greetings
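
Given that the reported exception is a socket.error, one possible shape for a narrower retry is sketched below. This is an illustration only, not the pywikipedia implementation: fetch_with_retry and its parameters are hypothetical names, and "fetch" stands in for whatever call performs the request. Unlike a bare "except:", it retries only network-level errors and gives up after a fixed number of attempts.

```python
import socket
import time


def fetch_with_retry(fetch, retries=3, delay=1.0):
    """Call fetch() and retry a bounded number of times on socket errors.

    `fetch` is a hypothetical zero-argument callable standing in for a
    page request; it is not part of the pywikipedia API.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except socket.error:
            # Only network-level failures are retried; other exceptions
            # (including KeyboardInterrupt) propagate immediately.
            if attempt == retries:
                raise
            time.sleep(delay)
```

The last failure is re-raised, so a persistent outage still stops the bot with a visible error instead of looping forever.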
----------------------------------------------------------------------
Comment By: Dr. Trigon (drtrigon)
Date: 2009-04-15 11:21
Message:
...I could add some code to print the exception the next time my bot goes
down...?!
----------------------------------------------------------------------
Comment By: Dr. Trigon (drtrigon)
Date: 2009-04-15 11:19
Message:
Hello back!
I have to apologize, because I don't have the error message anymore (I had
it once...) and the error is a bit hard to reproduce, since it is not
thrown very often... :(
You are right; my solution is a kind of "brute force"... :) Yesterday,
after my first postings, I had a look into 'getUrl', since I remembered
that it should catch this kind of problem (and did so in the past).
What was strange: the error occurred immediately after calling 'getUrl',
without any time delay... And it was (most of the time) on the toolserver,
which might have a different kind of internet connection and therefore
different failure behaviour (and throw other exceptions)...?!
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-04-14 20:17
Message:
Hello !
Are you able to elaborate on what error was raised? There is probably a
nicer way to catch the error, at a lower level, instead of bluntly retrying
on error =)
----------------------------------------------------------------------
Comment By: Dr. Trigon (drtrigon)
Date: 2009-04-14 20:15
Message:
Sorry, it was NOT 'wikipedia.put(...)' but 'wikipedia.get(...)'!
----------------------------------------------------------------------
Bugs item #2744221, was opened at 2009-04-08 15:44
Message generated for change (Comment added) made by cosoleto
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2744221&group_…
Category: General
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Marcin Cieslak (saperski)
Assigned to: Nobody/Anonymous (nobody)
Summary: Get rid of "Checked for running processes. "
Initial Comment:
Whenever the pywikipedia bot is started, or the "wikipedia" module is imported, one gets the following message:
% python version.py
Checked for running processes. 1 processes currently running, including the current process.
Pywikipedia (r6577 (wikipedia.py), kwi 05 2009, 11:19:22)
Python 2.5.1 (r251:54863, Oct 18 2007, 01:42:40)
[GCC 3.3.2]
This is very annoying when using, for example, 'pydoc'.
I also expect my bot running in cron to be quiet when, for example, there is no work to be done.
Enabling logging in the bot does not help: one still gets mailed every time cron executes the job.
(No, redirecting stderr is not an option, since getting mail from cron is useful when things go south.)
Can we turn this off by default?
----------------------------------------------------------------------
>Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-16 17:44
Message:
Changed in r6611.
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-04-12 14:30
Message:
I would agree with the idea, the information is not really relevant for
users.
But let's wait for a few more days, for some other opinions, shall we? =)
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-12 14:18
Message:
It's annoying me too. Good candidate for verbose output. Any objection /
alternative?
----------------------------------------------------------------------
Revision: 6611
Author: cosoleto
Date: 2009-04-16 15:41:15 +0000 (Thu, 16 Apr 2009)
Log Message:
-----------
Print 'Checked for running processes...' message only if verbose mode is enabled, as it isn't so important. See also bug #2744221
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2009-04-16 10:26:26 UTC (rev 6610)
+++ trunk/pywikipedia/wikipedia.py 2009-04-16 15:41:15 UTC (rev 6611)
@@ -3241,7 +3241,8 @@
                     f.write(str(p)+' '+str(processes[p])+'\n')
                 f.close()
                 self.process_multiplicity = count
-                output(u"Checked for running processes. %s processes currently running, including the current process." % count)
+                if verbose:
+                    output(u"Checked for running processes. %s processes currently running, including the current process." % count)
             finally:
                 self.lock.release()
Support Requests item #2768412, was opened at 2009-04-16 13:49
Message generated for change (Tracker Item Submitted) made by xqt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603139&aid=2768412&group_…
Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: xqt (xqt)
Assigned to: Nobody/Anonymous (nobody)
Summary: SaxParsBugs
Initial Comment:
I get a lot of SaxParsBug_wikipedia_<..>.dump files, but I don't know what to do with them.
----------------------------------------------------------------------
Revision: 6610
Author: cosoleto
Date: 2009-04-16 10:26:26 +0000 (Thu, 16 Apr 2009)
Log Message:
-----------
Removed a 'scrubxml' function incorrectly added to Site class from r6567. Changed mediawiki_message, instead.
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2009-04-16 08:30:47 UTC (rev 6609)
+++ trunk/pywikipedia/wikipedia.py 2009-04-16 10:26:26 UTC (rev 6610)
@@ -4901,16 +4901,6 @@
                 # Token not found
                 output(u'WARNING: Token not found on %s. You will not be able to edit any page.' % self)

-    def scrubxml(self, xml):
-        """scrub the start of xml input, to make things work, even
-        when crap is inserted ahead of the actual xml data.
-        (such as when php reports strict warnings)"""
-        start = xml.find('<?xml')
-        if start < 0:
-            # '<?xml' not found ? Should not happen.
-            return ""
-        return xml[start:]
-
     def mediawiki_message(self, key):
         """Return the MediaWiki message text for key "key" """
         # Allmessages is retrieved once for all per created Site object
@@ -4964,8 +4954,14 @@
                 # </messages>
                 if elementtree:
                     decode = xml.encode(self.encoding())
-                    clean = self.scrubxml(decode)
-                    tree = XML(clean)
+
+                    # Skip extraneous data such as PHP warning or extra
+                    # whitespaces added from some MediaWiki extensions
+                    xml_dcl_pos = decode.find('<?xml')
+                    if xml_dcl_pos > 0:
+                        decode = decode[xml_dcl_pos:]
+
+                    tree = XML(decode)
                     self._mediawiki_messages = _dict([(tag.get('name').lower(), tag.text)
                         for tag in tree.getiterator('message')])
                 else:
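
For readers who want to try the technique from r6610 in isolation, here is a small self-contained sketch in current Python 3 syntax. It is not the pywikipedia code: strip_before_xml_declaration and the sample response are hypothetical, and the standard xml.etree.ElementTree module is assumed in place of the XML() helper used in wikipedia.py. As in the commit, the data is left untouched when no '<?xml' declaration is found or when it is already at position 0.

```python
import xml.etree.ElementTree as ET


def strip_before_xml_declaration(data):
    """Drop anything that precedes the '<?xml' declaration, if present."""
    pos = data.find(b"<?xml")
    if pos > 0:
        return data[pos:]
    return data


# Hypothetical response: a PHP warning emitted ahead of the XML payload.
response = (b"Strict Standards: some PHP warning\n"
            b'<?xml version="1.0" encoding="utf-8"?>'
            b'<messages><message name="Edit">edit</message></messages>')

tree = ET.fromstring(strip_before_xml_declaration(response))

# Build a lower-cased name -> text mapping, as the surrounding code does.
messages = dict((tag.get("name").lower(), tag.text)
                for tag in tree.iter("message"))
```

One difference from the removed scrubxml method is worth noting: input without a declaration is passed on to the parser unchanged here, whereas scrubxml returned an empty string in that case.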
Bugs item #2765250, was opened at 2009-04-15 11:53
Message generated for change (Comment added) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2765250&group_…
Category: interwiki
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: interwiki goes over only one new file
Initial Comment:
Pywikipedia nightly:pywikipedia (r6605, Apr 14 2009, 16:55:58)
Python 2.5.1 (r251:54863, Jun 15 2008, 18:24:56)
[GCC 4.3.0 20080428 (Red Hat 4.3.0-8)]
command:
python interwiki.py -namespace:6 -new
This runs on only one new file, not 100.
Specifying another number (e.g. -new:10) also runs on only one file instead of the number specified.
This does not occur in other namespaces.
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2009-04-16 08:14
Message:
Sorry.
This was probably an error with the MediaWiki server.
Please close this bug.
----------------------------------------------------------------------
Bugs item #2767772, was opened at 2009-04-16 14:58
Message generated for change (Tracker Item Submitted) made by wikishizhao
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2767772&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: shizhao (wikishizhao)
Assigned to: Nobody/Anonymous (nobody)
Summary: weblinkchecker.py bug
Initial Comment:
weblinkchecker.py cannot correctly extract the URL from markup like: <ref>{{cite web|url=http://xxx.xxx/xxx.html|| title= XXXXXXX}}</ref> (there is no blank space between http://xxx.xxx/xxx.html and "|").
See http://zh.wikipedia.org/wiki/Talk:Wii_Sports
Pywikipedia [http] trunk/pywikipedia (r6568, Apr 01 2009, 11:18:59)
Python 2.5.2 (r252:60911, Oct 5 2008, 19:29:17)
[GCC 4.3.2]
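To illustrate the kind of problem described, here is a minimal sketch in current Python 3 syntax. Both patterns are hypothetical and are not the expressions used in weblinkchecker.py. The first treats every non-space character as part of the URL and so swallows the template separator; the second also stops at "|" and at a few other wikitext delimiters.

```python
import re

# Markup taken from the report above.
wikitext = ("<ref>{{cite web|url=http://xxx.xxx/xxx.html|| "
            "title= XXXXXXX}}</ref>")

# Hypothetical naive pattern: the URL runs until the next whitespace.
naive_url = re.compile(r"https?://\S+")

# Hypothetical stricter pattern: also stop at characters that delimit
# links and template parameters in wikitext.
strict_url = re.compile(r"https?://[^\s|<>\[\]{}\"']+")

naive_matches = naive_url.findall(wikitext)
strict_matches = strict_url.findall(wikitext)
```

A pipe can legitimately appear in some URLs, so excluding it is a trade-off that favours template parameters over such links.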
----------------------------------------------------------------------