Patches item #1798753, was opened at 2007-09-20 15:23
Message generated for change (Comment added) made by wikipedian
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1798753&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
>Resolution: Rejected
Priority: 5
Private: No
Submitted By: Filnik (filnik)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fixed the \n problem
Initial Comment:
Some time ago, I asked for a fix for the bug in replace.py that doesn't replace \n with a newline. I've added two lines to wikipedia.py to prevent this bug (it works, and shouldn't cause problems even though it is done in a simple way). The patch for wikipedia.py is attached. Filnik
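A minimal sketch of the kind of two-line fix described here: convert the literal backslash-n sequence, as it arrives from the command line, into a real newline before the replacement is applied (a hypothetical illustration, not the attached patch):

```python
def unescape_newlines(replacement):
    # Turn the two-character sequence backslash + 'n' (as it arrives from
    # the shell, where \n is not interpreted) into a real newline character.
    return replacement.replace('\\n', '\n')

# The seven shell characters f,i,r,s,t,\,n,... become two lines of text.
text = unescape_newlines('first\\nsecond')
```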
----------------------------------------------------------------------
>Comment By: Daniel Herding (wikipedian)
Date: 2007-09-20 16:45
Message:
Logged In: YES
user_id=880694
Originator: NO
This is not safe. See:
http://de.wikipedia.org/w/index.php?title=Benutzer:Head/Spielwiese&diff=369…
----------------------------------------------------------------------
Revision: 4335
Author: wikipedian
Date: 2007-09-20 14:41:33 +0000 (Thu, 20 Sep 2007)
Log Message:
-----------
Fixed NoPage bug when there are duplicate pages in a getall() call.
See bug [ 1787776 ] loading the same page twice leads to NoPage
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2007-09-20 13:51:30 UTC (rev 4334)
+++ trunk/pywikipedia/wikipedia.py 2007-09-20 14:41:33 UTC (rev 4335)
@@ -2210,11 +2210,11 @@
self.throttle = throttle
self.force = force
- for pl in pages:
- if (not hasattr(pl,'_contents') and not hasattr(pl,'_getexception')) or force:
- self.pages.append(pl)
+ for page in pages:
+ if (not hasattr(page, '_contents') and not hasattr(page, '_getexception')) or force:
+ self.pages.append(page)
elif verbose:
- output(u"BUGWARNING: %s already done!" % pl.aslink())
+ output(u"BUGWARNING: %s already done!" % page.aslink())
def run(self):
dt=15
@@ -2278,49 +2278,51 @@
editRestriction = entry.editRestriction
moveRestriction = entry.moveRestriction
page = Page(self.site, title)
+ successful = False
for page2 in self.pages:
if page2.sectionFreeTitle() == page.sectionFreeTitle():
- if (hasattr(page2,'_contents') or hasattr(page2,'_getexception')) and not self.force:
- return
- break
- else:
+ if not (hasattr(page2,'_contents') or hasattr(page2,'_getexception')) or self.force:
+ page2.editRestriction = entry.editRestriction
+ page2.moveRestriction = entry.moveRestriction
+ if editRestriction == 'autoconfirmed':
+ page2._editrestriction = True
+ page2._permalink = entry.revisionid
+ page2._userName = username
+ page2._ipedit = ipedit
+ page2._editTime = timestamp
+ section = page2.section()
+ m = self.site.redirectRegex().match(text)
+ if m:
+ ## output(u"%s is a redirect" % page2.aslink())
+ redirectto = m.group(1)
+ if section and redirectto.find("#") == -1:
+ redirectto = redirectto+"#"+section
+ page2._getexception = IsRedirectPage
+ page2._redirarg = redirectto
+ # There's no possibility to read the wpStarttime argument from the XML.
+ # It is this time that the MediaWiki software uses to check for edit
+ # conflicts. We take the earliest time later than the last edit, which
+ # seems to be the safest possible time.
+ page2._startTime = str(int(timestamp)+1)
+ if section:
+ m = re.search("\.3D\_*(\.27\.27+)?(\.5B\.5B)?\_*%s\_*(\.5B\.5B)?(\.27\.27+)?\_*\.3D" % re.escape(section), sectionencode(text,page2.site().encoding()))
+ if not m:
+ try:
+ page2._getexception
+ output(u"WARNING: Section not found: %s" % page2.aslink(forceInterwiki = True))
+ except AttributeError:
+ # There is no exception yet
+ page2._getexception = SectionError
+ # Store the content
+ page2._contents = text
+ successful = True
+ # Note that there is no break here. The reason is that there
+ # might be duplicates in the pages list.
+ if not successful:
output(u"BUG>> title %s (%s) not found in list" % (title, page.aslink(forceInterwiki=True)))
output(u'Expected one of: %s' % u','.join([page2.aslink(forceInterwiki=True) for page2 in self.pages]))
raise PageNotFound
- page2.editRestriction = entry.editRestriction
- page2.moveRestriction = entry.moveRestriction
- if editRestriction == 'autoconfirmed':
- page2._editrestriction = True
- page2._permalink = entry.revisionid
- page2._userName = username
- page2._ipedit = ipedit
- page2._editTime = timestamp
- section = page2.section()
- m = self.site.redirectRegex().match(text)
- if m:
-## output(u"%s is a redirect" % page2.aslink())
- redirectto = m.group(1)
- if section and redirectto.find("#") == -1:
- redirectto = redirectto+"#"+section
- page2._getexception = IsRedirectPage
- page2._redirarg = redirectto
- # There's no possibility to read the wpStarttime argument from the XML.
- # It is this time that the MediaWiki software uses to check for edit
- # conflicts. We take the earliest time later than the last edit, which
- # seems to be the safest possible time.
- page2._startTime = str(int(timestamp)+1)
- if section:
- m = re.search("\.3D\_*(\.27\.27+)?(\.5B\.5B)?\_*%s\_*(\.5B\.5B)?(\.27\.27+)?\_*\.3D" % re.escape(section), sectionencode(text,page2.site().encoding()))
- if not m:
- try:
- page2._getexception
- output(u"WARNING: Section not found: %s" % page2.aslink(forceInterwiki = True))
- except AttributeError:
- # There is no exception yet
- page2._getexception = SectionError
- # Store the content
- page2._contents = text
def headerDone(self, header):
# Verify our family data
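In miniature, the change above replaces an early return with a success flag, so that every duplicate entry in the pages list receives the fetched text instead of only the first one (a simplified sketch with hypothetical names, not the actual pywikipedia code):

```python
class FakePage:
    # Stand-in for a pywikipedia Page; only the fields the sketch needs.
    def __init__(self, title):
        self.title = title
        self.contents = None

def distribute_text(pages, title, text):
    # The old code effectively returned at the first already-handled match,
    # leaving later duplicates without contents and triggering NoPage.
    # The fix: keep iterating (no break) and track success with a flag.
    successful = False
    for page in pages:
        if page.title == title:
            page.contents = text
            successful = True
    if not successful:
        raise LookupError("title %s not found in list" % title)

# Two pages in the batch share the same title, as in bug 1787776.
batch = [FakePage("C"), FakePage("C")]
distribute_text(batch, "C", "some wikitext")
```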
Bugs item #1787776, was opened at 2007-09-04 16:03
Message generated for change (Settings changed) made by wikipedian
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1787776&group_…
Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 6
Private: No
Submitted By: Andre Engels (a_engels)
Assigned to: Nobody/Anonymous (nobody)
Summary: loading the same page twice leads to NoPage
Initial Comment:
Note: this describes my impression of what is going on in this bug; I might be wrong about what actually goes wrong.
When there are two pages A and B, which both have an interwiki link to the same page C in another language, and those two pages are dealt with by interwiki.py at once, the second one will not find C and think that it is a non-existing page. I guess that points toward a mistake in the parsing of the output of [[Special:Export]].
----------------------------------------------------------------------
Comment By: Andre Engels (a_engels)
Date: 2007-09-05 09:53
Message:
Logged In: YES
user_id=843018
Originator: YES
As a test case:
interwiki.py -site:wikipedia -lang:nl -cat:Andretest -localonly
This will try to work on two pages, both having an interwiki to the same
(existing) page on en:. The first page will not be changed, but the second
will report the English page as not existing.
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2007-09-04 16:05
Message:
Logged In: YES
user_id=687283
Originator: NO
Any test case or a 'how to reproduce'? ;)
----------------------------------------------------------------------
Bugs item #1798800, was opened at 2007-09-20 16:37
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1798800&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Roberto Zanon (qualc1)
Assigned to: Nobody/Anonymous (nobody)
Summary: pagefromfile.py: bug with -notitle and -summary arguments
Initial Comment:
The patch makes -summary work (if -summary is specified, it is used) and fixes -notitle (without this patch, -notitle does nothing).
----------------------------------------------------------------------
Bugs item #1795683, was opened at 2007-09-16 10:05
Message generated for change (Comment added) made by cosoleto
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1795683&group_…
Category: General
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jani Patokallio (jpatokal)
Assigned to: Nobody/Anonymous (nobody)
Summary: No error message if download is interrupted
Initial Comment:
I'm using imagetransfer.py to download some images off a Mediawiki site, using the following very straightforward code:
uo = wikipedia.MyURLopener()
img = uo.open(url)
file = open(targetFile, "w");
file.write(img.read())
file.close()
img.close()
However, I found out the hard way that there is no warning of any kind if the download is interrupted halfway through for any reason. Worse yet, there is no practical way to check if the file was downloaded successfully: the MD5 checksum function requires downloading the image and is thus subject to the same bug! The only way to determine even the file's actual size requires hacking through the file version history, and the getFileVersionHistory() command seems to break against the Mediawiki version of Wikitravel anyway.
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2007-09-20 16:14
Message:
Logged In: YES
user_id=181280
Originator: NO
*in r4334, not r3443.
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2007-09-20 16:08
Message:
Logged In: YES
user_id=181280
Originator: NO
getFileVersionHistory() problem fixed in revision 4305.
upload_image() problems related to interrupted download improved in r3443.
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2007-09-17 18:23
Message:
Logged In: YES
user_id=181280
Originator: NO
Try this:
http://sourceforge.net/tracker/index.php?func=detail&aid=1796316&group_id=9….
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2007-09-17 04:11
Message:
Logged In: NO
So I managed to fix getFileVersionHistory by borrowing the code from an
older version of pywikipediabot that works with Mediawiki 1.10.1, and now I
can at least compare sizes. Can you make your patch available?
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2007-09-16 19:05
Message:
Logged In: YES
user_id=181280
Originator: NO
I have tried to fix your getVersionHistory() problem. You can also read the file
length from the HTTP headers:
f.info().getheader('Content-Length')
I have written an improvised and fully untested patch to wikipedia.getUrl()
with length checking and a transfer-resume feature, but I am not sure it's
useful.
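The header check suggested here can be sketched as follows, using the modern urllib.request API rather than pywikipedia's MyURLopener (check_complete and download_checked are hypothetical names, not part of any patch):

```python
import urllib.request

def check_complete(expected_length, data):
    # Compare the byte count actually received with the Content-Length
    # header; expected_length may be None if the server omitted it.
    if expected_length is not None and len(data) != int(expected_length):
        raise IOError("incomplete download: got %d of %s bytes"
                      % (len(data), expected_length))

def download_checked(url, target_file):
    with urllib.request.urlopen(url) as resp:
        expected = resp.headers.get('Content-Length')
        data = resp.read()
    check_complete(expected, data)      # fail loudly instead of silently
    with open(target_file, 'wb') as f:  # binary mode for image data
        f.write(data)
```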
----------------------------------------------------------------------
Patches item #1796316, was opened at 2007-09-17 18:20
Message generated for change (Comment added) made by cosoleto
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1796316&group_…
Category: None
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Private: No
Submitted By: Francesco Cosoleto (cosoleto)
Assigned to: Nobody/Anonymous (nobody)
Summary: upload.py, upload_image(), check length and resume feature
Initial Comment:
Untested.
----------------------------------------------------------------------
>Comment By: Francesco Cosoleto (cosoleto)
Date: 2007-09-20 15:52
Message:
Logged In: YES
user_id=181280
Originator: YES
Fixed again, tested and applied in 4334.
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2007-09-18 19:06
Message:
Logged In: YES
user_id=181280
Originator: YES
Updated. It seems to work and is safer than the previous code. Not fully tested
against a real server.
File Added: upload.py.diff
----------------------------------------------------------------------