Patches item #2783431, was opened at 2009-04-29 02:54
Message generated for change (Comment added) made by sigmaoctantis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2783431&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: sigmaoctantis (sigmaoctantis)
Assigned to: Nobody/Anonymous (nobody)
Summary: titles from file for pagegenerators.py
Initial Comment:
Additional option "-plainfile" reads titles from text files without square brackets (unlike "-file", which expects [[title]] links)
----------------------------------------------------------------------
Comment By: sigmaoctantis (sigmaoctantis)
Date: 2009-04-30 14:11
Message:
It is better than two separate options. I tested it and it works with files
in both formats.
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-30 13:40
Message:
A preliminary patch to enhance the -file option as described is attached.
Any comments?
----------------------------------------------------------------------
Comment By: sigmaoctantis (sigmaoctantis)
Date: 2009-04-30 12:14
Message:
Thanks, I updated it accordingly. Feel free to combine the two.
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-30 12:09
Message:
f.readline() should be used instead of re.findall().
I think our list of options is already too crowded. It is probably better
to change -file to interpret each line as a page title when no [[title]]
is found, instead of adding yet another option.
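
A minimal sketch of the fallback described above, not the attached patch: each line is scanned for [[title]] links, and a line without any is taken as a plain title. The names LINK_RE and titles_from_lines are made up for illustration and are not part of pagegenerators.py.

```python
import re

# Matches [[Title]] or [[Title|label]]; the group captures the title only.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def titles_from_lines(lines):
    """Yield page titles from an iterable of text lines (hypothetical helper).

    A line containing [[...]] links yields every linked title; any other
    non-empty line is treated as a single plain title.
    """
    for line in lines:
        links = LINK_RE.findall(line)
        if links:
            for title in links:
                yield title.strip()
        else:
            title = line.strip()
            if title:
                yield title
```

Passing an open file object as `lines` reads it line by line, so a file in either format (or a mix of both) goes through the same code path.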
----------------------------------------------------------------------
Patches item #2783431, was opened at 2009-04-29 08:54
Message generated for change (Comment added) made by cosoleto
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2783431&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: sigmaoctantis (sigmaoctantis)
Assigned to: Nobody/Anonymous (nobody)
Summary: titles from file for pagegenerators.py
Initial Comment:
Additional option "-plainfile" reads titles from text files without square brackets (unlike "-file", which expects [[title]] links)
----------------------------------------------------------------------
>Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-30 19:40
Message:
A preliminary patch to enhance the -file option as described is attached.
Any comments?
----------------------------------------------------------------------
Comment By: sigmaoctantis (sigmaoctantis)
Date: 2009-04-30 18:14
Message:
Thanks, I updated it accordingly. Feel free to combine the two.
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-30 18:09
Message:
f.readline() should be used instead of re.findall().
I think our list of options is already too crowded. It is probably better
to change -file to interpret each line as a page title when no [[title]]
is found, instead of adding yet another option.
----------------------------------------------------------------------
Bugs item #1760759, was opened at 2007-07-26 01:28
Message generated for change (Comment added) made by sigmaoctantis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1760759&group_…
Category: interwiki
Group: None
Status: Open
Resolution: None
Priority: 4
Private: No
Submitted By: Byrial Ole Jensen (byrial)
Assigned to: Nobody/Anonymous (nobody)
Summary: Getting 60 + 1 pages from the same project
Initial Comment:
When I use interwiki.py I often see things like:
Getting 60 pages from wikipedia:fr...
Getting 1 pages from wikipedia:fr...
where it first gets 60 pages, immediately followed by a fetch of 1 page from the same project. That seems strange, so I suspect there is an off-by-one error or something similar somewhere.
----------------------------------------------------------------------
Comment By: sigmaoctantis (sigmaoctantis)
Date: 2009-04-30 13:25
Message:
I tried to duplicate this with -file, but it read just 60, even if the
first entry was a redirect.
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-30 09:11
Message:
Can anyone confirm this bug is still reproducible?
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2007-08-05 12:19
Message:
Logged In: NO
If the first loaded page is a redirect, the bot will load this page once more.
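
A toy model of that explanation, only to show how a "60 + 1" pattern could arise. It is not the interwiki.py code: the function name, the batch size of 60 and the "reload the first page once" rule are assumptions taken from the messages in this report.

```python
def fetch_sizes(titles, redirect_titles, batch_size=60):
    """Return the sizes of successive fetches under the assumed behaviour.

    Assumption: if the first page of a batch is a redirect, that page is
    queued once more and therefore fetched again in a batch of its own.
    """
    pending = list(titles)
    reloaded = set()
    sizes = []
    while pending:
        batch, pending = pending[:batch_size], pending[batch_size:]
        sizes.append(len(batch))
        first = batch[0]
        if first in redirect_titles and first not in reloaded:
            reloaded.add(first)       # reload each redirect only once
            pending.insert(0, first)  # the extra single-page fetch
    return sizes
```

With 60 titles of which the first is a redirect, this model produces a fetch of 60 followed by a fetch of 1; with no redirects it produces a single fetch of 60.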
----------------------------------------------------------------------
Patches item #2784482, was opened at 2009-04-30 12:47
Message generated for change (Tracker Item Submitted) made by sigmaoctantis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2784482&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: sigmaoctantis (sigmaoctantis)
Assigned to: Nobody/Anonymous (nobody)
Summary: Additional options for pagegenerators.py
Initial Comment:
Two other generators for pagegenerators.py
-xlink (similar to -link): reads links on a webpage rather than a file
-check : for WikiProject Check Wikipedia reports on toolserver
----------------------------------------------------------------------
Bugs item #2784477, was opened at 2009-04-30 12:34
Message generated for change (Tracker Item Submitted) made by sigmaoctantis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2784477&group_…
Category: other
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: sigmaoctantis (sigmaoctantis)
Assigned to: Nobody/Anonymous (nobody)
Summary: TypeError with "-new" option in pagegenerators.py
Initial Comment:
Error message: "TypeError: newpages() got an unexpected keyword argument 'namespace'"
Removing ", namespace=namespace" from the call fixes it, but that might break things if a namespace other than 0 is passed in from somewhere else.
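
A small reproduction of this kind of error, with one possible way to keep namespace handling without passing the keyword through. Both functions are hypothetical stand-ins: the real newpages() signature and return value are not shown in this report.

```python
def newpages(number=10):
    """Hypothetical stand-in: accepts no 'namespace' parameter.

    Returns (title, namespace) pairs purely for illustration.
    """
    pages = [("Foo", 0), ("Talk:Foo", 1), ("Bar", 0)]
    return pages[:number]


def new_pages_in_namespace(number=10, namespace=0):
    """Filter on the caller's side instead of passing namespace= through."""
    return [title for title, ns in newpages(number=number) if ns == namespace]


# Passing the unsupported keyword raises the TypeError from the report.
try:
    newpages(number=3, namespace=0)
except TypeError as err:
    message = str(err)
```

Filtering after the fetch can return fewer than `number` pages, so this is a workaround rather than a replacement for proper namespace support in the underlying call.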
----------------------------------------------------------------------
Bugs item #2771272, was opened at 2009-04-17 21:24
Message generated for change (Comment added) made by cosoleto
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2771272&group_…
Category: network
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: 44 Error Dump Files
Initial Comment:
python interwiki.py -autonomous -new:1000
Generated 44 SaxParseBug_wikipedia_...dump files, as in the attached zip file. Nightly version of 14th April, run on 17th April.
----------------------------------------------------------------------
>Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-30 16:08
Message:
Looks fixed. Closing...
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-04-30 11:02
Message:
Fair enough :)
I went ahead and committed in r6767 a check for '</mediawiki>' that should
prevent some, if not all, of these errors.
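
A rough sketch of what such a check can look like. This is not the code committed in r6767; the function name and the choice to ignore trailing whitespace are assumptions.

```python
def export_is_complete(data):
    """Return True if the exported text ends with the closing root tag.

    A response cut short, or one with an HTML error message appended in
    place of the remaining revisions, will not end with '</mediawiki>'.
    """
    return data.rstrip().endswith("</mediawiki>")
```

A caller could retry the request or skip the batch when this returns False, instead of handing incomplete XML to the parser.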
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-28 15:08
Message:
Replacing the proper page contents with an HTML error message is such
peculiar behaviour that it shouldn't surprise you I hadn't noticed it. So
fixing the problem I reported probably will not resolve this bug, as the
HTTP server sends a 'Content-Length' header value that matches the length
of the received data.
Anyway, if I am not wrong again, the data received should be terminated
with '</mediawiki>', so it is probably better to check for that than for
English strings that may change and may appear anywhere in the content.
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-04-28 05:16
Message:
Actually, to clarify, there are several reasons that might cause
SaxErrors:
* an xmlparser bug (unlikely)
* communication issues: retrieving invalid or incomplete data, as cosoleto
mentions below
* a server outage, as happened very recently: some Wikimedia servers were
being taken out of rotation for an upgrade, resulting in a temporary
database slave outage, and this message (http://pastebin.com/f220d5ece)
was printed from time to time. For edit actions it doesn't matter: _get
detects the invalid content and retries. SaxErrors only happen in GetAll,
when using Special:Export to retrieve content. In this case Special:Export
returns revisions one by one and, at some point while generating the query
result, encounters a DB error and cannot fetch a revision. The data
returned by postData is then the beginning of an XML file, containing the
namespace information and a few revisions, with the HTML error message at
the end. This is the issue that tieump tries to fix here.
----------------------------------------------------------------------
Comment By: Tieum P (tieump)
Date: 2009-04-28 04:55
Message:
This happens when some wikis send an error page. I posted a patch at
http://pastebin.com/m597b90e8 BUT there is a risk: if the string "No
working slave server" is a valid part of the article, we will be caught in
an infinite loop.
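
One way to avoid that loop is to cap the number of retries. The sketch below is not the pastebin patch; fetch is any caller-supplied function returning the response text, and the retry limit is an arbitrary assumption.

```python
ERROR_MARKER = "No working slave server"


def fetch_with_bounded_retries(fetch, max_retries=3):
    """Call fetch() until the result lacks ERROR_MARKER.

    Makes at most max_retries + 1 attempts, then gives up, so an article
    that legitimately contains the marker cannot cause an endless loop.
    """
    for _ in range(max_retries + 1):
        data = fetch()
        if ERROR_MARKER not in data:
            return data
    raise RuntimeError("error marker still present after %d attempts"
                       % (max_retries + 1))
```

Real code would probably wait between attempts; the sleep is left out here to keep the example short. An article that really contains the string would still fail after the last attempt, which is why checking the structure of the response is more robust than matching text.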
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-19 09:32
Message:
Unlike getUrl(), postData() doesn't check the length of the data sent by
the server, so the framework tries to parse truncated data and you get
errors.
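
A minimal sketch of such a length check, assuming the response headers are available as a dict and the body as bytes. The function name is made up, and this is not how getUrl() is actually written.

```python
def body_is_truncated(headers, body):
    """Return True if fewer bytes arrived than Content-Length announced.

    headers: mapping of response header names to values (hypothetical input).
    body: the raw response body as bytes.
    """
    declared = None
    for name, value in headers.items():
        if name.lower() == "content-length":
            declared = int(value)
    if declared is None:
        return False  # no declared length, nothing to compare against
    return len(body) < declared
```

This only catches responses that were cut off in transit. It cannot detect a complete response whose content is wrong, and the comparison is only meaningful on the raw bytes as transferred, before any decompression.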
----------------------------------------------------------------------
Comment By: Mikko Silvonen (silvonen)
Date: 2009-04-19 07:15
Message:
These dump files are generated more frequently when the Wikipedia servers
have database problems (as they have had for the last few days).
----------------------------------------------------------------------
Bugs item #2783407, was opened at 2009-04-29 07:32
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2783407&group_…
Category: interwiki
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: -limittwo option does not work correctly
Initial Comment:
When interwiki.py is run with the "-limittwo" option, it does not update the topmost site.
----------------------------------------------------------------------
>Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2009-04-30 15:52
Message:
Should be fixed in r6777
----------------------------------------------------------------------
Bugs item #1760759, was opened at 2007-07-26 07:28
Message generated for change (Comment added) made by cosoleto
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1760759&group_…
Category: interwiki
Group: None
Status: Open
Resolution: None
>Priority: 4
Private: No
Submitted By: Byrial Ole Jensen (byrial)
Assigned to: Nobody/Anonymous (nobody)
Summary: Getting 60 + 1 pages from the same project
Initial Comment:
When I use interwiki.py I often see things like:
Getting 60 pages from wikipedia:fr...
Getting 1 pages from wikipedia:fr...
where it first gets 60 pages, immediately followed by a fetch of 1 page from the same project. That seems strange, so I suspect there is an off-by-one error or something similar somewhere.
----------------------------------------------------------------------
>Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-04-30 15:11
Message:
Can anyone confirm this bug is still reproducible?
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2007-08-05 18:19
Message:
Logged In: NO
If the first loaded page is a redirect, the bot will load this page once more.
----------------------------------------------------------------------