---------- Forwarded message ----------
From: <pywikipediabot-users-owner(a)lists.sourceforge.net>
Date: Wed, Feb 27, 2008 at 8:03 PM
Subject: How do I make interwiki.py go over all the pages in a given namespace?
To: avarab(a)gmail.com
You are not allowed to post to this mailing list, and your message has
been automatically rejected. If you think that your messages are
being rejected in error, contact the mailing list owner at
pywikipediabot-users-owner(a)lists.sourceforge.net.
---------- Forwarded message ----------
From: "Ævar Arnfjörð Bjarmason" <avarab(a)gmail.com>
To: pywikipediabot-users(a)lists.sourceforge.net
Date: Wed, 27 Feb 2008 20:03:50 +0000
Subject: How do I make interwiki.py go over all the pages in a given namespace?
I was trying to go over all the pages in the category namespace, but I
couldn't get -namespace:14, -namespace:Category,
-namespace:Category:'!', or -namespace:Flokkur (this was on iswiki) to
work.
I ended up running it as `python interwiki.py -autonomous -skipauto
-namespace:14 -continue' with the following hack. This seems to be
working, but is there a proper way to do this?
Index: interwiki.py
===================================================================
--- interwiki.py (revision 5080)
+++ interwiki.py (working copy)
@@ -1620,7 +1620,10 @@
         except NameError:
             wikipedia.output(u"Dump file is empty?! Starting at the beginning.")
             nextPage = "!"
-            namespace = 0
+            if namespaces:
+                namespace = namespaces[0]
+            else:
+                namespace = 0
     # old generator is used up, create a new one
     hintlessPageGen = pagegenerators.CombinedPageGenerator([pagegenerators.TextfilePageGenerator(dumpFileName),\
         pagegenerators.AllpagesPageGenerator(nextPage, namespace, includeredirects = False)])
Hello
I have been using pywikipediabot for years and have written some patches for
it. Since I fetch fresh versions from SVN, I get some annoying conflicts and
.mine files spamming my bot directory. I would like to commit my work; is
this the place to ask for such permission?
The attached file shows an example of the work I've done on
fixing_redirects.py: before the fixes, the script misbehaved on ~5% of pages
(case artifacts, images and categories not handled, ...); after the fixes, it
ran with 100% success from "-start:!" to the end of allpages (yes, I'm quite
the perfectionist, so I like spotting *every* link leading to a redirect on
the wiki I work on: fr.wikibooks, which has 5000+ pages). I've also fixed
several bugs listed on the SourceForge bug tracking system. Could I also have
permission to tag them as closed? (Or should I ask elsewhere?)
Marvus
Feature Requests item #1877143, was opened at 2008-01-22 10:31
Message generated for change (Comment added) made by purodha
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=1877143&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: Purodha B Blissenbach (purodha)
Assigned to: Nobody/Anonymous (nobody)
Summary: make interwiki.py accept a hint at namespace warning
Initial Comment:
WARNING: [[ksh:Metmaacher:Purbo T]] is in namespace 2, but [[dsb:Benutzer:Purbo T]] is in namespace 0. Follow it anyway? ([y]es, [n]o) n
should better be:
WARNING: [[ksh:Metmaacher:Purbo T]] is in namespace 2, but [[dsb:Benutzer:Purbo T]] is in namespace 0. Follow it anyway? ([y]es, [n]o, [a]dd replacement) n
The reason for the warning is an update of the language file in the dsb wiki, which changes Benutzer -> Wužywar
(away from the German fallback).
Thus many previously set interwiki links need a namespace change. While this could be dealt with using a general search/replace, interwiki.py should not lose the interwiki links while it is operating, before all links have been adjusted. A manual replacement (hint) would be the ideal solution here.
If possible, it would be nice to have a typed-in "user:" automatically expanded to "User:Purbo T" in such cases, but that is rather a gimmick.
----------------------------------------------------------------------
>Comment By: Purodha B Blissenbach (purodha)
Date: 2008-02-29 13:49
Message:
Logged In: YES
user_id=46450
Originator: YES
See also these two edits:
http://mi.wikipedia.org/w/index.php?title=Category:M%C4%81tauranga_huaota&d…
http://mi.wikipedia.org/w/index.php?title=Category%3AM%C4%81tauranga_huaota…
This gives another motivation, or justification, for the suggested
feature.
With the feature added: When seeing a possible solution, you could enter
that solution immediately, saving time, server load and, more often than
not, another bot run.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=1877143&group_…
Patches item #1904587, was opened at 2008-02-29 10:55
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1904587&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Purodha B Blissenbach (purodha)
Assigned to: Nobody/Anonymous (nobody)
Summary: Interwiki.py - better language fallbacks: dsb, hsb, stq
Initial Comment:
svn diff wikipedia.py
Index: wikipedia.py
===================================================================
--- wikipedia.py (revision 5095)
+++ wikipedia.py (working copy)
@@ -5525,10 +5527,14 @@
         return ['ar','tr']
     if code=='sk':
         return ['cs']
-    if code in ['bar','hsb','ksh']:
+    if code in ['bar','ksh','stq']:
         return ['de']
     if code in ['als','lb']:
         return ['de','fr']
+    if code=='dsb':
+        return ['hsb','de']
+    if code=='hsb':
+        return ['dsb','de']
     if code=='io':
         return ['eo']
     if code in ['an','ast','ay','ca','gn','nah','qu']:
----
Adds Saterlandic Frisian (Seeltersk)
Makes Upper and Lower Sorbian fall back to each other before resorting to German.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1904587&group_…
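The fallback chains in the patch can be pictured as a plain mapping from language code to an ordered list of codes to try next (a hedged sketch; the real implementation is a series of if statements in wikipedia.py, not a dict):

```python
# Sketch only: the fallback table the patch effectively defines.
FALLBACKS = {
    'bar': ['de'],
    'ksh': ['de'],
    'stq': ['de'],           # Saterland Frisian, added by the patch
    'als': ['de', 'fr'],
    'lb':  ['de', 'fr'],
    'dsb': ['hsb', 'de'],    # Lower Sorbian tries Upper Sorbian first
    'hsb': ['dsb', 'de'],    # and vice versa, before German
}

def fallback_chain(code):
    """Return the ordered fallback codes for a language code."""
    return FALLBACKS.get(code, [])

print(fallback_chain('dsb'))  # ['hsb', 'de']
```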
Bugs item #1792829, was opened at 2007-09-11 19:28
Message generated for change (Comment added) made by russblau
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1792829&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Bug in wikipedia.py and workaround
Initial Comment:
I am using snapshot 2007-08-11:
Bug #1733835 appears in this snapshot with a different error message:
"Changing page [[de:Aventurischer Index: Buchstabe J/fehlt noch]]
WARNING: No text area found on www.wiki-aventurica.de/index.php?title=MediaWiki:
viewsource&action=edit.
Maybe the server is down. Retrying in 1 minutes..."
We don't even have a page named "MediaWiki:viewsource" in our Wiki.
After I changed line 1176 from
"if data != u'':"
to
"if data != u'' and re.search(r'[^\n]', data) != None:"
it works properly again.
I did not have this bug in snapshot-2007-06-19, but in snapshot-20070605 and now.
----------------------------------------------------------------------
>Comment By: Russell Blau (russblau)
Date: 2008-02-28 18:16
Message:
Logged In: YES
user_id=855050
Originator: NO
Should be fixed by r5095.
----------------------------------------------------------------------
Comment By: Bernhard Mayr (falk_steinhauer)
Date: 2007-09-11 19:30
Message:
Logged In: YES
user_id=1810075
Originator: NO
I reported the bug. I guess my login cookie was too old.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1792829&group_…
Bugs item #1766974, was opened at 2007-08-03 10:24
Message generated for change (Settings changed) made by russblau
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1766974&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
>Resolution: Invalid
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: colon omitted in link to category page
Initial Comment:
Consider a page X containing only the following wiki markup:
[[:Category:some category]]
Let page be a Page object for X. Then
links = page.linkedPages()
for link in links:
wikipedia.output(page.title())
wikipedia.output(page.aslink())
outputs:
Category:some category
[[Category:some category]]
i.e. the leading colon has been swallowed. I suggest the colon be kept due to the semantic difference between [[:Category:some category]] and [[Category:some category]].
(A similar problem might exist with interlanguage links; I haven't tested, though)
----------------------------------------------------------------------
Comment By: Russell Blau (russblau)
Date: 2007-08-03 11:01
Message:
Logged In: YES
user_id=855050
Originator: NO
Page.aslink() has an optional parameter "textlink" (defaults to False); if
True, the leading colon will be output. This is a relatively recent
addition to the framework, and many bots have not yet been updated to use
it.
This parameter does not yet work for interwiki links.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1766974&group_…
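A toy illustration of the textlink behaviour described in the comment above (the `TitleLink` class is invented for this sketch; it is not the framework's Page class):

```python
# Illustration only, not the real framework class.
class TitleLink:
    def __init__(self, title):
        self.title = title

    def aslink(self, textlink=False):
        # With textlink=True a leading colon is emitted, turning a
        # category inclusion into a plain link to the category page.
        prefix = ':' if textlink else ''
        return '[[%s%s]]' % (prefix, self.title)

link = TitleLink('Category:some category')
print(link.aslink())               # [[Category:some category]]
print(link.aslink(textlink=True))  # [[:Category:some category]]
```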
Bugs item #1733835, was opened at 2007-06-08 18:24
Message generated for change (Comment added) made by russblau
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1733835&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Bug in wikipedia.py and unsatisfying fix
Initial Comment:
I am using snapshot-20070605.
I recognized a bug in wikipedia.py (line 1146, the last one in the code segment below):
    # Submit the prepared information
    if self.site().hostname() in config.authenticate.keys():
        predata.append(("Content-type","application/x-www-form-urlencoded"))
        predata.append(("User-agent", useragent))
        data = self.site().urlEncode(predata)
        response = urllib2.urlopen(urllib2.Request('http://' + self.site().hostname() + address, data))
        # I'm not sure what to check in this case, so I just assume things went ok.
        # Very naive, I agree.
        data = u''
    else:
        try:
            response, data = self.site().postForm(address, predata, sysop)
        except httplib.BadStatusLine, line:
            raise PageNotSaved('Bad status line: %s' % line)
    if data != u'' and re.search(r'[^\n]', data) != None:
I added the condition behind the "and", because after I accepted changes for a given page "data" was unequal to ''. Instead "data" was a string of 4 newlines.
Hope it'll help you.
Best regards,
Falk Steinhauer
Wiki Aventurica
----------------------------------------------------------------------
>Comment By: Russell Blau (russblau)
Date: 2008-02-28 18:03
Message:
Logged In: YES
user_id=855050
Originator: NO
Addressed in r5095.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1733835&group_…
Revision: 5095
Author: russblau
Date: 2008-02-28 23:03:10 +0000 (Thu, 28 Feb 2008)
Log Message:
-----------
Bug #1733835
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2008-02-28 22:58:33 UTC (rev 5094)
+++ trunk/pywikipedia/wikipedia.py 2008-02-28 23:03:10 UTC (rev 5095)
@@ -1364,10 +1364,9 @@
if retry_delay > 30:
retry_delay = 30
continue
-
-
+
# We are expecting a 302 to the action=view page. I'm not sure why this was removed in r5019
- if data != u"":
+ if data.strip() != u"":
# Something went wrong, and we don't know what. Show the
# HTML code that hopefully includes some error message.
output(u"ERROR: Unexpected response from wiki server.")
@@ -1375,11 +1374,11 @@
output(data)
# Unexpected responses should raise an error and not pass,
# be it silently or loudly. This should raise an error
-
+
if 'name="wpTextbox1"' in data and 'var wgAction = "submit"' in data:
# We are on the preview page, so the page was not saved
raise PageNotSaved
-
+
return response.status, response.reason, data
def canBeEdited(self):
@@ -2223,12 +2222,12 @@
if duration == 'none' or duration == None: duration = 'infinite'
if cascading == False: cascading = '0'
else: cascading = '1'
-
+
if edit != 'sysop' or move != 'sysop':
# You can't block a page as autoconfirmed and cascading, prevent the error
cascading = '0'
output(u"NOTE: The page can't be blocked with cascading and not also with only-sysop. Set cascading \"off\"")
-
+
predata = {
'mwProtect-cascade': cascading,
'mwProtect-level-edit': edit,
@@ -2860,7 +2859,7 @@
def __call__(self, requestsize=1):
"""
Block the calling program if the throttle time has not expired.
-
+
Parameter requestsize is the number of Pages to be read/written;
multiply delay time by an appropriate factor.
"""
@@ -3077,7 +3076,7 @@
def getLanguageLinks(text, insite = None, pageLink = "[[]]"):
"""
Return a dict of interlanguage links found in text.
-
+
Dict uses language codes as keys and Page objects as values.
Do not call this routine directly, use Page.interwiki() method
instead.
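The essential change in the first hunk of r5095 is that a whitespace-only response body, such as the four newlines reported in bug #1733835, no longer counts as an error. As a minimal sketch (the helper name is invented):

```python
def is_unexpected_response(data):
    """True when the server returned something other than whitespace."""
    return data.strip() != ""

# A blank (all-newline) body means the save went through normally;
# anything with real content is treated as an error page.
print(is_unexpected_response("\n\n\n\n"))            # False
print(is_unexpected_response("<html>error</html>"))  # True
```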
Revision: 5094
Author: russblau
Date: 2008-02-28 22:58:33 +0000 (Thu, 28 Feb 2008)
Log Message:
-----------
Improve screening for malformed redirect targets, and don't use "dict" as a local variable name.
Modified Paths:
--------------
trunk/pywikipedia/redirect.py
Modified: trunk/pywikipedia/redirect.py
===================================================================
--- trunk/pywikipedia/redirect.py 2008-02-28 18:56:51 UTC (rev 5093)
+++ trunk/pywikipedia/redirect.py 2008-02-28 22:58:33 UTC (rev 5094)
@@ -114,7 +114,7 @@
targets are the values.
'''
xmlFilename = self.xmlFilename
- dict = {}
+ redict = {}
# open xml dump and read page titles out of it
dump = xmlreader.XmlDump(xmlFilename)
site = wikipedia.getSite()
@@ -151,23 +151,24 @@
source = entry.title.replace(' ', '_')
target = target.replace(' ', '_')
# remove leading and trailing whitespace
- target = target.strip()
+ target = target.strip('_')
# capitalize the first letter
if not wikipedia.getSite().nocapitalize:
- source = source[0].upper() + source[1:]
- target = target[0].upper() + target[1:]
+ source = source[:1].upper() + source[1:]
+ target = target[:1].upper() + target[1:]
if '#' in target:
- target = target[:target.index('#')]
+ target = target[:target.index('#')].rstrip("_")
if '|' in target:
wikipedia.output(
u'HINT: %s is a redirect with a pipelink.'
% entry.title)
- target = target[:target.index('|')]
- dict[source] = target
+ target = target[:target.index('|')].rstrip("_")
+ if target: # in case preceding steps left nothing
+ redict[source] = target
if alsoGetPageTitles:
- return dict, pageTitles
+ return redict, pageTitles
else:
- return dict
+ return redict
def retrieve_broken_redirects(self):
if self.xmlFilename == None:
@@ -216,16 +217,16 @@
for redir_name in redir_names:
yield redir_name
else:
- dict = self.get_redirects_from_dump()
+ redict = self.get_redirects_from_dump()
num = 0
- for (key, value) in dict.iteritems():
+ for (key, value) in redict.iteritems():
num += 1
# check if the value - that is, the redirect target - is a
# redirect as well
- if num > self.offset and dict.has_key(value):
+ if num > self.offset and redict.has_key(value):
yield key
wikipedia.output(u'\nChecking redirect %i of %i...'
- % (num + 1, len(dict)))
+ % (num + 1, len(redict)))
class RedirectRobot:
def __init__(self, action, generator, always = False):
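The target-normalisation steps in the r5094 hunks above amount to roughly this pipeline (a simplified sketch that ignores sites with nocapitalize; the helper name is invented):

```python
def normalize_redirect_target(target):
    """Mimic r5094's cleanup of a raw redirect target.

    Returns None when the cleanup leaves nothing usable.
    """
    target = target.replace(' ', '_').strip('_')   # underscore form, trimmed
    target = target[:1].upper() + target[1:]       # capitalize first letter
    if '#' in target:                              # drop section anchors
        target = target[:target.index('#')].rstrip('_')
    if '|' in target:                              # drop pipelink text
        target = target[:target.index('|')].rstrip('_')
    return target or None

print(normalize_redirect_target(' foo bar#Section '))  # Foo_bar
print(normalize_redirect_target('#Section'))           # None
```

The final `or None` mirrors the added `if target:` guard, which keeps degenerate targets like a bare section link out of the redirect dict.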
Bugs item #1725373, was opened at 2007-05-25 04:22
Message generated for change (Settings changed) made by russblau
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1725373&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Byrial Ole Jensen (byrial)
Assigned to: Nobody/Anonymous (nobody)
Summary: redirect.py double -xml fails to find all double redirects
Initial Comment:
redirect.py double -xml fails to find all double redirects. For example, dawiki-20070522-pages-meta-current.xml contains 99 double redirects; redirect.py could only find 6 of these and correct 5 (the 6th was a redirect directly to itself).
The full list of the 99 double redirects is at http://da.wikipedia.org/wiki/Wikipedia:Dobbelte_omdirigeringer
(Permanent link in case the page is edited: http://da.wikipedia.org/w/index.php?title=Wikipedia:Dobbelte_omdirigeringer…).
PS. It would also be nice to have an option to read the double redirects from a file.
----------------------------------------------------------------------
>Comment By: Russell Blau (russblau)
Date: 2008-02-28 17:57
Message:
Logged In: YES
user_id=855050
Originator: NO
Not sure when it was done, but the current version of redirect.py contains
code that should have fixed this bug.
----------------------------------------------------------------------
Comment By: Byrial Ole Jensen (byrial)
Date: 2007-05-25 13:42
Message:
Logged In: YES
user_id=23252
Originator: YES
I found that all the double redirects that were not found have targets
containing spaces, and therefore made this patch to fix the problem:
RCS file: /cvsroot/pywikipediabot/pywikipedia/redirect.py,v
retrieving revision 1.56
diff -u -r1.56 redirect.py
--- redirect.py 11 May 2007 11:42:27 -0000 1.56
+++ redirect.py 25 May 2007 17:37:26 -0000
@@ -110,9 +110,9 @@
break
# if the redirect does not link to another wiki
if target:
- target = target.replace(' ', '_')
# remove leading and trailing whitespace
target = target.strip()
+ target = target.replace('_', ' ')
# capitalize the first letter
if not wikipedia.getSite().nocapitalize:
target = target[0].upper() + target[1:]
It solves the problem when you get double redirects from an XML dump.
However I guess that the patch as is will break fixing double redirects
fetched from [[Special:DoubleRedirects]], but this is not tested.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1725373&group_…
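To see why space-containing targets slipped through: redirect sources were stored in underscore form, so a target left with spaces could never match a key in the redirect dict. A minimal sketch of the failure mode:

```python
# Sketch only: sources are keyed with underscores, so a target kept
# in space form never matches, and the double redirect goes unseen.
redirects = {'A': 'Foo Bar', 'Foo_Bar': 'Baz'}

print('Foo Bar' in redirects)                    # False: space form misses
print('Foo Bar'.replace(' ', '_') in redirects)  # True once normalised
```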