Bugs item #2539701, was opened at 2009-01-27 02:32
Message generated for change (Comment added) made by russblau
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2539701&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: other
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: redirect.py Exception handling
Initial Comment:
Running through yi-wiki, I've found that additional exception handling is needed in the get_moved_pages_redirects method in redirect.py, as follows:
except wikipedia.NoPage:
# original title must have been deleted after move
continue
+ except wikipedia.IsNotRedirectPage:
+ continue
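The proposed addition can be shown as a minimal, self-contained sketch (the exception classes and the reference fetcher below are stand-ins for pywikipedia's wikipedia module, not its real API): skip a moved title when it was deleted after the move (NoPage) or is no longer a redirect (IsNotRedirectPage).

```python
# Stand-ins for pywikipedia's exception classes, for illustration only.

class NoPage(Exception):
    """Stand-in: the requested page does not exist."""

class IsNotRedirectPage(Exception):
    """Stand-in: the page exists but is not a redirect."""

def get_moved_page_redirects(moved_titles, get_references):
    """Yield referring pages for each moved title, skipping bad ones."""
    for title in moved_titles:
        try:
            for page in get_references(title):
                yield page
        except NoPage:
            # original title must have been deleted after the move
            continue
        except IsNotRedirectPage:
            # title was edited and no longer redirects anywhere
            continue

def fake_references(title):
    """Hypothetical reference fetcher used only for this demo."""
    if title == "Deleted":
        raise NoPage(title)
    if title == "Plain":
        raise IsNotRedirectPage(title)
    yield "Ref to " + title

print(list(get_moved_page_redirects(
    ["A", "Deleted", "Plain", "B"], fake_references)))
# -> ['Ref to A', 'Ref to B']
```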
----------------------------------------------------------------------
>Comment By: Russell Blau (russblau)
Date: 2009-01-29 17:54
Message:
r6313
----------------------------------------------------------------------
Comment By: Russell Blau (russblau)
Date: 2009-01-29 17:54
Message:
Fixed in r...
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2009-01-29 00:41
Message:
There is another error in redirect.py that should be handled by an exception
in the fix_double_redirect method. If a page forms a redirect loop
and the self-link points to a nonexistent section, a SectionError is
raised:
>>>Earthworm Jim PSP<<<
Links to [[Earthworm Jim PSP#Earthworm Jim PSP]].
Warning: Redirect target [[Earthworm Jim PSP#Earthworm Jim PSP]] forms a
redirect loop.
Traceback (most recent call last):
...
File "C:\..\redirect.py in fix_double_redirects
content=targetPage.get(get_redirect=True)
File "C:\..\wikipedia.py in get self._contents = self._getEditpage(...)
File "C:\..\wikipedia.py in _getEditpage
raise SectionError # Page has no section by this name
wikipedia.SectionError
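A sketch of one way around this SectionError, mirroring the fallback later committed in r6313: fetch the redirect target, and on SectionError retry with the section-free title. SectionError and the toy page store below are stand-ins for pywikipedia's wikipedia module, not its real API.

```python
class SectionError(Exception):
    """Stand-in: page has no section by this name."""

# Toy page store: one self-redirecting page, as in the report above.
PAGES = {
    "Earthworm Jim PSP":
        "#REDIRECT [[Earthworm Jim PSP#Earthworm Jim PSP]]",
}

def get_page(title):
    """Return page text; raise SectionError for a missing #section."""
    base, _, section = title.partition("#")
    text = PAGES[base]
    if section and ("==" + section) not in text:
        raise SectionError(title)
    return text

def get_redirect_content(title):
    """Fetch a page, falling back to the section-free title."""
    try:
        return get_page(title)
    except SectionError:
        # the page exists but has no such section; retry without it
        return get_page(title.partition("#")[0])

print(get_redirect_content("Earthworm Jim PSP#Earthworm Jim PSP"))
# -> #REDIRECT [[Earthworm Jim PSP#Earthworm Jim PSP]]
```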
[de:User:Xqt]
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2539701&group_…
Revision: 6313
Author: russblau
Date: 2009-01-29 22:53:53 +0000 (Thu, 29 Jan 2009)
Log Message:
-----------
Fixing bug 2539701 Exception handling
Modified Paths:
--------------
trunk/pywikipedia/redirect.py
Modified: trunk/pywikipedia/redirect.py
===================================================================
--- trunk/pywikipedia/redirect.py 2009-01-29 22:47:54 UTC (rev 6312)
+++ trunk/pywikipedia/redirect.py 2009-01-29 22:53:53 UTC (rev 6313)
@@ -22,8 +22,9 @@
-namespace:n Namespace to process. Works only with an XML dump.
--offset:n Number of redirect to restart with (see progress). Works only
- with an XML dump or with -moves.
+-offset:n With -xml, the number of the redirect to restart with (see
+ progress). With -moves, the number of hours ago to start
+ scanning moved pages. Otherwise, ignored.
-moves Instead of using Special:Doubleredirects, use the page move
log to find double-redirect candidates (only works with
@@ -291,30 +292,39 @@
# this will run forever, until user interrupts it
import datetime
+ if not self.offset:
+ self.offset = 1
offsetpattern = re.compile(
r"""\(<a href="/w/index\.php\?title=Special:Log&offset=(\d+)&limit=500&type=move" title="Special:Log" rel="next">older 500</a>\)""")
- start = datetime.datetime.utcnow() - datetime.timedelta(0, 3600)
- # one hour ago
- offset = start.strftime("%Y%m%d%H%M%S")
+ start = datetime.datetime.utcnow() \
+ - datetime.timedelta(0, self.offset*3600)
+ # self.offset hours ago
+ offset_time = start.strftime("%Y%m%d%H%M%S")
site = wikipedia.getSite()
while True:
move_url = \
site.path() + "?title=Special:Log&limit=500&offset=%s&type=move"\
- % offset
+ % offset_time
try:
move_list = site.getUrl(move_url)
-# wikipedia.output(u"[%s]" % offset)
+ if wikipedia.verbose:
+ wikipedia.output(u"[%s]" % offset)
except:
import traceback
- traceback.print_exc()
+ wikipedia.output(unicode(traceback.format_exc()))
return
- for moved_page in self.move_regex.findall(move_list):
+ g = self.move_regex.findall(move_list)
+ if wikipedia.verbose:
+ wikipedia.output(u"%s moved pages" % len(g))
+ for moved_title in g:
+ moved_page = wikipedia.Page(site, moved_title)
+ if not moved_page.isRedirectPage():
+ continue
# moved_page is now a redirect, so any redirects pointing
# to it need to be changed
try:
- for page in wikipedia.Page(site, moved_page
- ).getReferences(follow_redirects=True,
- redirectsOnly=True):
+ for page in moved_page.getReferences(follow_redirects=True,
+ redirectsOnly=True):
yield page
except wikipedia.NoPage:
# original title must have been deleted after move
@@ -322,7 +332,7 @@
m = offsetpattern.search(move_list)
if not m:
break
- offset = m.group(1)
+ offset_time = m.group(1)
class RedirectRobot:
@@ -444,13 +454,21 @@
wikipedia.output(
u'Warning: Redirect target %s forms a redirect loop.'
% targetPage.aslink())
-
- content=targetPage.get(get_redirect=True)
- if sd_template.has_key(targetPage.site().lang) and sd_tagging_sum.has_key(targetPage.site().lang):
+ try:
+ content = targetPage.get(get_redirect=True)
+ except wikipedia.SectionError:
+ content = wikipedia.Page(
+ targetPage.site(),
+ targetPage.sectionFreeTitle()
+ ).get(get_redirect=True)
+ if sd_template.has_key(targetPage.site().lang) \
+ and sd_tagging_sum.has_key(targetPage.site().lang):
wikipedia.output(u"Tagging redirect for deletion")
# Delete the two redirects
- content = wikipedia.translate(targetPage.site().lang,sd_template)+"\n"+content
- summary = wikipedia.translate(targetPage.site().lang,sd_tagging_sum)
+ content = wikipedia.translate(targetPage.site().lang,
+ sd_template)+"\n"+content
+ summary = wikipedia.translate(targetPage.site().lang,
+ sd_tagging_sum)
targetPage.put(content, summary)
redir.put(content, summary)
else:
@@ -462,8 +480,8 @@
text = mysite.redirectRegex().sub(
'#%s %s' %
(mysite.redirect( True ),
- targetPage.aslink()),
- oldText)
+ targetPage.aslink()),
+ oldText)
if text == oldText:
break
wikipedia.showDiff(oldText, text)
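The -offset change in the diff above boils down to a small timestamp computation: subtract the given number of hours from the current UTC time and format the result as MediaWiki's 14-digit log timestamp. A sketch under that reading (the function name is illustrative, not pywikipedia's):

```python
import datetime

def move_log_offset(hours=1, now=None):
    """Return a Special:Log offset string for `hours` hours ago."""
    if now is None:
        now = datetime.datetime.utcnow()
    start = now - datetime.timedelta(hours=hours)
    # MediaWiki log offsets use the YYYYMMDDHHMMSS format
    return start.strftime("%Y%m%d%H%M%S")

print(move_log_offset(hours=2,
                      now=datetime.datetime(2009, 1, 29, 22, 53, 53)))
# -> 20090129205353
```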
Feature Requests item #2531112, was opened at 2009-01-23 13:51
Message generated for change (Comment added) made by purodha
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=2531112&group_…
Category: interwiki
Group: None
>Status: Closed
Priority: 6
Private: No
Submitted By: Purodha B Blissenbach (purodha)
Assigned to: Purodha B Blissenbach (purodha)
Summary: interwiki.py - allow to combine -new with -namespace
Initial Comment:
Since the list of new pages now supports selection of namespaces, and the API does it anyway, please allow interwiki.py the same.
While I believe a page generator of new pages should be flexible enough to return pages of all namespaces upon one request, I suggest not using this functionality in interwiki.py, to guard against unwanted interwiki-linking of talk pages and against creating (usually wrong) interwiki links on template pages (where <noinclude> would usually be required but is not added automatically, only left in place and used if it is already there).
----------------------------------------------------------------------
>Comment By: Purodha B Blissenbach (purodha)
Date: 2009-01-29 22:51
Message:
Solved with revision 6312
http://svn.wikimedia.org/viewvc/pywikipedia?view=rev&revision=6312
----------------------------------------------------------------------
Comment By: Purodha B Blissenbach (purodha)
Date: 2009-01-29 22:39
Message:
Since Special:NewPages can select either a single namespace or all
namespaces, selecting multiple namespaces needs to be implemented by
filtering after reading the list of "all namespaces". Securing the page
count with -new: thus needs a bit of redoing when multiple namespaces
have been selected.
I am not going to do that now, but will document that -new:N is an upper
limit for the actual page count when multiple namespaces are wanted;
otherwise (with one namespace, or all namespaces) everything remains as
it was.
For future enhancement I suggest extending the -new parameter by another
figure, like so: -new:from:to
-new:to:from
with both from and to being entry numbers in the list
(since from<=to, order is arbitrary, and current use remains valid)
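The filter-after-reading approach described here can be sketched generically (the names are illustrative, not pywikipedia's): fetch the N newest pages across all namespaces, then keep only those in the wanted namespaces, which is why -new:N becomes an upper limit on the pages actually processed.

```python
def filtered_new_pages(new_pages, namespaces):
    """Yield (title, ns) pairs whose namespace is in `namespaces`.

    `new_pages` is any iterable of (title, ns) pairs; pass
    namespaces="all" to disable filtering entirely.
    """
    for title, ns in new_pages:
        if namespaces == "all" or ns in namespaces:
            yield title, ns

# Toy list of "newest pages" across mixed namespaces.
pages = [("Foo", 0), ("Talk:Foo", 1), ("Template:Bar", 10), ("Baz", 0)]

print(list(filtered_new_pages(pages, {0, 10})))
# -> [('Foo', 0), ('Template:Bar', 10), ('Baz', 0)]
```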
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603141&aid=2531112&group_…
Revision: 6312
Author: purodha
Date: 2009-01-29 22:47:54 +0000 (Thu, 29 Jan 2009)
Log Message:
-----------
Add honoring of -namespace parameters to -new processing.
(Multiple -namespace parameters still have a glitch with the page count.)
This solves the request at:
https://sourceforge.net/tracker2/index.php?func=detail&aid=2531112&group_id…
Modified Paths:
--------------
trunk/pywikipedia/family.py
trunk/pywikipedia/interwiki.py
trunk/pywikipedia/pagegenerators.py
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/family.py
===================================================================
--- trunk/pywikipedia/family.py 2009-01-29 20:08:52 UTC (rev 6311)
+++ trunk/pywikipedia/family.py 2009-01-29 22:47:54 UTC (rev 6312)
@@ -3465,8 +3465,8 @@
def log_address(self, code, limit=50, mode = ''):
return "%s?useskin=monobook&title=Special:Log&type=%s&user=&page=&limit=%d" % (self.path(code), mode, limit)
- def newpages_address(self, code, limit=50):
- return "%s?useskin=monobook&title=%s:Newpages&limit=%d" % (self.path(code), self.special_namespace_url(code), limit)
+ def newpages_address(self, code, limit=50, namespace=0):
+ return "%s?useskin=monobook&title=%s:Newpages&limit=%d&namespace=%s" % (self.path(code), self.special_namespace_url(code), limit, namespace)
def longpages_address(self, code, limit=500):
return "%s?useskin=monobook&title=%s:Longpages&limit=%d" % (self.path(code), self.special_namespace_url(code), limit)
Modified: trunk/pywikipedia/interwiki.py
===================================================================
--- trunk/pywikipedia/interwiki.py 2009-01-29 20:08:52 UTC (rev 6311)
+++ trunk/pywikipedia/interwiki.py 2009-01-29 22:47:54 UTC (rev 6312)
@@ -31,6 +31,10 @@
-new: Work on the 100 newest pages. If given as -new:x, will work
on the x newest pages.
+ When multiple -namespace parameters are given, x pages are
+ inspected, and only the ones in the selected name spaces are
+ processed. Use -namespace:all for all namespaces. Without
+ -namespace, only article pages are processed.
This implies -noredirect.
@@ -1600,6 +1604,7 @@
hintlessPageGen = None
optContinue = False
optRestore = False
+ newPages = None
# This factory is responsible for processing command line arguments
# that are also used by other scripts and that determine on which pages
# to work on.
@@ -1694,7 +1699,6 @@
newPages = int(arg[5:])
else:
newPages = 100
- hintlessPageGen = pagegenerators.NewpagesPageGenerator(newPages)
elif arg.startswith('-skipfile:'):
skipfile = arg[10:]
skipPageGen = pagegenerators.TextfilePageGenerator(skipfile)
@@ -1753,6 +1757,22 @@
except:
wikipedia.output(u'Missing main page name')
+ if newPages != None:
+ if len(namespaces) == 0:
+ ns = 0
+ if len(namespaces) == 1:
+ ns = namespaces[0]
+ if ns != 'all':
+ if isinstance(ns, unicode) or isinstance(ns, str):
+ index = site.getNamespaceIndex(ns)
+ if index is None:
+ raise ValueError(u'Unknown namespace: %s' % ns)
+ ns = index
+ namespaces = []
+ else:
+ ns = 'all'
+ hintlessPageGen = pagegenerators.NewpagesPageGenerator(newPages, namespace=ns)
+
if optRestore or optContinue:
site = wikipedia.getSite()
dumpFileName = wikipedia.config.datafilepath(
Modified: trunk/pywikipedia/pagegenerators.py
===================================================================
--- trunk/pywikipedia/pagegenerators.py 2009-01-29 20:08:52 UTC (rev 6311)
+++ trunk/pywikipedia/pagegenerators.py 2009-01-29 22:47:54 UTC (rev 6312)
@@ -249,10 +249,10 @@
for page in site.prefixindex(prefix = title, namespace = namespace, includeredirects = includeredirects):
yield page
-def NewpagesPageGenerator(number = 100, get_redirect = False, repeat = False, site = None):
+def NewpagesPageGenerator(number = 100, get_redirect = False, repeat = False, site = None, namespace = 0):
if site is None:
site = wikipedia.getSite()
- for page in site.newpages(number=number, get_redirect=get_redirect, repeat=repeat):
+ for page in site.newpages(number=number, get_redirect=get_redirect, repeat=repeat, namespace=namespace):
yield page[0]
def FileLinksGenerator(referredImagePage):
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2009-01-29 20:08:52 UTC (rev 6311)
+++ trunk/pywikipedia/wikipedia.py 2009-01-29 22:47:54 UTC (rev 6312)
@@ -5010,7 +5010,7 @@
yield page, match, relevance, '', '', ''
# TODO: avoid code duplication for the following methods
- def newpages(self, number = 10, get_redirect = False, repeat = False):
+ def newpages(self, number = 10, get_redirect = False, repeat = False, namespace = 0):
"""Yield new articles (as Page objects) from Special:Newpages.
Starts with the newest article and fetches the number of articles
@@ -5029,9 +5029,10 @@
# TODO: Repeat mechanism doesn't make much sense as implemented;
# should use both offset and limit parameters, and have an
# option to fetch older rather than newer pages
+ # TODO: extract and return edit comment.
seen = set()
while True:
- path = self.newpages_address(n=number)
+ path = self.newpages_address(n=number, namespace=namespace)
# The throttling is important here, so always enabled.
get_throttle()
html = self.getUrl(path)
@@ -5856,9 +5857,9 @@
"""Return path to Special:Log."""
return self.family.log_address(self.lang, n, mode)
- def newpages_address(self, n=50):
+ def newpages_address(self, n=50, namespace=0):
"""Return path to Special:Newpages."""
- return self.family.newpages_address(self.lang, n)
+ return self.family.newpages_address(self.lang, n, namespace)
def longpages_address(self, n=500):
"""Return path to Special:Longpages."""
Bugs item #2498068, was opened at 2009-01-10 17:48
Message generated for change (Comment added) made by malafaya
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2498068&group_…
Category: interwiki
Group: None
>Status: Deleted
Resolution: None
Priority: 5
Private: No
Submitted By: André Malafaya Baptista (malafaya)
Assigned to: Nobody/Anonymous (nobody)
Summary: Page editing requires full admin privileges in Windows Vista
Initial Comment:
Pywikipedia [http] trunk/pywikipedia (r6242, Jan 09 2009, 20:23:10)
Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)]
Running interwiki.py without the "Run as administrator" option in Windows Vista gives a strange behaviour:
The bot asks for bot account password, tries to login, and apparently it succeeds.
It tries to edit the page and it fails and then tries again.
Then either it succeeds in editing or I get a CAPTCHA request depending on the site it is editing.
In any case, the bot's edit is anonymous, and not using the bot account.
For the next edit, I get a bot request to login again.
Over and over again.
I experimented with running it under Vista administrator privileges (my account is an "admin", but Vista's normal admin privileges are more restricted than in previous versions of Windows; you have to explicitly say you want to run in fully unsafe admin mode).
My perception is that this shouldn't affect page editing by the bot in any way, so I'm filing it as a bug, although I'm not sure it's not a Python framework problem.
Thanks.
----------------------------------------------------------------------
>Comment By: André Malafaya Baptista (malafaya)
Date: 2009-01-29 19:01
Message:
Curiously enough, I just tried to replicate the problem and could not.
Apparently, no admin privileges are needed (anymore?).
I'll consider this dropped for now.
Thanks.
----------------------------------------------------------------------
Comment By: siebrand (siebrand)
Date: 2009-01-27 08:40
Message:
Would you be able to narrow this issue down a bit using Process Monitor[1]
and see which files or folders are the issue? That would be a great help.
[1] http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2498068&group_…
Revision: 6310
Author: purodha
Date: 2009-01-29 11:59:34 +0000 (Thu, 29 Jan 2009)
Log Message:
-----------
Documentation change only.
Modified Paths:
--------------
trunk/pywikipedia/interwiki.py
Modified: trunk/pywikipedia/interwiki.py
===================================================================
--- trunk/pywikipedia/interwiki.py 2009-01-28 19:41:32 UTC (rev 6309)
+++ trunk/pywikipedia/interwiki.py 2009-01-29 11:59:34 UTC (rev 6310)
@@ -86,13 +86,14 @@
* 10: The 10 largest languages (sites with most
articles). Analogous for any other natural
number.
+ * arab: All languages useing the Arabic alphabet.
* cyril: All languages that use the Cyrillic alphabet.
* chinese: All Chinese dialects.
* latin: All languages using the Latin script.
* scand: All Scandinavian languages.
- -hintfile: similar to -hint, except that the hints are taken from
- the given file, one per line, instead of the command line.
+ -hintfile: similar to -hint, except that hints are taken from the given
+ file, enclosed in [[]] each, instead of the command line.
-askhints: for each page one or more hints are asked. See hint: above
for the format, one can for example give "en:something" or
@@ -150,10 +151,10 @@
alternatives actually exists.
(note: without ending colon)
- -select ask for each link whether it should be include before
+ -select ask for each link whether it should be included before
changing any page. This is useful if you want to remove
- invalid interwiki and if you do multiple hints of which
- some might be correct and others incorrect. Combining
+ invalid interwiki links and if you do multiple hints of
+ which some might be correct and others incorrect. Combining
-select and -confirm is possible, but seems like overkill.
(note: without ending colon)
@@ -161,8 +162,8 @@
-noredirect do not follow redirects. (note: without ending colon)
- -initialredirect work on target if a redirect is entered on the command
- line. (note: without ending colon)
+ -initialredirect work on its target if a redirect is entered on the
+ command line. (note: without ending colon)
-neverlink: used as -neverlink:xx where xx is a language code:
Disregard any links found to language xx. You can also