Bugs item #1855071, was opened at 2007-12-20 19:10
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1855071&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nicolas Dumazet (nicdumz)
Assigned to: Nobody/Anonymous (nobody)
Summary: "redirect.py double -xml:xx -namespace:x" crashing
Initial Comment:
It does not even load the XML; I get an instant error:
python redirect.py double -namespace:0 -xml:frwiki-20071203-pages-articles.xml
Checked for running processes. 1 processes currently running, including the current process.
Reading XML dump...
Traceback (most recent call last):
File "redirect.py", line 377, in <module>
main()
File "redirect.py", line 373, in main
bot.run()
File "redirect.py", line 328, in run
self.fix_double_redirects()
File "redirect.py", line 255, in fix_double_redirects
for redir_name in self.generator.retrieve_double_redirects():
File "redirect.py", line 199, in retrieve_double_redirects
dict = self.get_redirects_from_dump()
File "redirect.py", line 123, in get_redirects_from_dump
if self.namespace and self.namespace != entry.namespace:
AttributeError: XmlEntry instance has no attribute 'namespace'
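The AttributeError above suggests the dump parser can yield entries without a namespace attribute. A minimal defensive sketch of the failing check, under that assumption; XmlEntry and entry_matches here are hypothetical stand-ins, not xmlreader's real API:

```python
# Hypothetical stand-in for a parsed dump entry; per the traceback above,
# real entries apparently lack .namespace in some cases.
class XmlEntry(object):
    def __init__(self, title, namespace=None):
        self.title = title
        if namespace is not None:
            self.namespace = namespace

def entry_matches(entry, wanted_namespace):
    # getattr with a default avoids the AttributeError when the
    # attribute is missing entirely.
    ns = getattr(entry, 'namespace', None)
    return wanted_namespace is None or ns == wanted_namespace

print(entry_matches(XmlEntry('Foo'), 0))     # False: namespace unknown
print(entry_matches(XmlEntry('Foo', 0), 0))  # True
```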
----------------------------------------------------------------------
Bugs item #1855044, was opened at 2007-12-20 18:20
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1855044&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nicolas Dumazet (nicdumz)
Assigned to: Nobody/Anonymous (nobody)
Summary: redirect.py crashes when finding a bad page
Initial Comment:
When doing a simple "redirect.py double -xml:xx.xml", I got:
Traceback (most recent call last):
File "redirect.py", line 377, in <module>
main()
File "redirect.py", line 373, in main
bot.run()
File "redirect.py", line 328, in run
self.fix_double_redirects()
File "redirect.py", line 273, in fix_double_redirects
secondTargetPage = secondRedir.getRedirectTarget()
File "/home/nico/projets/pywikipedia/wikipedia.py", line 1576, in getRedirectTarget
self.get()
File "/home/nico/projets/pywikipedia/wikipedia.py", line 595, in get
self._contents, self._isWatched, self.editRestriction = self._getEditPage(get_redirect = get_redirect, throttle = throttle, sysop = sysop, nofollow_redirects=nofollow_redirects)
File "/home/nico/projets/pywikipedia/wikipedia.py", line 679, in _getEditPage
raise BadTitle('BadTitle: %s' % self)
wikipedia.BadTitle: BadTitle: [[../Projet/Sciences/Champs magnétiques B et H]]
The redirect page contained:
#REDIRECT [[../Projet/Sciences/Champs magnétiques B et H]]
which is not a valid target. I think redirect.py is supposed to ask the user,
or at least skip the page, instead of crashing ;)
Thanks,
Nicolas Dumazet.
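A relative link like the one above is not a page title MediaWiki accepts, which is what raises BadTitle. A rough, self-contained sketch of a pre-check the script could run before resolving the target (is_valid_redirect_target is illustrative, not pywikipedia API):

```python
import re

def is_valid_redirect_target(title):
    """Reject titles MediaWiki would refuse as page names (rough check)."""
    if not title or title.startswith(('../', './')):
        return False
    # Characters MediaWiki forbids in page titles.
    return re.search(r'[#<>\[\]|{}]', title) is None

print(is_valid_redirect_target(u'../Projet/Sciences/Champs magnétiques B et H'))  # False
print(is_valid_redirect_target(u'Champs magnétiques B et H'))  # True
```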
----------------------------------------------------------------------
Revision: 4740
Author: russblau
Date: 2007-12-20 17:09:58 +0000 (Thu, 20 Dec 2007)
Log Message:
-----------
Add another replaceExcept exception for <ref></ref> tags, and remove duplicate comment
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2007-12-20 17:03:45 UTC (rev 4739)
+++ trunk/pywikipedia/wikipedia.py 2007-12-20 17:09:58 UTC (rev 4740)
@@ -2768,10 +2768,11 @@
'noinclude': re.compile(r'(?is)<noinclude>.*?</noinclude>'),
# wiki tags are ignored inside nowiki tags.
'nowiki': re.compile(r'(?is)<nowiki>.*?</nowiki>'),
+ # preformatted text
+ 'pre': re.compile(r'(?ism)<pre>.*?</pre>'),
+ # inline references
+ 'ref': re.compile(r'(?ism)<ref[ >].*?</ref>'),
# lines that start with a space are shown in a monospace font and
- # have whitespace preserved, with wiki tags being ignored.
- 'pre': re.compile(r'(?is)<pre>.*?</pre>'),
- # lines that start with a space are shown in a monospace font and
# have whitespace preserved.
'startspace': re.compile(r'(?m)^ (.*?)$'),
# tables often have whitespace that is used to improve wiki
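The new 'ref' exception keeps replacements from touching text inside <ref> tags. A hedged, self-contained sketch of the underlying idea (replace_outside_refs is illustrative, not replaceExcept's real signature):

```python
import re

# Same pattern the commit adds: <ref ...>...</ref>, case-insensitive,
# matching across lines.
REF = re.compile(r'(?ism)<ref[ >].*?</ref>')

def replace_outside_refs(text, old, new):
    """Replace old with new, but leave <ref>...</ref> spans untouched."""
    parts, last = [], 0
    for m in REF.finditer(text):
        parts.append(text[last:m.start()].replace(old, new))  # editable span
        parts.append(m.group(0))                              # protected span
        last = m.end()
    parts.append(text[last:].replace(old, new))
    return ''.join(parts)

sample = 'colour outside <ref name="a">colour inside</ref> colour again'
print(replace_outside_refs(sample, 'colour', 'color'))
# color outside <ref name="a">colour inside</ref> color again
```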
Bugs item #1854624, was opened at 2007-12-20 06:47
Message generated for change (Comment added) made by rotemliss
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1854624&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: delete.py -cat follows subcats
Initial Comment:
When using -cat with delete.py, it follows subcategories and deletes their pages as well.
Could this be changed so that only pages in the category given by -cat are deleted,
with the current behavior moved to a logical "-subcats" parameter, following the pattern of the other scripts?
----------------------------------------------------------------------
Comment By: Rotem Liss (rotemliss)
Date: 2007-12-20 09:20
Message:
Logged In: YES
user_id=1327030
Originator: NO
It is now possible not to delete pages in subcategories, using the new
parameter "-nosubcats". The behavior of the "-cat" parameter was not
changed, to avoid breaking backwards compatibility.
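The distinction the new parameter draws can be sketched with a toy category graph (plain dicts here, not catlib's API):

```python
def articles(cat, subcats, pages, recurse=False):
    """Pages directly in cat; with recurse=True, also walk subcategories."""
    result = list(pages.get(cat, []))
    if recurse:
        for sub in subcats.get(cat, []):
            result.extend(articles(sub, subcats, pages, recurse=True))
    return result

subcats = {'Birds': ['Owls']}
pages = {'Birds': ['Sparrow'], 'Owls': ['Barn owl']}
print(articles('Birds', subcats, pages))                # ['Sparrow']
print(articles('Birds', subcats, pages, recurse=True))  # ['Sparrow', 'Barn owl']
```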
----------------------------------------------------------------------
Revision: 4736
Author: rotem
Date: 2007-12-19 17:55:43 +0000 (Wed, 19 Dec 2007)
Log Message:
-----------
Add a site property to all the generators that may need it.
Modified Paths:
--------------
trunk/pywikipedia/pagegenerators.py
Modified: trunk/pywikipedia/pagegenerators.py
===================================================================
--- trunk/pywikipedia/pagegenerators.py 2007-12-19 13:56:39 UTC (rev 4735)
+++ trunk/pywikipedia/pagegenerators.py 2007-12-19 17:55:43 UTC (rev 4736)
@@ -108,21 +108,23 @@
import wikipedia, date, catlib
import config
-def AllpagesPageGenerator(start ='!', namespace = None, includeredirects = True):
+def AllpagesPageGenerator(start ='!', namespace = None, includeredirects = True, site = None):
"""
Using the Allpages special page, retrieve all articles' titles, and yield
page objects.
If includeredirects is False, redirects are not included. If
includeredirects equals the string 'only', only redirects are added.
"""
- if namespace==None:
- namespace = wikipedia.Page(wikipedia.getSite(), start).namespace()
- title = wikipedia.Page(wikipedia.getSite(), start).titleWithoutNamespace()
- for page in wikipedia.getSite().allpages(start=title, namespace=namespace, includeredirects = includeredirects):
+ if site is None:
+ site = wikipedia.getSite()
+ if namespace is None:
+ namespace = wikipedia.Page(site, start).namespace()
+ title = wikipedia.Page(site, start).titleWithoutNamespace()
+ for page in site.allpages(start=title, namespace=namespace, includeredirects = includeredirects):
yield page
-def PrefixingPageGenerator(prefix, namespace=None, includeredirects=True):
- for page in AllpagesPageGenerator(prefix, namespace, includeredirects):
+def PrefixingPageGenerator(prefix, namespace=None, includeredirects=True, site = None):
+ for page in AllpagesPageGenerator(prefix, namespace, includeredirects, site):
if page.titleWithoutNamespace().startswith(prefix):
yield page
else:
@@ -260,7 +262,7 @@
for page in linkingPage.linkedPages():
yield page
-def TextfilePageGenerator(filename=None):
+def TextfilePageGenerator(filename=None, site=None):
'''
Read a file of page links between double-square-brackets, and return
them as a list of Page objects. filename is the name of the file that
@@ -268,11 +270,11 @@
'''
if filename is None:
filename = wikipedia.input(u'Please enter the filename:')
- site = wikipedia.getSite()
+ if site is None:
+ site = wikipedia.getSite()
f = codecs.open(filename, 'r', config.textfile_encoding)
R = re.compile(ur'\[\[(.+?)(?:\]\]|\|)') # title ends either before | or before ]]
for pageTitle in R.findall(f.read()):
- site = wikipedia.getSite()
# If the link doesn't refer to this site, the Page constructor
# will automatically choose the correct site.
# This makes it possible to work on different wikis using a single
@@ -281,12 +283,14 @@
yield wikipedia.Page(site, pageTitle)
f.close()
-def PagesFromTitlesGenerator(iterable):
+def PagesFromTitlesGenerator(iterable, site = None):
"""Generates pages from the titles (unicode strings) yielded by iterable"""
+ if site is None:
+ site = wikipedia.getSite()
for title in iterable:
if not isinstance(title, basestring):
break
- yield wikipedia.Page(wikipedia.getSite(), title)
+ yield wikipedia.Page(site, title)
def LinksearchPageGenerator(link, step=500, site = None):
"""Yields all pages that include a specified link, according to
@@ -328,9 +332,12 @@
'''
To use this generator, install pYsearch
'''
- def __init__(self, query = None, count = 100): # values larger than 100 fail
+ def __init__(self, query = None, count = 100, site = None): # values larger than 100 fail
self.query = query or wikipedia.input(u'Please enter the search query:')
- self.count = count;
+ self.count = count
+ if site is None:
+ site = wikipedia.getSite()
+ self.site = site
def queryYahoo(self, query):
from yahoo.search.web import WebSearch
@@ -343,14 +350,13 @@
yield url
def __iter__(self):
- site = wikipedia.getSite()
# restrict query to local site
- localQuery = '%s site:%s' % (self.query, site.hostname())
- base = 'http://%s%s' % (site.hostname(), site.nice_get_address(''))
+ localQuery = '%s site:%s' % (self.query, self.site.hostname())
+ base = 'http://%s%s' % (self.site.hostname(), self.site.nice_get_address(''))
for url in self.queryYahoo(localQuery):
if url[:len(base)] == base:
title = url[len(base):]
- page = wikipedia.Page(site, title)
+ page = wikipedia.Page(self.site, title)
yield page
class GoogleSearchPageGenerator:
@@ -360,8 +366,11 @@
http://www.google.com/apis/index.html . The google_key must be set to your
license key in your configuration.
'''
- def __init__(self, query = None):
+ def __init__(self, query = None, site = None):
self.query = query or wikipedia.input(u'Please enter the search query:')
+ if site is None:
+ site = wikipedia.getSite()
+ self.site = site
#########
# partially commented out because it is probably not in compliance with Google's "Terms of
@@ -441,22 +450,22 @@
#########
def __iter__(self):
- site = wikipedia.getSite()
# restrict query to local site
- localQuery = '%s site:%s' % (self.query, site.hostname())
- base = 'http://%s%s' % (site.hostname(), site.nice_get_address(''))
+ localQuery = '%s site:%s' % (self.query, self.site.hostname())
+ base = 'http://%s%s' % (self.site.hostname(), self.site.nice_get_address(''))
for url in self.queryGoogle(localQuery):
if url[:len(base)] == base:
title = url[len(base):]
- page = wikipedia.Page(site, title)
+ page = wikipedia.Page(self.site, title)
yield page
-def MySQLPageGenerator(query):
+def MySQLPageGenerator(query, site = None):
'''
'''
import MySQLdb as mysqldb
- site = wikipedia.getSite()
+ if site is None:
+ site = wikipedia.getSite()
conn = mysqldb.connect(config.db_hostname, db = site.dbName(),
user = config.db_username,
passwd = config.db_password)
@@ -482,25 +491,29 @@
page = wikipedia.Page(site, pageTitle)
yield page
-def YearPageGenerator(start = 1, end = 2050):
+def YearPageGenerator(start = 1, end = 2050, site = None):
+ if site is None:
+ site = wikipedia.getSite()
wikipedia.output(u"Starting with year %i" % start)
for i in xrange(start, end + 1):
if i % 100 == 0:
wikipedia.output(u'Preparing %i...' % i)
# There is no year 0
if i != 0:
- current_year = date.formatYear(wikipedia.getSite().lang, i )
- yield wikipedia.Page(wikipedia.getSite(), current_year)
+ current_year = date.formatYear(site.lang, i )
+ yield wikipedia.Page(site, current_year)
-def DayPageGenerator(startMonth=1, endMonth=12):
- fd = date.FormatDate(wikipedia.getSite())
- firstPage = wikipedia.Page(wikipedia.getSite(), fd(startMonth, 1))
+def DayPageGenerator(startMonth=1, endMonth=12, site=None):
+ if site is None:
+ site = wikipedia.getSite()
+ fd = date.FormatDate(site)
+ firstPage = wikipedia.Page(site, fd(startMonth, 1))
wikipedia.output(u"Starting with %s" % firstPage.aslink())
for month in xrange(startMonth, endMonth+1):
for day in xrange(1, date.getNumberOfDaysInMonth(month)+1):
- yield wikipedia.Page(wikipedia.getSite(), fd(month, day))
+ yield wikipedia.Page(site, fd(month, day))
-def NamespaceFilterPageGenerator(generator, namespaces):
+def NamespaceFilterPageGenerator(generator, namespaces, site = None):
"""
Wraps around another generator. Yields only those pages that are in one
of the given namespaces.
@@ -509,10 +522,12 @@
strings/unicode strings (namespace names).
"""
# convert namespace names to namespace numbers
+ if site is None:
+ site = wikipedia.getSite()
for i in xrange(len(namespaces)):
ns = namespaces[i]
if isinstance(ns, unicode) or isinstance(ns, str):
- index = wikipedia.getSite().getNamespaceIndex(ns)
+ index = site.getNamespaceIndex(ns)
if index is None:
raise ValueError(u'Unknown namespace: %s' % ns)
namespaces[i] = index
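The recurring change in this commit is one pattern: accept an optional site argument and resolve the default lazily inside the function, so the current site is picked up at call time. A self-contained sketch, with get_current_site() standing in for wikipedia.getSite():

```python
def get_current_site():
    # Stand-in for wikipedia.getSite(); hypothetical.
    return 'en.wikipedia'

def prefixed_titles(prefix, titles, site=None):
    """Yield (site, title) pairs for titles starting with prefix."""
    if site is None:
        # Resolved per call, not once at definition time.
        site = get_current_site()
    for title in titles:
        if title.startswith(prefix):
            yield (site, title)

print(list(prefixed_titles('Cat', ['Cat A', 'Dog B', 'Cat C'])))
# [('en.wikipedia', 'Cat A'), ('en.wikipedia', 'Cat C')]
```

Callers that pass an explicit site get it threaded through unchanged, which is what lets these generators work on a wiki other than the default one.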
Bugs item #1850347, was opened at 2007-12-13 19:07
Message generated for change (Settings changed) made by leogregianin
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1850347&group_…
Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Problems with images in categories
Initial Comment:
I needed to change line 248 of catlib.py from
yield ARTICLE, wikipedia.ImagePage(self.site(), "Image:%s" % title)
to
yield ARTICLE, wikipedia.ImagePage(self.site(), "%s" % title)
because otherwise calling anycat.articles() yielded pages like [[Bild:Bild:Nettes_Bild.jpg]].
I don't know whether this was the right place to fix the problem.
My checkout of the sources was updated today.
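The double prefix comes from unconditionally prepending "Image:" to a title that already carries its localized namespace. The fix in r4730 drops the hard-coded prefix; an alternative guard would be to prepend only when the prefix is missing (ensure_prefix is hypothetical, not catlib's API):

```python
def ensure_prefix(title, ns):
    """Prepend the namespace prefix only if the title lacks it."""
    prefix = ns + ':'
    return title if title.startswith(prefix) else prefix + title

print(ensure_prefix('Bild:Nettes_Bild.jpg', 'Bild'))  # Bild:Nettes_Bild.jpg
print(ensure_prefix('Nettes_Bild.jpg', 'Bild'))       # Bild:Nettes_Bild.jpg
```

Note this simple guard would still miss a title prefixed in a different language (e.g. "Image:" vs. "Bild:"), which is why dropping the prefix entirely was the cleaner fix.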
----------------------------------------------------------------------
Comment By: Rotem Liss (rotemliss)
Date: 2007-12-18 14:36
Message:
Logged In: YES
user_id=1327030
Originator: NO
You are right. Fixed in r4730.
----------------------------------------------------------------------
Comment By: Bernhard Mayr (falk_steinhauer)
Date: 2007-12-16 10:07
Message:
Logged In: YES
user_id=1810075
Originator: NO
MediaWiki: 1.9.4
PHP: 5.2.4 (apache2handler)
MySQL: 4.1.20
----------------------------------------------------------------------
Comment By: Rotem Liss (rotemliss)
Date: 2007-12-15 10:03
Message:
Logged In: YES
user_id=1327030
Originator: NO
In the latest version of MediaWiki, this works properly. Which version of
MediaWiki do you use?
----------------------------------------------------------------------
Comment By: Bernhard Mayr (falk_steinhauer)
Date: 2007-12-13 19:08
Message:
Logged In: YES
user_id=1810075
Originator: NO
I reported the bug.
----------------------------------------------------------------------