Bugs item #1855071, was opened at 2007-12-20 19:10
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1855071&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nicolas Dumazet (nicdumz)
Assigned to: Nobody/Anonymous (nobody)
Summary: "redirect.py double -xml:xx -namespace:x" crashing
Initial Comment:
It does not even load the XML; I get an instant error:
python redirect.py double -namespace:0 -xml:frwiki-20071203-pages-articles.xml
Checked for running processes. 1 processes currently running, including the current process.
Reading XML dump...
Traceback (most recent call last):
File "redirect.py", line 377, in <module>
main()
File "redirect.py", line 373, in main
bot.run()
File "redirect.py", line 328, in run
self.fix_double_redirects()
File "redirect.py", line 255, in fix_double_redirects
for redir_name in self.generator.retrieve_double_redirects():
File "redirect.py", line 199, in retrieve_double_redirects
dict = self.get_redirects_from_dump()
File "redirect.py", line 123, in get_redirects_from_dump
if self.namespace and self.namespace != entry.namespace:
AttributeError: XmlEntry instance has no attribute 'namespace'
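The AttributeError above suggests the dump parser can yield entries without a namespace attribute. A minimal defensive sketch of the failing check, under that assumption; XmlEntry and entry_matches here are hypothetical stand-ins, not xmlreader's real API:

```python
# Hypothetical stand-in for a parsed dump entry; per the traceback above,
# real entries apparently lack .namespace in some cases.
class XmlEntry(object):
    def __init__(self, title, namespace=None):
        self.title = title
        if namespace is not None:
            self.namespace = namespace

def entry_matches(entry, wanted_namespace):
    # getattr with a default avoids the AttributeError when the
    # attribute is missing entirely.
    ns = getattr(entry, 'namespace', None)
    return wanted_namespace is None or ns == wanted_namespace

print(entry_matches(XmlEntry('Foo'), 0))     # False: namespace unknown
print(entry_matches(XmlEntry('Foo', 0), 0))  # True
```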
----------------------------------------------------------------------
Bugs item #1855044, was opened at 2007-12-20 18:20
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1855044&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nicolas Dumazet (nicdumz)
Assigned to: Nobody/Anonymous (nobody)
Summary: redirect.py crashes when finding a bad page
Initial Comment:
When doing a simple "redirect.py double -xml:xx.xml", I got:
Traceback (most recent call last):
File "redirect.py", line 377, in <module>
main()
File "redirect.py", line 373, in main
bot.run()
File "redirect.py", line 328, in run
self.fix_double_redirects()
File "redirect.py", line 273, in fix_double_redirects
secondTargetPage = secondRedir.getRedirectTarget()
File "/home/nico/projets/pywikipedia/wikipedia.py", line 1576, in getRedirectTarget
self.get()
File "/home/nico/projets/pywikipedia/wikipedia.py", line 595, in get
self._contents, self._isWatched, self.editRestriction = self._getEditPage(get_redirect = get_redirect, throttle = throttle, sysop = sysop, nofollow_redirects=nofollow_redirects)
File "/home/nico/projets/pywikipedia/wikipedia.py", line 679, in _getEditPage
raise BadTitle('BadTitle: %s' % self)
wikipedia.BadTitle: BadTitle: [[../Projet/Sciences/Champs magnétiques B et H]]
The redirect page contained:
#REDIRECT [[../Projet/Sciences/Champs magnétiques B et H]]
which is not a valid target. I think redirect.py is supposed to ask the user,
or at least skip the page, instead of crashing ;)
Thanks,
Nicolas Dumazet.
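A relative link like the one above is not a page title MediaWiki accepts, which is what raises BadTitle. A rough, self-contained sketch of a pre-check the script could run before resolving the target (is_valid_redirect_target is illustrative, not pywikipedia API):

```python
import re

def is_valid_redirect_target(title):
    """Reject titles MediaWiki would refuse as page names (rough check)."""
    if not title or title.startswith(('../', './')):
        return False
    # Characters MediaWiki forbids in page titles.
    return re.search(r'[#<>\[\]|{}]', title) is None

print(is_valid_redirect_target(u'../Projet/Sciences/Champs magnétiques B et H'))  # False
print(is_valid_redirect_target(u'Champs magnétiques B et H'))  # True
```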
----------------------------------------------------------------------
Revision: 4740
Author: russblau
Date: 2007-12-20 17:09:58 +0000 (Thu, 20 Dec 2007)
Log Message:
-----------
Add another replaceExcept exception for <ref></ref> tags, and remove duplicate comment
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2007-12-20 17:03:45 UTC (rev 4739)
+++ trunk/pywikipedia/wikipedia.py 2007-12-20 17:09:58 UTC (rev 4740)
@@ -2768,10 +2768,11 @@
'noinclude': re.compile(r'(?is)<noinclude>.*?</noinclude>'),
# wiki tags are ignored inside nowiki tags.
'nowiki': re.compile(r'(?is)<nowiki>.*?</nowiki>'),
+ # preformatted text
+ 'pre': re.compile(r'(?ism)<pre>.*?</pre>'),
+ # inline references
+ 'ref': re.compile(r'(?ism)<ref[ >].*?</ref>'),
# lines that start with a space are shown in a monospace font and
- # have whitespace preserved, with wiki tags being ignored.
- 'pre': re.compile(r'(?is)<pre>.*?</pre>'),
- # lines that start with a space are shown in a monospace font and
# have whitespace preserved.
'startspace': re.compile(r'(?m)^ (.*?)$'),
# tables often have whitespace that is used to improve wiki
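The new 'ref' exception keeps replacements from touching text inside <ref> tags. A hedged, self-contained sketch of the underlying idea (replace_outside_refs is illustrative, not replaceExcept's real signature):

```python
import re

# Same pattern the commit adds: <ref ...>...</ref>, case-insensitive,
# matching across lines.
REF = re.compile(r'(?ism)<ref[ >].*?</ref>')

def replace_outside_refs(text, old, new):
    """Replace old with new, but leave <ref>...</ref> spans untouched."""
    parts, last = [], 0
    for m in REF.finditer(text):
        parts.append(text[last:m.start()].replace(old, new))  # editable span
        parts.append(m.group(0))                              # protected span
        last = m.end()
    parts.append(text[last:].replace(old, new))
    return ''.join(parts)

sample = 'colour outside <ref name="a">colour inside</ref> colour again'
print(replace_outside_refs(sample, 'colour', 'color'))
# color outside <ref name="a">colour inside</ref> color again
```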
Bugs item #1854624, was opened at 2007-12-20 06:47
Message generated for change (Comment added) made by rotemliss
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1854624&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: delete.py -cat follows subcats
Initial Comment:
When using -cat with delete.py, it follows subcategories and deletes their pages as well.
Could this be changed so that only pages in the category given by -cat are deleted,
with the current behavior moved to a logical "-subcats" parameter, following the pattern of the other scripts?
----------------------------------------------------------------------
Comment By: Rotem Liss (rotemliss)
Date: 2007-12-20 09:20
Message:
Logged In: YES
user_id=1327030
Originator: NO
It is now possible not to delete pages in subcategories, using the new
parameter "-nosubcats". The behavior of the "-cat" parameter was not
changed, to avoid breaking backwards compatibility.
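The distinction the new parameter draws can be sketched with a toy category graph (plain dicts here, not catlib's API):

```python
def articles(cat, subcats, pages, recurse=False):
    """Pages directly in cat; with recurse=True, also walk subcategories."""
    result = list(pages.get(cat, []))
    if recurse:
        for sub in subcats.get(cat, []):
            result.extend(articles(sub, subcats, pages, recurse=True))
    return result

subcats = {'Birds': ['Owls']}
pages = {'Birds': ['Sparrow'], 'Owls': ['Barn owl']}
print(articles('Birds', subcats, pages))                # ['Sparrow']
print(articles('Birds', subcats, pages, recurse=True))  # ['Sparrow', 'Barn owl']
```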
----------------------------------------------------------------------
Revision: 4736
Author: rotem
Date: 2007-12-19 17:55:43 +0000 (Wed, 19 Dec 2007)
Log Message:
-----------
Add a site property to all the generators that may need it.
Modified Paths:
--------------
trunk/pywikipedia/pagegenerators.py
Modified: trunk/pywikipedia/pagegenerators.py
===================================================================
--- trunk/pywikipedia/pagegenerators.py 2007-12-19 13:56:39 UTC (rev 4735)
+++ trunk/pywikipedia/pagegenerators.py 2007-12-19 17:55:43 UTC (rev 4736)
@@ -108,21 +108,23 @@
import wikipedia, date, catlib
import config
-def AllpagesPageGenerator(start ='!', namespace = None, includeredirects = True):
+def AllpagesPageGenerator(start ='!', namespace = None, includeredirects = True, site = None):
"""
Using the Allpages special page, retrieve all articles' titles, and yield
page objects.
If includeredirects is False, redirects are not included. If
includeredirects equals the string 'only', only redirects are added.
"""
- if namespace==None:
- namespace = wikipedia.Page(wikipedia.getSite(), start).namespace()
- title = wikipedia.Page(wikipedia.getSite(), start).titleWithoutNamespace()
- for page in wikipedia.getSite().allpages(start=title, namespace=namespace, includeredirects = includeredirects):
+ if site is None:
+ site = wikipedia.getSite()
+ if namespace is None:
+ namespace = wikipedia.Page(site, start).namespace()
+ title = wikipedia.Page(site, start).titleWithoutNamespace()
+ for page in site.allpages(start=title, namespace=namespace, includeredirects = includeredirects):
yield page
-def PrefixingPageGenerator(prefix, namespace=None, includeredirects=True):
- for page in AllpagesPageGenerator(prefix, namespace, includeredirects):
+def PrefixingPageGenerator(prefix, namespace=None, includeredirects=True, site = None):
+ for page in AllpagesPageGenerator(prefix, namespace, includeredirects, site):
if page.titleWithoutNamespace().startswith(prefix):
yield page
else:
@@ -260,7 +262,7 @@
for page in linkingPage.linkedPages():
yield page
-def TextfilePageGenerator(filename=None):
+def TextfilePageGenerator(filename=None, site=None):
'''
Read a file of page links between double-square-brackets, and return
them as a list of Page objects. filename is the name of the file that
@@ -268,11 +270,11 @@
'''
if filename is None:
filename = wikipedia.input(u'Please enter the filename:')
- site = wikipedia.getSite()
+ if site is None:
+ site = wikipedia.getSite()
f = codecs.open(filename, 'r', config.textfile_encoding)
R = re.compile(ur'\[\[(.+?)(?:\]\]|\|)') # title ends either before | or before ]]
for pageTitle in R.findall(f.read()):
- site = wikipedia.getSite()
# If the link doesn't refer to this site, the Page constructor
# will automatically choose the correct site.
# This makes it possible to work on different wikis using a single
@@ -281,12 +283,14 @@
yield wikipedia.Page(site, pageTitle)
f.close()
-def PagesFromTitlesGenerator(iterable):
+def PagesFromTitlesGenerator(iterable, site = None):
"""Generates pages from the titles (unicode strings) yielded by iterable"""
+ if site is None:
+ site = wikipedia.getSite()
for title in iterable:
if not isinstance(title, basestring):
break
- yield wikipedia.Page(wikipedia.getSite(), title)
+ yield wikipedia.Page(site, title)
def LinksearchPageGenerator(link, step=500, site = None):
"""Yields all pages that include a specified link, according to
@@ -328,9 +332,12 @@
'''
To use this generator, install pYsearch
'''
- def __init__(self, query = None, count = 100): # values larger than 100 fail
+ def __init__(self, query = None, count = 100, site = None): # values larger than 100 fail
self.query = query or wikipedia.input(u'Please enter the search query:')
- self.count = count;
+ self.count = count
+ if site is None:
+ site = wikipedia.getSite()
+ self.site = site
def queryYahoo(self, query):
from yahoo.search.web import WebSearch
@@ -343,14 +350,13 @@
yield url
def __iter__(self):
- site = wikipedia.getSite()
# restrict query to local site
- localQuery = '%s site:%s' % (self.query, site.hostname())
- base = 'http://%s%s' % (site.hostname(), site.nice_get_address(''))
+ localQuery = '%s site:%s' % (self.query, self.site.hostname())
+ base = 'http://%s%s' % (self.site.hostname(), self.site.nice_get_address(''))
for url in self.queryYahoo(localQuery):
if url[:len(base)] == base:
title = url[len(base):]
- page = wikipedia.Page(site, title)
+ page = wikipedia.Page(self.site, title)
yield page
class GoogleSearchPageGenerator:
@@ -360,8 +366,11 @@
http://www.google.com/apis/index.html . The google_key must be set to your
license key in your configuration.
'''
- def __init__(self, query = None):
+ def __init__(self, query = None, site = None):
self.query = query or wikipedia.input(u'Please enter the search query:')
+ if site is None:
+ site = wikipedia.getSite()
+ self.site = site
#########
# partially commented out because it is probably not in compliance with Google's "Terms of
@@ -441,22 +450,22 @@
#########
def __iter__(self):
- site = wikipedia.getSite()
# restrict query to local site
- localQuery = '%s site:%s' % (self.query, site.hostname())
- base = 'http://%s%s' % (site.hostname(), site.nice_get_address(''))
+ localQuery = '%s site:%s' % (self.query, self.site.hostname())
+ base = 'http://%s%s' % (self.site.hostname(), self.site.nice_get_address(''))
for url in self.queryGoogle(localQuery):
if url[:len(base)] == base:
title = url[len(base):]
- page = wikipedia.Page(site, title)
+ page = wikipedia.Page(self.site, title)
yield page
-def MySQLPageGenerator(query):
+def MySQLPageGenerator(query, site = None):
'''
'''
import MySQLdb as mysqldb
- site = wikipedia.getSite()
+ if site is None:
+ site = wikipedia.getSite()
conn = mysqldb.connect(config.db_hostname, db = site.dbName(),
user = config.db_username,
passwd = config.db_password)
@@ -482,25 +491,29 @@
page = wikipedia.Page(site, pageTitle)
yield page
-def YearPageGenerator(start = 1, end = 2050):
+def YearPageGenerator(start = 1, end = 2050, site = None):
+ if site is None:
+ site = wikipedia.getSite()
wikipedia.output(u"Starting with year %i" % start)
for i in xrange(start, end + 1):
if i % 100 == 0:
wikipedia.output(u'Preparing %i...' % i)
# There is no year 0
if i != 0:
- current_year = date.formatYear(wikipedia.getSite().lang, i )
- yield wikipedia.Page(wikipedia.getSite(), current_year)
+ current_year = date.formatYear(site.lang, i )
+ yield wikipedia.Page(site, current_year)
-def DayPageGenerator(startMonth=1, endMonth=12):
- fd = date.FormatDate(wikipedia.getSite())
- firstPage = wikipedia.Page(wikipedia.getSite(), fd(startMonth, 1))
+def DayPageGenerator(startMonth=1, endMonth=12, site=None):
+ if site is None:
+ site = wikipedia.getSite()
+ fd = date.FormatDate(site)
+ firstPage = wikipedia.Page(site, fd(startMonth, 1))
wikipedia.output(u"Starting with %s" % firstPage.aslink())
for month in xrange(startMonth, endMonth+1):
for day in xrange(1, date.getNumberOfDaysInMonth(month)+1):
- yield wikipedia.Page(wikipedia.getSite(), fd(month, day))
+ yield wikipedia.Page(site, fd(month, day))
-def NamespaceFilterPageGenerator(generator, namespaces):
+def NamespaceFilterPageGenerator(generator, namespaces, site = None):
"""
Wraps around another generator. Yields only those pages that are in one
of the given namespaces.
@@ -509,10 +522,12 @@
strings/unicode strings (namespace names).
"""
# convert namespace names to namespace numbers
+ if site is None:
+ site = wikipedia.getSite()
for i in xrange(len(namespaces)):
ns = namespaces[i]
if isinstance(ns, unicode) or isinstance(ns, str):
- index = wikipedia.getSite().getNamespaceIndex(ns)
+ index = site.getNamespaceIndex(ns)
if index is None:
raise ValueError(u'Unknown namespace: %s' % ns)
namespaces[i] = index
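The recurring change in this commit is one pattern: accept an optional site argument and resolve the default lazily inside the function, so the current site is picked up at call time. A self-contained sketch, with get_current_site() standing in for wikipedia.getSite():

```python
def get_current_site():
    # Stand-in for wikipedia.getSite(); hypothetical.
    return 'en.wikipedia'

def prefixed_titles(prefix, titles, site=None):
    """Yield (site, title) pairs for titles starting with prefix."""
    if site is None:
        # Resolved per call, not once at definition time.
        site = get_current_site()
    for title in titles:
        if title.startswith(prefix):
            yield (site, title)

print(list(prefixed_titles('Cat', ['Cat A', 'Dog B', 'Cat C'])))
# [('en.wikipedia', 'Cat A'), ('en.wikipedia', 'Cat C')]
```

Callers that pass an explicit site get it threaded through unchanged, which is what lets these generators work on a wiki other than the default one.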
Bugs item #1850347, was opened at 2007-12-13 19:07
Message generated for change (Settings changed) made by leogregianin
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1850347&group_…
Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Problems with images in categories
Initial Comment:
I needed to change line 248 of catlib.py from
yield ARTICLE, wikipedia.ImagePage(self.site(), "Image:%s" % title)
to
yield ARTICLE, wikipedia.ImagePage(self.site(), "%s" % title)
because otherwise calling anycat.articles() yielded pages like [[Bild:Bild:Nettes_Bild.jpg]].
I don't know whether this was the right place to fix the problem.
My checkout of the sources was updated today.
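The double prefix comes from unconditionally prepending "Image:" to a title that already carries its localized namespace. The fix in r4730 drops the hard-coded prefix; an alternative guard would be to prepend only when the prefix is missing (ensure_prefix is hypothetical, not catlib's API):

```python
def ensure_prefix(title, ns):
    """Prepend the namespace prefix only if the title lacks it."""
    prefix = ns + ':'
    return title if title.startswith(prefix) else prefix + title

print(ensure_prefix('Bild:Nettes_Bild.jpg', 'Bild'))  # Bild:Nettes_Bild.jpg
print(ensure_prefix('Nettes_Bild.jpg', 'Bild'))       # Bild:Nettes_Bild.jpg
```

Note this simple guard would still miss a title prefixed in a different language (e.g. "Image:" vs. "Bild:"), which is why dropping the prefix entirely was the cleaner fix.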
----------------------------------------------------------------------
Comment By: Rotem Liss (rotemliss)
Date: 2007-12-18 14:36
Message:
Logged In: YES
user_id=1327030
Originator: NO
You are right. Fixed in r4730.
----------------------------------------------------------------------
Comment By: Bernhard Mayr (falk_steinhauer)
Date: 2007-12-16 10:07
Message:
Logged In: YES
user_id=1810075
Originator: NO
MediaWiki: 1.9.4
PHP: 5.2.4 (apache2handler)
MySQL: 4.1.20
----------------------------------------------------------------------
Comment By: Rotem Liss (rotemliss)
Date: 2007-12-15 10:03
Message:
Logged In: YES
user_id=1327030
Originator: NO
In the latest version of MediaWiki, this works properly. Which version of
MediaWiki do you use?
----------------------------------------------------------------------
Comment By: Bernhard Mayr (falk_steinhauer)
Date: 2007-12-13 19:08
Message:
Logged In: YES
user_id=1810075
Originator: NO
I reported the bug.
----------------------------------------------------------------------