Bugs item #2061186, was opened at 2008-08-20 03:26
Message generated for change (Comment added) made by a_engels
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2061186&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: General
Group: None
Status: Open
Resolution: None
Priority: 8
Private: No
Submitted By: Woo-Jin Kim (kwj2772)
Assigned to: Nobody/Anonymous (nobody)
Summary: interwiki.py doesn't work well
Initial Comment:
I have found a serious bug in interwiki.py.
I'm using Pywikipedia revision 5816.
Error message:
C:\Python25\pywikipedia\interwiki.py -autonomous -lang:en -start:!
Checked for running processes. 1 process currently running, including currently process.
NOTE:Number of pages queued is 0, trying to add 60 more.
Retreiving Allpages special page for wikipedia:en from %21, namespace 0
NOTE:Nothing to left to do
Why is there nothing left to do?
----------------------------------------------------------------------
>Comment By: Andre Engels (a_engels)
Date: 2008-08-20 10:56
Message:
Logged In: YES
user_id=843018
Originator: NO
Yes, Special:Allpages has changed, and we already got the "fuck off, we
don't care about your framework" response when complaining.
----------------------------------------------------------------------
Comment By: Multichill (multichill)
Date: 2008-08-20 10:50
Message:
Logged In: YES
user_id=1777493
Originator: NO
I first noticed this last night. I have this problem on different systems
(WinXP/BSD/Linux), all running the latest version. I noticed it with
imageuncat.py, but interwiki.py didn't work either. It doesn't seem to matter
which wiki you want to work on (nl and commons both didn't work).
Brion told me that Special:AllPages changed recently.
python version.py
Pywikipedia [http] trunk/pywikipedia (r5819, Aug 20 2008, 08:09:06)
Python 2.5.2 (r252:60911, Aug 14 2008, 13:31:58)
[GCC 4.3.1]
python imageuncat.py -start:Image:A
Checked for running processes. 1 processes currently running, including the current process.
Retrieving Allpages special page for commons:commons from A, namespace 6
<done>
----------------------------------------------------------------------
Comment By: Mikko Silvonen (silvonen)
Date: 2008-08-20 07:15
Message:
Logged In: YES
user_id=127947
Originator: NO
Yep, the allpages method in wikipedia.py doesn't seem to find any pages,
so the -start parameter doesn't work at all. Has the format of
Special:Allpages changed, or what is causing this problem? But now it's
time for my day job...
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2061186&group_…
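The breakage discussed above comes from the framework screen-scraping Special:Allpages, whose HTML layout changed. A more robust approach is to query the MediaWiki API (`api.php` with `list=allpages`), which returns structured data instead of scrape-prone HTML. A minimal sketch in modern Python 3 — the function names and the sample response below are illustrative, not part of Pywikipedia:

```python
import json
import urllib.parse

def allpages_api_params(start, namespace=0, limit=60):
    """Build the query string for the MediaWiki API's list=allpages module."""
    return urllib.parse.urlencode({
        'action': 'query',
        'list': 'allpages',
        'apfrom': start,          # first title to return, e.g. '!'
        'apnamespace': namespace,
        'aplimit': limit,
        'format': 'json',
    })

def parse_allpages(raw_json):
    """Extract page titles from a list=allpages JSON response."""
    data = json.loads(raw_json)
    return [p['title'] for p in data['query']['allpages']]

# Abbreviated example of the response shape the API returns:
sample = '{"query": {"allpages": [{"pageid": 1, "ns": 0, "title": "!"}]}}'
print(parse_allpages(sample))
```

Because the JSON schema is part of the API contract, a parser like this keeps working when the skin or HTML of Special:Allpages changes.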
Revision: 5816
Author: multichill
Date: 2008-08-19 19:20:14 +0000 (Tue, 19 Aug 2008)
Log Message:
-----------
Rewrite, removed all the threading. Made some initial filters.
Modified Paths:
--------------
trunk/pywikipedia/imagerecat.py
Modified: trunk/pywikipedia/imagerecat.py
===================================================================
--- trunk/pywikipedia/imagerecat.py 2008-08-19 12:05:23 UTC (rev 5815)
+++ trunk/pywikipedia/imagerecat.py 2008-08-19 19:20:14 UTC (rev 5816)
@@ -2,29 +2,9 @@
"""
Program to (re)categorize images at commons.
-The program uses commonshelper for category suggestions. The program consists of three parts.
+The program uses commonshelper for category suggestions.
+It takes the suggestions and the current categories. Put the categories through some filters and add the result
-1. prefetchThread - Fetches all the information
-2. userThread - Gets input from the user
-3. putThread - modifies the images
-
-You need to install the Python Imaging Library http://www.pythonware.com/products/pil/ to get this program working
-
-The program is far from finished. The framework is there, but still a lot has to be implemented:
-1. The prefetch thread
- * Mostly finished.
- * Should add some error handling to cope with a slow toolserver
- * Should check if images with special chars work alright
- * Parameter to dont use commonshelper?
-2. The user thread
- * Tkinter layout is awful atm
- * Tkinter have to implement most of the interaction
- * Tkinter category webbrowser link
- * Tkinter something with category auto completion (like the javascript in the search box)
-3. The put thread
- * Nothing much to put atm
- * Should remove the Uncategorized template (+ redirects)
- * Should check if something is actually changed (set operations?)
"""
#
# (C) Multichill 2008
@@ -34,183 +14,135 @@
#
import os, sys, re, codecs
import urllib, httplib, urllib2
-import catlib, thread
-import time, threading
+import catlib
+import time
import wikipedia, config
-import pagegenerators, add_text, Queue, StringIO
+import pagegenerators, StringIO
import socket
-exitProgram = False
-#autonomous = False
-
category_blacklist = [u'Hidden categories']
-class prefetchThread (threading.Thread):
+def categorizeImages(generator):
+ for page in generator:
+ if page.exists() and (page.namespace() == 6) and (not page.isRedirectPage()):
+ imagepage = wikipedia.ImagePage(page.site(), page.title())
+ #imagepage.get()
+ wikipedia.output(u'Working on ' + imagepage.title());
+ currentCats = getCurrentCats(imagepage)
+ commonshelperCats = getCommonshelperCats(imagepage)
+ newcats = filterBlacklist(commonshelperCats+currentCats)
+ #newcats = filterDisambiguation(newcats)
+ #newcats = filterRedirects(newcats)
+ #newcats = filterCountries(newcats)
+ newcats = filterParents(newcats)
+ if len(newcats) > 0:
+ for cat in newcats:
+ wikipedia.output(u' Found new cat: ' + cat);
+ saveImagePage(imagepage, newcats)
+
+
+def getCurrentCats(imagepage):
'''
- Class to fetch al the info for the user. This thread gets the imagepage, the commonshelper suggestions and the image.
- The thread puts this item in a queue. When there are no more pages left the thread puts a None object in the queue and exits.
+ Get the categories currently on the image
'''
- def __init__ (self, generator, prefetchToPutQueue):
- '''
- Get the thread ready
- '''
- self.generator = generator
- self.prefetchToPutQueue = prefetchToPutQueue
- self.currentCats = []
- self.commonshelperCats = []
- self.image = None
- self.imagepage = None
- self.pregenerator = pagegenerators.PreloadingGenerator(self.generator)
- threading.Thread.__init__ ( self )
-
- def run(self):
- global exitProgram
- #global autonomous
- for page in self.pregenerator:
- if exitProgram:
- break;
- if page.exists() and (page.namespace() == 6) and (not page.isRedirectPage()) :
- self.imagepage = wikipedia.ImagePage(page.site(), page.title())
- self.imagepage.get()
- wikipedia.output(u'Working on ' + self.imagepage.title());
- self.currentCats = self.getCurrentCats(self.imagepage)
- self.commonshelperCats = self.filterCats(self.currentCats, self.getCommonshelperCats(self.imagepage))
-
- #if not autonomous:
- # self.image = self.getImage(self.imagepage)
- #self.prefetchToUserQueue.put((self.imagepage, self.currentCats, self.commonshelperCats, self.image))
+ result = []
+ for cat in imagepage.categories():
+ result.append(cat.titleWithoutNamespace())
+ return list(set(result))
- if len(self.commonshelperCats) > 0:
- for cat in self.commonshelperCats:
- wikipedia.output(u' Found new cat: ' + cat);
- self.prefetchToPutQueue.put((self.imagepage, self.commonshelperCats))
- self.prefetchToPutQueue.put(None)
- return
-
- def getCurrentCats(self, imagepage):
- '''
- Get the categories currently on the image
- '''
- result = []
- for cat in imagepage.categories():
- result.append(cat.titleWithoutNamespace())
- return result
-
- def getCommonshelperCats(self, imagepage):
- '''
- Get category suggestions from commonshelper. Parse them and return a list of suggestions.
- '''
- parameters = urllib.urlencode({'i' : imagepage.titleWithoutNamespace().encode('utf-8'), 'r' : 'on', 'go-clean' : 'Find+Categories', 'cl' : 'li'})
- commonsenseRe = re.compile('^#COMMONSENSE(.*)#USAGE(\s)+\((?P<usage>(\d)+)\)(.*)#KEYWORDS(\s)+\((?P<keywords>(\d)+)\)(.*)#CATEGORIES(\s)+\((?P<catnum>(\d)+)\)\s(?P<cats>(.*))\s#GALLERIES(\s)+\((?P<galnum>(\d)+)\)(.*)#EOF$', re.MULTILINE + re.DOTALL)
+def getCommonshelperCats(imagepage):
+ '''
+ Get category suggestions from commonshelper. Parse them and return a list of suggestions.
+ '''
+ result = []
+ parameters = urllib.urlencode({'i' : imagepage.titleWithoutNamespace().encode('utf-8'), 'r' : 'on', 'go-clean' : 'Find+Categories', 'cl' : 'li'})
+ commonsenseRe = re.compile('^#COMMONSENSE(.*)#USAGE(\s)+\((?P<usage>(\d)+)\)(.*)#KEYWORDS(\s)+\((?P<keywords>(\d)+)\)(.*)#CATEGORIES(\s)+\((?P<catnum>(\d)+)\)\s(?P<cats>(.*))\s#GALLERIES(\s)+\((?P<galnum>(\d)+)\)(.*)#EOF$', re.MULTILINE + re.DOTALL)
- gotInfo = False;
+ gotInfo = False;
- while(not gotInfo):
- try:
- commonsHelperPage = urllib.urlopen("http://toolserver.org/~daniel/WikiSense/CommonSense.php?%s" % parameters)
- matches = commonsenseRe.search(commonsHelperPage.read().decode('utf-8'))
- gotInfo = True;
- except IOError:
- wikipedia.output(u'Got an IOError, let\'s try again')
- except socket.timeout:
- wikipedia.output(u'Got a timeout, let\'s try again')
+ while(not gotInfo):
+ try:
+ commonsHelperPage = urllib.urlopen("http://toolserver.org/~daniel/WikiSense/CommonSense.php?%s" % parameters)
+ matches = commonsenseRe.search(commonsHelperPage.read().decode('utf-8'))
+ gotInfo = True
+ except IOError:
+ wikipedia.output(u'Got an IOError, let\'s try again')
+ except socket.timeout:
+ wikipedia.output(u'Got a timeout, let\'s try again')
- if matches:
- if(matches.group('catnum') > 0):
- return matches.group('cats').splitlines()
- else:
- return []
-
- def filterCats(self, currentCats, commonshelperCats):
- '''
- Remove the current categories from the suggestions and remove blacklisted cats.
- '''
- result = []
- toFilter = ""
+ if matches:
+ if(matches.group('catnum') > 0):
+ categories = matches.group('cats').splitlines()
+ for cat in categories:
+ result.append(cat.replace('_',' '))
+
+ return list(set(result))
- for cat in currentCats:
- cat = cat.replace('_',' ')
- toFilter = toFilter + "[[Category:" + cat + "]]\n"
- for cat in commonshelperCats:
- cat = cat.replace('_',' ')
- toFilter = toFilter + "[[Category:" + cat + "]]\n"
- parameters = urllib.urlencode({'source' : toFilter.encode('utf-8'), 'bot' : '1'})
- filterCategoriesPage = urllib.urlopen("http://toolserver.org/~multichill/filtercats.php?%s" % parameters)
- #print filterCategoriesPage.read().decode('utf-8')
- filterCategoriesRe = re.compile('\[\[Category:([^\]]*)\]\]')
- result = filterCategoriesRe.findall(filterCategoriesPage.read().decode('utf-8'))
- #print matches
- '''
- if matches:
- print "Found matches"
- if(matches.group('cats') > 0):
- print matches.group('cats').splitlines()
- '''
- '''
-
- #currentCatsSet = set(currentCats)
- for cat in commonshelperCats:
- cat = cat.replace('_',' ')
- if (cat not in currentCatsSet) and (cat not in category_blacklist):
- result.append(cat)
- '''
- return list(set(result))
-
- def getImage(self, imagepage):
- '''
- Get the image from the wiki
- '''
- url = imagepage.fileUrl()
- uo = wikipedia.MyURLopener()
-
- file = uo.open(url)
- if 'text/html' in file.info().getheader('Content-Type'):
- wikipedia.output(u'Couldn\'t download the image: the requested URL was not found on this server.')
- return
-
- image = file.read()
- file.close()
-
- return image
+def filterBlacklist(categories):
+ result = []
+ for cat in categories:
+ if (cat not in category_blacklist):
+ result.append(cat)
+ return list(set(result))
-class putThread (threading.Thread):
+
+def filterDisambiguation(categories):
+ result = []
+ return result
+
+
+def filterRedirects(categories):
+ result = []
+ return result
+
+
+def filterCountries(categories):
+ result = []
+ return result
+
+
+def filterParents(categories):
'''
- class to do the actual changing of images
+ Remove the current categories from the suggestions and remove blacklisted cats.
'''
- def __init__ (self, userToPutQueue):
- self.userToPutQueue = userToPutQueue
- self.item = None
- self.imagepage = None
- self.newcats = []
- self.newtext = u''
- threading.Thread.__init__ ( self )
-
- def run(self):
+ result = []
+ toFilter = u''
- while True:
- self.item = self.userToPutQueue.get()
- if self.item is None:
- break
- else:
- (self.imagepage, self.newcats)=self.item
- self.newtext = wikipedia.removeCategoryLinks(self.imagepage.get(), self.imagepage.site())
- self.newtext = self.removeUncat(self.newtext) + u'{{subst:chc}}\n'
- for category in self.newcats:
- self.newtext = self.newtext + u'[[Category:' + category + u']]\n'
-
- wikipedia.showDiff(self.imagepage.get(), self.newtext)
- #Should change this for not autonomous operation.
- self.imagepage.put(self.newtext, u'Image is categorized by a bot using data from [[Commons:Tools#CommonSense|CommonSense]]')
- return
- def removeUncat(self, oldtext = u''):
- result = u''
- result = re.sub(u'\{\{\s*([Uu]ncat(egori[sz]ed( image)?)?|[Nn]ocat|[Nn]eedscategory)[^}]*\}\}', u'', oldtext)
- result = re.sub(u'<!-- Remove this line once you have added categories -->', u'', result)
- #wikipedia.showDiff(oldtext, result)
- return result
+ for cat in categories:
+ cat = cat.replace('_',' ')
+ toFilter = toFilter + "[[Category:" + cat + "]]\n"
+ #try:
+ parameters = urllib.urlencode({'source' : toFilter.encode('utf-8'), 'bot' : '1'})
+ filterCategoriesPage = urllib.urlopen("http://toolserver.org/~multichill/filtercats.php?%s" % parameters)
+ #print filterCategoriesPage.read().decode('utf-8')
+ filterCategoriesRe = re.compile('\[\[Category:([^\]]*)\]\]')
+ result = filterCategoriesRe.findall(filterCategoriesPage.read().decode('utf-8'))
+ #except:
+
+ return result
+
+
+def saveImagePage(imagepage, newcats):
+ newtext = wikipedia.removeCategoryLinks(imagepage.get(), imagepage.site())
+ newtext = removeTemplates(newtext) + u'{{subst:chc}}\n'
+ for category in newcats:
+ newtext = newtext + u'[[Category:' + category + u']]\n'
+ wikipedia.showDiff(imagepage.get(), newtext)
+ #imagepage.put(newtext, u'Image is categorized by a bot using data from [[Commons:Tools#CommonSense|CommonSense]]')
+ return
+
+
+def removeTemplates(oldtext = u''):
+ result = u''
+ result = re.sub(u'\{\{\s*([Uu]ncat(egori[sz]ed( image)?)?|[Nn]ocat|[Nn]eedscategory)[^}]*\}\}', u'', oldtext)
+ result = re.sub(u'<!-- Remove this line once you have added categories -->', u'', result)
+ result = re.sub(u'\{\{\s*[Cc]heck categories[^}]*\}\}', u'', oldtext)
+ return result
+
+
def main(args):
'''
Main loop. Get a generator. Set up the 3 threads and the 2 queue's and fire everything up.
@@ -218,7 +150,7 @@
generator = None;
genFactory = pagegenerators.GeneratorFactory()
- #global autonomous
+
site = wikipedia.getSite(u'commons', u'commons')
wikipedia.setSite(site)
for arg in wikipedia.handleArgs():
@@ -227,32 +159,15 @@
generator = [wikipedia.Page(site, wikipedia.input(u'What page do you want to use?'))]
else:
generator = [wikipedia.Page(site, arg[6:])]
- elif arg == '-autonomous':
- autonomous = True
else:
generator = genFactory.handleArg(arg)
if not generator:
- generator = pagegenerators.CategorizedPageGenerator(catlib.Category(site, u'Category:Media needing categories'))
- #raise add_text.NoEnoughData('You have to specify the generator you want to use for the script!')
+ generator = pagegenerators.CategorizedPageGenerator(catlib.Category(site, u'Category:Media needing categories'), recurse=True)
+ categorizeImages(generator)
+
+ wikipedia.output(u'All done')
- prefetchToPutQueue=Queue.Queue()
-
- # Start the prefetch thread
- prefetchThread(generator, prefetchToPutQueue).start()
-
- # Start the user thread
- # userThread(prefetchToUserQueue, userToPutQueue).start()
-
- # Start the put thread
- putThread(prefetchToPutQueue).start()
-
- # Wait for all threads to finish
- for openthread in threading.enumerate():
- if openthread != threading.currentThread():
- openthread.join()
- wikipedia.output(u'All threads are done')
-
if __name__ == "__main__":
try:
main(sys.argv[1:])
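One detail worth flagging in the committed `removeTemplates` above: its final `re.sub` takes `oldtext` as input rather than `result`, so the first two substitutions are always discarded and only the check-categories removal survives. A corrected sketch that chains each substitution on `result` (same regexes as the commit):

```python
import re

def removeTemplates(oldtext=u''):
    """Strip uncategorized-type templates, the helper comment, and
    check-categories templates. Each substitution feeds the next, so
    all three patterns are removed from the same text."""
    result = re.sub(
        r'\{\{\s*([Uu]ncat(egori[sz]ed( image)?)?|[Nn]ocat|[Nn]eedscategory)[^}]*\}\}',
        u'', oldtext)
    result = re.sub(
        r'<!-- Remove this line once you have added categories -->',
        u'', result)
    # The committed version passes oldtext here, undoing the two subs above.
    result = re.sub(r'\{\{\s*[Cc]heck categories[^}]*\}\}', u'', result)
    return result

text = u'{{Uncategorized}}\nSome description\n{{Check categories}}\n'
print(removeTemplates(text))
```

With the committed version, the `{{Uncategorized}}` template and the helper comment would reappear in the saved page text whenever this function ran.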