Revision: 3953
Author: valhallasw
Date: 2007-08-02 15:18:17 +0000 (Thu, 02 Aug 2007)
Log Message:
-----------
New version, now tests for <div id="mw-subcategories"> and, if not found, for <div id="mw-pages">.
*** THIS MAY BREAK SUPPORT FOR OLDER VERSIONS OF MEDIAWIKI ***
Modified Paths:
--------------
trunk/pywikipedia/catlib.py
Modified: trunk/pywikipedia/catlib.py
===================================================================
--- trunk/pywikipedia/catlib.py 2007-08-02 15:07:16 UTC (rev 3952)
+++ trunk/pywikipedia/catlib.py 2007-08-02 15:18:17 UTC (rev 3953)
@@ -195,8 +195,14 @@
# save a copy of this text to find out self's supercategory.
self_txt = txt
# index where subcategory listing begins
- # this only works for the current version of the MonoBook skin
- ibegin = txt.index('Saved in parser cache')
+ try:
+ ibegin = txt.index('<div id="mw-subcategories">')
+ except ValueError:
+ try:
+ ibegin = txt.index('<div id="mw-pages">')
+ except ValueError:
+ wikipedia.output("\nCategory page detection is not bug free. Please report this error!")
+ raise
# index where article listing ends
try:
iend = txt.index('<div class="printfooter">')
Revision: 3951
Author: wikipedian
Date: 2007-08-02 14:56:28 +0000 (Thu, 02 Aug 2007)
Log Message:
-----------
heavily simplified Page.replaceImage()
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2007-08-02 12:11:29 UTC (rev 3950)
+++ trunk/pywikipedia/wikipedia.py 2007-08-02 14:56:28 UTC (rev 3951)
@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+## -*- coding: utf-8 -*-
"""
Library to get and put pages on a MediaWiki.
@@ -2003,62 +2003,32 @@
return ur'(?:[%s%s]%s)' % (s[0].upper(), s[0].lower(), s[1:])
def create_regex_i(s):
return ur'(?:%s)' % u''.join([u'[%s%s]' % (c.upper(), c.lower()) for c in s])
-
+
namespaces = ('Image', 'Media') + site.namespace(6, all = True) + site.namespace(-2, all = True)
+ # note that the colon is already included here
r_namespace = ur'\s*(?:%s)\s*\:\s*' % u'|'.join(map(create_regex_i, namespaces))
r_image = u'(%s)' % create_regex(image).replace(r'\_', '[ _]')
- def simple_replacer(match):
+ def simple_replacer(match, groupNumber = 1):
if replacement == None:
return u''
else:
groups = list(match.groups())
- groups[1] = replacement
+ groups[groupNumber] = replacement
return u''.join(groups)
-
- # Previously links in image descriptions will cause
- # unexpected behaviour: [[Image:image.jpg|thumb|[[link]] in description]]
- # will truncate at the first occurence of ]]. This cannot be
- # fixed using one regular expression.
- # This means that all ]] after the start of the image
- # must be located. If it then does not have an associated
- # [[, this one is the closure of the image.
-
- r_simple_s = u'(\[\[%s)%s' % (r_namespace, r_image)
- r_s = '\[\['
- r_e = '\]\]'
- # First determine where wikilinks start and end
- image_starts = [match.start() for match in re.finditer(r_simple_s, text)]
- link_starts = [match.start() for match in re.finditer(r_s, text)]
- link_ends = [match.end() for match in re.finditer(r_e, text)]
-
- r_simple = u'(\[\[%s)%s(.*)' % (r_namespace, r_image)
- replacements = []
- for image_start in image_starts:
- current_link_starts = [link_start for link_start in link_starts
- if link_start > image_start]
- current_link_ends = [link_end for link_end in link_ends
- if link_end > image_start]
- end = image_start
- if current_link_ends: end = current_link_ends[0]
-
- while current_link_starts and current_link_ends:
- start = current_link_starts.pop(0)
- end = current_link_ends.pop(0)
- if end <= start and end > image_start:
- # Found the end of the image
- break
-
- # Add the replacement to the todo list. Doing the
- # replacement right know would alter the indices.
- replacements.append((new_text[image_start:end],
- re.sub(r_simple, simple_replacer,
- new_text[image_start:end])))
-
- # Perform the replacements
- for old, new in replacements:
- if old: new_text = new_text.replace(old, new)
-
+
+ # The group params contains parameters such as thumb and 200px, as well
+ # as the image caption. The caption can contain wiki links, but each
+ # link has to be closed properly.
+ r_param = r'(?:\|(?:(?!\[\[).|\[\[.*?\]\])*?)'
+ rImage = re.compile(ur'(\[\[)(?P<namespace>%s)%s(?P<params>%s*?)(\]\])' % (r_namespace, r_image, r_param))
+
+ while True:
+ m = rImage.search(new_text)
+ if not m:
+ break
+ new_text = new_text[:m.start()] + simple_replacer(m, 2) + new_text[m.end():]
+
# Remove the image from galleries
r_galleries = ur'(?s)(\<%s\>)(?s)(.*?)(\<\/%s\>)' % (create_regex_i('gallery'),
create_regex_i('gallery'))
Revision: 3948
Author: valhallasw
Date: 2007-08-02 10:25:38 +0000 (Thu, 02 Aug 2007)
Log Message:
-----------
bugfix: category.articles(startFrom) now passes startFrom to the correct parameter of _getContentsAndSupercats
Modified Paths:
--------------
trunk/pywikipedia/catlib.py
Modified: trunk/pywikipedia/catlib.py
===================================================================
--- trunk/pywikipedia/catlib.py 2007-08-02 01:15:39 UTC (rev 3947)
+++ trunk/pywikipedia/catlib.py 2007-08-02 10:25:38 UTC (rev 3948)
@@ -1,4 +1,4 @@
-#!/usr/bin/python
+#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Library to work with category pages on Wikipedia
@@ -295,7 +295,7 @@
Results are unsorted (except as sorted by MediaWiki), and need not
be unique.
"""
- for tag, page in self._getContentsAndSupercats(recurse, startFrom):
+ for tag, page in self._getContentsAndSupercats(recurse, startFrom=startFrom):
if tag == ARTICLE:
yield page
Revision: 3945
Author: valhallasw
Date: 2007-08-02 00:41:50 +0000 (Thu, 02 Aug 2007)
Log Message:
-----------
Updated async put: added estimate of remaining time when quitting; KeyboardInterrupts are caught by joining the thread only one second at a time (instead of indefinitely)
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2007-08-01 23:41:57 UTC (rev 3944)
+++ trunk/pywikipedia/wikipedia.py 2007-08-02 00:41:50 UTC (rev 3945)
@@ -4539,7 +4539,17 @@
Returns a one-letter string in lowercase.
"""
- return ui.inputChoice(question, answers, hotkeys, default).lower()
+ input_lock.acquire()
+ try:
+ data = ui.inputChoice(question, answers, hotkeys, default).lower()
+ finally:
+ for output in output_cache:
+ ui.output(*output[0], **output[1])
+ input_lock.release()
+ for output in output_cache: #for output added between the start of the for loop and the lock release
+ ui.output(*output[0], **output[1])
+
+ return data
def showHelp(moduleName = None):
# the parameter moduleName is deprecated and should be left out.
@@ -4581,7 +4591,6 @@
output(u'Sorry, no help available for %s' % moduleName)
page_put_queue = Queue.Queue()
-
def async_put():
'''
Daemon that takes pages from the queue and tries to save them on the wiki.
@@ -4622,8 +4631,22 @@
'''Wait for the page-putter to flush its queue;
called automatically upon exiting from Python.
'''
+ if page_put_queue.qsize() > 0:
+ import datetime
+ remaining = datetime.timedelta(seconds=(page_put_queue.qsize()+1) * config.put_throttle)
+ output('Waiting for %i pages to be put. Estimated time remaining: %s' % (page_put_queue.qsize()+1, remaining))
+
page_put_queue.put((None, None, None, None, None))
- _putthread.join()
+
+ while(_putthread.isAlive()):
+ try:
+ _putthread.join(1)
+ except KeyboardInterrupt:
+ answer = inputChoice(u'There are %i pages remaining in the queue. Estimated time remaining: %s\nReally exit?'
+ % (page_put_queue.qsize(), datetime.timedelta(seconds=(page_put_queue.qsize()) * config.put_throttle)),
+ ['yes', 'no'], ['y', 'N'], 'N')
+ if answer in ['y', 'Y']:
+ break
import atexit
atexit.register(_flush)
---------------------------- Original Message ----------------------------
Subject: Re: [Pywikipedia-l] Subversion
From: "Merlijn van Deen" <valhallasw(a)arctus.nl>
Date: Thu, August 2, 2007 1:57 am
To: "Andre Engels" <andreengels(a)gmail.com>
--------------------------------------------------------------------------
linux:
make sure your private key is in .ssh/id_rsa
svn co svn+ssh://svn.mediawiki.org/svnroot/pywikipedia/trunk/pywikipedia
pywikipedia
windows:
download TortoiseSVN and putty.
Run putty, create a new session with these settings:
host name: svn.mediawiki.org
session name: svn.mediawiki.org
connection/data/username: a_engels
connection/data/SSH/auth/private key: (select your private key)
save this session
Then use tortoiseSVN to checkout
svn+ssh://svn.mediawiki.org/svnroot/pywikipedia/trunk/pywikipedia
--valhallasw
Note to self: learn the 'reply to all' button?