Patches item #1814580, was opened at 2007-10-16 18:36
Message generated for change (Comment added) made by wikipedian
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1814580&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Vandenberg (zeroj)
Assigned to: Nobody/Anonymous (nobody)
Summary: save page generator output
Initial Comment:
When calling pagegenerators.py directly from the command line, it is handy to save the output for post-processing, or for use as input to replace.py's -file page generator. This patch provides a -output argument.
As this patch does not handle unicode page names, it needs improvement before it can be committed.
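For illustration only (this is not the submitted patch, and the helper name is hypothetical), a unicode-safe -output option could write the titles through an encoding wrapper, e.g.:

import codecs

def write_generator_output(generator, filename):
    # codecs.open() encodes on write, so non-ASCII page names survive
    f = codecs.open(filename, 'w', 'utf-8')
    try:
        for page in generator:
            # one [[title]] per line, which a -file page generator can read back
            f.write(u'[[%s]]\n' % page.title())
    finally:
        f.close()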
----------------------------------------------------------------------
>Comment By: Daniel Herding (wikipedian)
Date: 2007-10-17 11:10
Message:
Logged In: YES
user_id=880694
Originator: NO
Is this really needed? I mean, you can do this:
daniel@localhost:~/projekte/pywikipedia> python pagegenerators.py -ref:Wikipedia:Pywikipediabot > output.txt
Checked for running processes. 2 processes currently running, including the current process.
Getting references to [[Wikipedia:Pywikipediabot]]
daniel@localhost:~/projekte/pywikipedia> cat output.txt
Benutzer:Head
Wikipedia:Selbstlinks
Benutzer:Zwobot
Benutzer Diskussion:Waugsberg/Archiv/2006-8
Wikipedia:Bots
[etc.]
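One caveat with plain redirection, which is also why the unicode handling mentioned in the patch matters: under Python 2, printing unicode page titles to a redirected stdout can raise UnicodeEncodeError because the pipe has no encoding set. A possible workaround (a sketch, not part of the framework) is to wrap stdout before printing:

import sys, codecs
# force UTF-8 output even when stdout is a pipe or a file,
# so non-ASCII page titles do not crash the script
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)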
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1814580&group_…
Patches item #1807596, was opened at 2007-10-04 17:49
Message generated for change (Comment added) made by wikipedian
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1807596&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Private: No
Submitted By: Alex S.H. Lin (lin4h)
Assigned to: Nobody/Anonymous (nobody)
Summary: interwiki.py: Add Chinese Translation
Initial Comment:
interwiki.py Chinese Message Translation
----------------------------------------------------------------------
>Comment By: Daniel Herding (wikipedian)
Date: 2007-10-17 11:00
Message:
Logged In: YES
user_id=880694
Originator: NO
Patch applied, thank you.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1807596&group_…
Bugs item #1809802, was opened at 2007-10-08 23:34
Message generated for change (Comment added) made by wikipedian
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1809802&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: weblinkchecker.py inefficiently respects max_external_links
Initial Comment:
I noticed while testing my new system that setting max_external_links to anything above 250 seems to be pointless, because the number of page names to preload is hardcoded:
gen = pagegenerators.PreloadingGenerator(gen, pageNumber = 250)
So if more than 250 threads were created, the extra threads would have nothing to do, because a fresh batch of page names (one per thread) is apparently only fetched once all of the previous 250 threads have finished (I could be wrong here). In that case, it would be better to have a statement like
gen = pagegenerators.PreloadingGenerator(gen, pageNumber = config.max_external_links)
so that at least as many page names are fetched as the current batch of threads needs (I figure the more page names are stored, the less often the bot has to wait for downloads), i.e. something like:
--- weblinkchecker.py.bak 2007-10-08 17:14:58.000000000 -0400
+++ weblinkchecker.py 2007-10-08 17:15:09.000000000 -0400
@@ -729,7 +729,7 @@
     if gen:
         if namespaces != []:
             gen = pagegenerators.NamespaceFilterPageGenerator(gen, namespaces)
-        gen = pagegenerators.PreloadingGenerator(gen, pageNumber = 260)
+        gen = pagegenerators.PreloadingGenerator(gen, pageNumber = (config.max_external_links * 2))
         gen = pagegenerators.RedirectFilterPageGenerator(gen)
         bot = WeblinkCheckerRobot(gen)
         try:
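To illustrate the trade-off (a simplified model, not the actual PreloadingGenerator; download() is a stand-in for one bulk HTTP request):

def download(titles):
    # placeholder for a single bulk request fetching many pages at once
    return ['<page %s>' % t for t in titles]

def preloading_generator(titles, batch_size):
    # consumers beyond batch_size starve until the next bulk fetch,
    # which is why batch_size should scale with the thread count
    batch = []
    for title in titles:
        batch.append(title)
        if len(batch) >= batch_size:
            for page in download(batch):
                yield page
            batch = []
    if batch:
        for page in download(batch):
            yield page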
----------------------------------------------------------------------
>Comment By: Daniel Herding (wikipedian)
Date: 2007-10-17 10:51
Message:
Logged In: YES
user_id=880694
Originator: NO
I changed it to this:
# fetch at least 240 pages simultaneously from the wiki, but more if
# a high thread number is set.
pageNumber = max(240, config.max_external_links * 2)
gen = pagegenerators.PreloadingGenerator(gen, pageNumber = pageNumber)
I think that should be OK for everyone.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1809802&group_…
Bugs item #1811843, was opened at 2007-10-11 22:54
Message generated for change (Comment added) made by wikipedian
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1811843&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: AnMaster (anmaster)
Assigned to: Nobody/Anonymous (nobody)
Summary: cosmetic_changes.py should not edit <nowiki>
Initial Comment:
<AnMaster> with cosmetic cleanup the bot sometimes edits too much: http://gentoo-wiki.com/index.php?title=HOWTO_AutoLiveCD&diff=117937&oldid=9…
<AnMaster> - if [[ "$IF_UP" == "$IF_ALL" ]]; then
<AnMaster> + if [["$IF UP" == "$IF ALL"]] ; then
<AnMaster> inside a <nowiki>, that is a clear error
... later ...
<AnMaster> where should I report this problem of cosmetic_cleanup?
<Hojjat> You can report it on the bug tracker
<Hojjat> Here is the link:
<Hojjat> http://sourceforge.net/tracker/?group_id=93107 go to the bugs section
So here it is. This bug is very irritating.
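The general technique for such a fix (a sketch under assumed names, not pywikipedia's actual replaceExcept implementation) is to locate the protected regions first and refuse any replacement whose match starts inside one:

import re

NOWIKI = re.compile(r'<nowiki>.*?</nowiki>', re.IGNORECASE | re.DOTALL)

def replace_outside_nowiki(text, pattern, repl):
    # spans of text that must not be modified
    protected = [m.span() for m in NOWIKI.finditer(text)]
    def guarded(match):
        for start, end in protected:
            if start <= match.start() < end:
                return match.group()  # inside <nowiki>: keep unchanged
        return match.expand(repl)     # normal substitution elsewhere
    return pattern.sub(guarded, text)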
----------------------------------------------------------------------
>Comment By: Daniel Herding (wikipedian)
Date: 2007-10-17 10:45
Message:
Logged In: YES
user_id=880694
Originator: NO
Fixed in SVN, thanks for your bug report.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1811843&group_…
Revision: 4455
Author: wikipedian
Date: 2007-10-17 08:44:54 +0000 (Wed, 17 Oct 2007)
Log Message:
-----------
fixed bug [ 1811843 ] cosmetic_changes.py should not edit <nowiki>
Modified Paths:
--------------
trunk/pywikipedia/cosmetic_changes.py
Modified: trunk/pywikipedia/cosmetic_changes.py
===================================================================
--- trunk/pywikipedia/cosmetic_changes.py 2007-10-17 08:42:41 UTC (rev 4454)
+++ trunk/pywikipedia/cosmetic_changes.py 2007-10-17 08:44:54 UTC (rev 4455)
@@ -158,25 +158,12 @@
         return text
 
     def cleanUpLinks(self, text):
-        trailR = re.compile(self.site.linktrail())
-        # The regular expression which finds links. Results consist of four groups:
-        # group title is the target page title, that is, everything before | or ].
-        # group section is the page section. It'll include the # to make life easier for us.
-        # group label is the alternative link title, that's everything between | and ].
-        # group linktrail is the link trail, that's letters after ]] which are part of the word.
-        # note that the definition of 'letter' varies from language to language.
-        self.linkR = re.compile(r'\[\[(?P<titleWithSection>[^\]\|]+)(\|(?P<label>[^\]\|]*))?\]\](?P<linktrail>' + self.site.linktrail() + ')')
-        curpos = 0
-        # This loop will run until we have finished the current page
-        while True:
-            m = self.linkR.search(text, pos = curpos)
-            if not m:
-                break
-            # Make sure that next time around we will not find this same hit.
-            curpos = m.start() + 1
-            titleWithSection = m.group('titleWithSection')
-            label = m.group('label')
-            trailingChars = m.group('linktrail')
+        # helper function which works on one link and either returns it
+        # unmodified, or returns a replacement.
+        def handleOneLink(match):
+            titleWithSection = match.group('titleWithSection')
+            label = match.group('label')
+            trailingChars = match.group('linktrail')
 
             if not self.site.isInterwikiLink(titleWithSection):
                 # The link looks like this:
@@ -210,7 +197,7 @@
 
                 if titleWithSection == '':
                     # just skip empty links.
-                    continue
+                    return match.group()
 
                 # Remove unnecessary initial and final spaces from label.
                 # Please note that some editors prefer spaces around pipes. (See [[en:Wikipedia:Semi-bots]]). We remove them anyway.
@@ -256,7 +243,20 @@
                     newLink = ' ' + newLink
                 if hadTrailingSpaces:
                     newLink = newLink + ' '
-            text = text[:m.start()] + newLink + text[m.end():]
+                return newLink
+            # don't change anything
+            return match.group()
+
+        trailR = re.compile(self.site.linktrail())
+        # The regular expression which finds links. Results consist of four groups:
+        # group title is the target page title, that is, everything before | or ].
+        # group section is the page section. It'll include the # to make life easier for us.
+        # group label is the alternative link title, that's everything between | and ].
+        # group linktrail is the link trail, that's letters after ]] which are part of the word.
+        # note that the definition of 'letter' varies from language to language.
+        linkR = re.compile(r'\[\[(?P<titleWithSection>[^\]\|]+)(\|(?P<label>[^\]\|]*))?\]\](?P<linktrail>' + self.site.linktrail() + ')')
+
+        text = wikipedia.replaceExcept(text, linkR, handleOneLink, ['comment', 'math', 'nowiki', 'pre', 'startspace'])
         return text
 
     def resolveHtmlEntities(self, text):
@@ -273,7 +273,7 @@
         return text
 
     def validXhtml(self, text):
-        text = wikipedia.replaceExcept(text, r'<br>', r'<br />', ['comment', 'nowiki', 'pre'])
+        text = wikipedia.replaceExcept(text, r'<br>', r'<br />', ['comment', 'math', 'nowiki', 'pre'])
         return text
 
     def removeUselessSpaces(self, text):
Revision: 4454
Author: wikipedian
Date: 2007-10-17 08:42:41 +0000 (Wed, 17 Oct 2007)
Log Message:
-----------
replaceExcept() now accepts a function as replacement parameter, like in
re.sub().
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2007-10-16 12:44:22 UTC (rev 4453)
+++ trunk/pywikipedia/wikipedia.py 2007-10-17 08:42:41 UTC (rev 4454)
@@ -2714,7 +2714,10 @@
     Parameters:
         text            - a unicode string
         old             - a compiled regular expression
-        new             - a unicode string
+        new             - a unicode string (which can contain regular
+                          expression references), or a function which takes
+                          a match object as parameter. See parameter repl of
+                          re.sub().
         exceptions      - a list of strings which signal what to leave out,
                           e.g. ['math', 'table', 'template']
         caseInsensitive - a boolean
@@ -2805,23 +2808,29 @@
             if sys.platform=='win32':
                 new = new.replace('\\n', '\n')
 
-            # We cannot just insert the new string, as it may contain regex
-            # group references such as \2 or \g<name>.
-            # On the other hand, this approach does not work because it can't
-            # handle lookahead or lookbehind (see bug #1731008):
-            #replacement = old.sub(new, text[match.start():match.end()])
-            #text = text[:match.start()] + replacement + text[match.end():]
+            try:
+                # the parameter new can be a function which takes the match as a parameter.
+                replacement = new(match)
+            except TypeError:
+                # it is not a function, but a string.
 
-            # So we have to process the group references manually.
-            replacement = new
+                # We cannot just insert the new string, as it may contain regex
+                # group references such as \2 or \g<name>.
+                # On the other hand, this approach does not work because it can't
+                # handle lookahead or lookbehind (see bug #1731008):
+                #replacement = old.sub(new, text[match.start():match.end()])
+                #text = text[:match.start()] + replacement + text[match.end():]
 
-            groupR = re.compile(r'\\(?P<number>\d+)|\\g<(?P<name>.+?)>')
-            while True:
-                groupMatch = groupR.search(replacement)
-                if not groupMatch:
-                    break
-                groupID = groupMatch.group('name') or int(groupMatch.group('number'))
-                replacement = replacement[:groupMatch.start()] + match.group(groupID) + replacement[groupMatch.end():]
+                # So we have to process the group references manually.
+                replacement = new
+
+                groupR = re.compile(r'\\(?P<number>\d+)|\\g<(?P<name>.+?)>')
+                while True:
+                    groupMatch = groupR.search(replacement)
+                    if not groupMatch:
+                        break
+                    groupID = groupMatch.group('name') or int(groupMatch.group('number'))
+                    replacement = replacement[:groupMatch.start()] + match.group(groupID) + replacement[groupMatch.end():]
 
             text = text[:match.start()] + replacement + text[match.end():]
 
             # continue the search on the remaining text
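With r4454 applied, a caller can pass a function instead of a replacement string, mirroring re.sub(); for example (illustrative only, assuming the framework's wikipedia module is on the path):

import re
import wikipedia

def fixBr(match):
    # a computed replacement; plain replacement strings cannot branch
    return '<br />'

sample = u'line one<BR>line two <nowiki><BR></nowiki>'
brR = re.compile(r'<br\s*>', re.IGNORECASE)
# the <BR> inside <nowiki> is left untouched thanks to the exceptions list
fixed = wikipedia.replaceExcept(sample, brR, fixBr,
                                ['comment', 'math', 'nowiki', 'pre'])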