Revision: 4208
Author: wikipedian
Date: 2007-09-07 13:22:08 +0000 (Fri, 07 Sep 2007)
Log Message:
-----------
improved handling of multiple sites
Modified Paths:
--------------
trunk/pywikipedia/pagegenerators.py
trunk/pywikipedia/replace.py
Modified: trunk/pywikipedia/pagegenerators.py
===================================================================
--- trunk/pywikipedia/pagegenerators.py 2007-09-07 13:08:19 UTC (rev 4207)
+++ trunk/pywikipedia/pagegenerators.py 2007-09-07 13:22:08 UTC (rev 4208)
@@ -499,10 +499,16 @@
def preload(self, pages):
try:
- site = pages[0].site()
- # filter out pages that are on other sites
- pages = filter(lambda p: p.site() == site, pages)
- wikipedia.getall(site, pages, throttle=False)
+ while len(pages) > 0:
+ # It might be that the pages are on different sites,
+ # e.g. because the -interwiki parameter was used.
+ # Query the sites one by one.
+ site = pages[0].site()
+ pagesThisSite = [page for page in pages if page.site() == site]
+ pages = [page for page in pages if page.site() != site]
+ wikipedia.getall(site, pagesThisSite, throttle=False)
+ for page in pagesThisSite:
+ yield page
except IndexError:
# Can happen if the pages list is empty. Don't care.
pass
@@ -520,14 +526,12 @@
# We don't want to load too many pages at once using XML export.
# We only get a maximum number at a time.
if len(somePages) >= self.pageNumber:
- self.preload(somePages)
- for refpage in somePages:
+ for refpage in self.preload(somePages):
self.queue.put(refpage)
somePages = []
if somePages:
# preload remaining pages
- self.preload(somePages)
- for refpage in somePages:
+ for refpage in self.preload(somePages):
self.queue.put(refpage)
self.queue.put(None) # to signal end of list
except Exception, e:
Modified: trunk/pywikipedia/replace.py
===================================================================
--- trunk/pywikipedia/replace.py 2007-09-07 13:08:19 UTC (rev 4207)
+++ trunk/pywikipedia/replace.py 2007-09-07 13:22:08 UTC (rev 4208)
@@ -223,21 +223,21 @@
# Load the page's text from the wiki
original_text = page.get()
if not page.canBeEdited():
- wikipedia.output(u'Skipping locked page %s' % page.title())
+ wikipedia.output(u'Skipping locked page %s' % page.aslink())
continue
except wikipedia.NoPage:
- wikipedia.output(u'Page %s not found' % page.title())
+ wikipedia.output(u'Page %s not found' % page.aslink())
continue
except wikipedia.IsRedirectPage:
original_text = page.get(get_redirect=True)
match = self.checkExceptions(original_text)
# skip all pages that contain certain texts
if match:
- wikipedia.output(u'Skipping %s because it contains %s' % (page.title(), match))
+ wikipedia.output(u'Skipping %s because it contains %s' % (page.aslink(), match))
else:
new_text = self.doReplacements(original_text)
if new_text == original_text:
- wikipedia.output('No changes were necessary in %s' % page.title())
+ wikipedia.output('No changes were necessary in %s' % page.aslink())
else:
if self.recursive:
newest_text = self.doReplacements(new_text)