Bugs item #2136828, was opened at 2008-09-29 21:10
Message generated for change (Comment added) made by spacebirdy
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2136828&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: ulana merops (spacebirdy)
Assigned to: Nobody/Anonymous (nobody)
Summary: wiktionary_family.py - wrong sort order for fr.wikt.
Initial Comment:
Please see http://fr.wiktionary.org/wiki/Discussion_Wiktionnaire:Structure_des_article…
and remove 'fr': self.alphabetic,
in line 416
Syntax on fr.wikt:
http://fr.wiktionary.org/wiki/Wiktionnaire:Structure_des_articles#Liens_int…
I don't know who added this, but it seems wrong. Thanks!
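As a rough sketch of the requested change (the names below are illustrative, not the actual wiktionary_family.py code): the family file maps language codes to an interwiki sort routine, and the report asks for the 'fr' entry to be dropped so fr.wiktionary falls back to the default order.

```python
def alphabetic(code):
    # placeholder standing in for the framework's alphabetic sort routine
    return code

# illustrative family attribute: language code -> interwiki sort order
interwiki_sort = {
    'en': alphabetic,
    'fr': alphabetic,  # the entry this report asks to remove
}

# the requested fix: drop the 'fr' entry so fr.wikt uses the default order
interwiki_sort.pop('fr', None)
print('fr' in interwiki_sort)  # → False
```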
----------------------------------------------------------------------
>Comment By: ulana merops (spacebirdy)
Date: 2008-10-11 14:02
Message:
Please fix this so that I can update the bot normally, without having to
remove that line every time. Thanks in advance.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2136828&group_…
Bugs item #2158249, was opened at 2008-10-10 22:01
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2158249&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: other
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: weblinkchecker.py doesn't report archive.org links anymore
Initial Comment:
Weblinkchecker does not report archive.org links anymore. On my run on Sept 26 it still reported the archive links; on Oct 3 weblinkchecker did not report a single one (out of several hundred dead links found on that run).
For example, http://web.archive.org/web/*/http://www.gruene-muenchen.de/landesverband.64… is available, but it is not reported on http://de.wikipedia.org/wiki/Diskussion:Theresa_Schopper
During the run weblinkchecker gives the output:
Consulting the Internet Archive for http://www.gruene-muenchen.de/landesverband.6417.0.html
python version.py
Pywikipedia [http] trunk/pywikipedia (r5945, Oct 10 2008, 11:16:07)
Python 2.5.2 (r252:60911, Oct 5 2008, 19:24:49)
[GCC 4.3.2]
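The "Consulting the Internet Archive" step queries the Wayback Machine; as a minimal sketch (the function name is invented and this is not weblinkchecker's actual logic), the wildcard lookup URL shown in the report can be formed like this:

```python
def archive_lookup_url(dead_link):
    # build the wildcard Wayback Machine query URL seen in the report
    return 'http://web.archive.org/web/*/%s' % dead_link

print(archive_lookup_url('http://www.gruene-muenchen.de/landesverband.6417.0.html'))
# → http://web.archive.org/web/*/http://www.gruene-muenchen.de/landesverband.6417.0.html
```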
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2158249&group_…
Since I actually got a request for information about the rewrite project(!),
here's a summary of where things stand and what other developers can help
with.
For those who aren't aware, the goal of the rewrite branch is to convert the
entire bot framework to use the MediaWiki API instead of screen-scraping for
both reading from and writing to a wiki. Generally, the changes are to be
"behind the scenes," with the goal of maintaining backwards-compatibility
with the old framework as much as possible. Nonetheless, we are taking this
opportunity to clean up some warts in the old framework and add some new
capabilities, so old code won't "just run" without some conversion effort.
Why bother? Because the API is faster and more reliable than
screen-scraping, and we won't have to spend hours hunting and fixing bugs
every time the MediaWiki developers decide to change an HTML tag somewhere
in their page design. As Brion Vibber said, "Screen-scraping
constantly-changing UI is like repeatedly banging
yourself in the head with a bowling ball. It's painful and doesn't
accomplish much, but it feels SO GOOD when you stop!"
http://lists.wikimedia.org/pipermail/wikitech-l/2008-August/039076.html And
he's made it very clear that changes to the UI will be made regardless of
what effect they may have on bots.
* Where we stand
First of all, the code in the rewrite branch actually works; you can check
it out from SVN, run it, and experiment with it on the wiki of your choice.
Not all the functionality of the current framework has been replicated yet,
but you can instantiate a Site or a Page, get the page text, save the page,
and so forth. See the file 'README-conversion.txt' for a brief rundown of
how to convert from the old syntax to the new. You will need to create a
new user-config.py for the new framework, and tuck it away in a different
directory than the one you use for the old framework. (Preferably, this
should be ~/.pywikibot for Unix and similar systems, and C:\Documents and
Settings\USERNAME\Application Data\pywikibot for Windows systems.) Set
the environment key PYWIKIBOT2_DIR to the name of this directory.
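For Unix-like systems, the setup described above amounts to two commands (a sketch; adjust the path if you prefer a different location):

```shell
# Keep the new framework's user-config.py in its own directory and point
# the PYWIKIBOT2_DIR environment variable at it.
mkdir -p "$HOME/.pywikibot"
export PYWIKIBOT2_DIR="$HOME/.pywikibot"
```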
The design of the framework is based on the following layers:
- Communications (http request handling)
- Data (forming API requests and parsing the responses)
- Wiki (objects representing contents of a wiki, including Sites and Pages)
- Bot (the application programs)
Generally, each layer should only interact with the ones immediately above
and below it (although in practice there are a few exceptions).
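The layering can be illustrated with a toy sketch in which each layer only calls the one directly below it. All class names and the canned response here are invented for the example; the real rewrite branch is organized differently in detail.

```python
import json

class Comm:
    """Communications layer: raw HTTP request handling (stubbed here)."""
    def request(self, params):
        # canned response instead of a real network fetch
        return '{"query": {"pages": {}}}'

class Data:
    """Data layer: forms API requests and parses the responses."""
    def __init__(self):
        self.comm = Comm()
    def api_query(self, **params):
        return json.loads(self.comm.request(params))

class Site:
    """Wiki layer: objects representing the contents of a wiki."""
    def __init__(self):
        self.data = Data()
    def query_succeeded(self, title):
        return 'query' in self.data.api_query(titles=title)

# Bot layer: application code talks only to the Wiki layer below it
print(Site().query_succeeded('Sandbox'))  # → True
```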
Recently I have been working on testing the Site object's methods; this has
been exceedingly tedious but very useful, as it has uncovered a number of
bugs. I am hoping to complete this phase soon, as I find the time, then
move on to the Page object and its subclasses.
* How others can help
1. Test the new framework, and report (or, even better, fix) any bugs or
unclear documentation you find.
2. Develop and run unit tests for the Page object and its subclasses.
3. Port existing functions and methods that manipulate wiki text and return
a new text (from wikipedia.py, catlib.py, and so forth) into a new
textlib.py module.
4. Help identify any exceptions to backwards-compatibility, and if
appropriate add a new function/method to map the old framework's code to the
new one.
5. Start writing a new Bot class that can be subclassed by developers for
their bots; this should at a minimum provide the capabilities now in
wikipedia.handleArgs(), including help functionality, and the
pagegenerators.py module.
6. Identify what's missing from this list! ;)
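For item 2, a unit test might look like the sketch below. StubPage is an invented stand-in for the real Page class so the example stays self-contained; tests against the actual framework would import it and hit a live test wiki instead.

```python
import unittest

class StubPage:
    """Invented stand-in for the framework's Page class."""
    def __init__(self, site, title):
        self.site = site
        self.title = title
    def namespace(self):
        # naive namespace extraction, enough for the test sketch
        return self.title.split(':', 1)[0] if ':' in self.title else ''

class PageTestCase(unittest.TestCase):
    def test_talk_namespace(self):
        self.assertEqual(StubPage('en', 'Talk:Foo').namespace(), 'Talk')
    def test_main_namespace(self):
        self.assertEqual(StubPage('en', 'Foo').namespace(), '')

# run with: python -m unittest <thisfile>
```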
Thanks in advance to anyone who pitches in on this project. And don't
hesitate to bother me with questions!
Russ Blau
Revision: 5954
Author: filnik
Date: 2008-10-10 17:30:54 +0000 (Fri, 10 Oct 2008)
Log Message:
-----------
Minibugfix (?)
Modified Paths:
--------------
trunk/pywikipedia/checkimages.py
Modified: trunk/pywikipedia/checkimages.py
===================================================================
--- trunk/pywikipedia/checkimages.py 2008-10-10 16:48:56 UTC (rev 5953)
+++ trunk/pywikipedia/checkimages.py 2008-10-10 17:30:54 UTC (rev 5954)
@@ -1054,22 +1054,21 @@
except wikipedia.BadTitle:
# Template with wrong name, no need to report, simply skip
continue
- else:
- if template in self.list_licenses: # the list_licenses are loaded in the __init__ (not to load them multimple times)
- seems_ok = True
- exit_cicle = True
- license_found = license_selected # let the last "fake" license normally detected
- break
+ if template in self.list_licenses: # the list_licenses are loaded in the __init__ (not to load them multimple times)
+ seems_ok = True
+ exit_cicle = True
+ license_found = license_selected # let the last "fake" license normally detected
+ break
# previous block was unsuccessful? Try with the next one
for license_selected in licenses_found:
try:
template = self.giveMeTheTemplate(license_selected)
+ if template == None:
+ continue # ok, this template it's not ok, continue..
except wikipedia.BadTitle:
# Template with wrong name, no need to report, simply skip
continue
- try:
- if template == None:
- continue # ok, this template it's not ok, continue..
+ try:
template_text = template.get()
except wikipedia.NoPage:
continue # ok, this template it's not ok, continue..
Revision: 5952
Author: filnik
Date: 2008-10-10 16:47:17 +0000 (Fri, 10 Oct 2008)
Log Message:
-----------
Little fix in the checkImageOnCommons() function, add 'same name' if the images have.. the same name, yes, how do you guess? :)
Modified Paths:
--------------
trunk/pywikipedia/checkimages.py
Modified: trunk/pywikipedia/checkimages.py
===================================================================
--- trunk/pywikipedia/checkimages.py 2008-10-10 15:28:22 UTC (rev 5951)
+++ trunk/pywikipedia/checkimages.py 2008-10-10 16:47:17 UTC (rev 5952)
@@ -793,7 +793,7 @@
""" Checking if the image is on commons """
wikipedia.output(u'Checking if %s is on commons...' % self.image)
commons_site = wikipedia.getSite('commons', 'commons')
- regexOnCommons = r"\n\*\[\[:Image:%s\]\] is also on '''Commons''': \[\[commons:Image:.*?\]\]$" % self.image
+ regexOnCommons = r"\n\*\[\[:Image:%s\]\] is also on '''Commons''': \[\[commons:Image:.*?\]\](?: \(same name\)|)$" % self.image
imagePage = wikipedia.ImagePage(self.site, 'Image:%s' % self.image)
hash_found = imagePage.getHash()
if hash_found == None:
@@ -809,11 +809,14 @@
# Problems? Yes! We have to skip the check part for that image!
# Because it's on commons but someone has added something on your project.
return False
- elif 'stemma' in self.image.lower() and self.site.lang == 'it':
+ elif re.findall(r'\bstemma\b', self.image.lower()) != [] and self.site.lang == 'it':
wikipedia.output(u'%s has "stemma" inside, means that it\'s ok.' % self.image)
return True # Problems? No, it's only not on commons but the image needs a check
else:
- repme = "\n*[[:Image:%s]] is also on '''Commons''': [[commons:Image:%s]]" % (self.image, commons_image_with_this_hash[0])
+ if self.image == commons_image_with_this_hash[0]:
+ repme = "\n*[[:Image:%s]] is also on '''Commons''': [[commons:Image:%s]] (same name)" % (self.image, commons_image_with_this_hash[0])
+ else:
+ repme = "\n*[[:Image:%s]] is also on '''Commons''': [[commons:Image:%s]]" % (self.image, commons_image_with_this_hash[0])
self.report_image(self.image, self.rep_page, self.com, repme, addings = False, regex = regexOnCommons)
# Problems? No, return True
return True
Revision: 5951
Author: filnik
Date: 2008-10-10 15:28:22 +0000 (Fri, 10 Oct 2008)
Log Message:
-----------
Testing phase on commons gives a lot of things to think about.. continuing with the fixing phase
Modified Paths:
--------------
trunk/pywikipedia/checkimages.py
Modified: trunk/pywikipedia/checkimages.py
===================================================================
--- trunk/pywikipedia/checkimages.py 2008-10-10 14:33:40 UTC (rev 5950)
+++ trunk/pywikipedia/checkimages.py 2008-10-10 15:28:22 UTC (rev 5951)
@@ -1017,6 +1017,17 @@
list_licenses.append(pageLicense) # the list has wiki-pages
return list_licenses
+ def giveMeTheTemplate(self, license_selected):
+ #print template.exists()
+ template = wikipedia.Page(self.site, 'Template:%s' % license_selected)
+ if not template.exists():
+ template = wikipedia.Page(self.site, license_selected)
+ if not template.exists():
+ return None # break and exit
+ if template.isRedirectPage():
+ template = template.getRedirectTarget()
+ return template
+
def smartDetection(self, image_text):
seems_ok = False
license_found = None
@@ -1030,16 +1041,13 @@
break
if licenses_found != []:
for license_selected in licenses_found:
- #print template.exists()
- template = wikipedia.Page(self.site, 'Template:%s' % license_selected)
- if not template.exists():
- template = wikipedia.Page(self.site, license_selected)
- if not template.exists():
- exit_cicle = True
- break # break and report
+ # put the first, if there is problem, this will be reported in the log
+ if license_found == None:
+ license_found = license_selected
try:
- if template.isRedirectPage():
- template = template.getRedirectTarget()
+ template = self.giveMeTheTemplate(license_selected)
+ if template == None:
+ continue
except wikipedia.BadTitle:
# Template with wrong name, no need to report, simply skip
continue
@@ -1047,16 +1055,21 @@
if template in self.list_licenses: # the list_licenses are loaded in the __init__ (not to load them multimple times)
seems_ok = True
exit_cicle = True
+ license_found = license_selected # let the last "fake" license normally detected
break
- license_found = license_selected # let the last "fake" license normally detected
# previous block was unsuccessful? Try with the next one
for license_selected in licenses_found:
try:
+ template = self.giveMeTheTemplate(license_selected)
+ except wikipedia.BadTitle:
+ # Template with wrong name, no need to report, simply skip
+ continue
+ try:
template_text = template.get()
+ if template == None:
+ continue # ok, this template it's not ok, continue..
except wikipedia.NoPage:
- seems_ok = False # Empty template (maybe deleted while the script's running)
- exit_cicle = True
- break
+ continue # ok, this template it's not ok, continue..
regex_noinclude = re.compile(r'<noinclude>(.*?)</noinclude>', re.DOTALL)
template_text = regex_noinclude.sub('', template_text)
if second_round == False:
@@ -1065,7 +1078,6 @@
break # only exit from the for, not from the while
else:
exit_cicle = True
- license_found = license_selected # A good license? Ok, let's use it instead
break
if not seems_ok:
rep_text_license_fake = "\n*[[:Image:%s]] seems to have a ''fake license'', license detected: {{tl|%s}}." % (self.image, license_found)
Revision: 5950
Author: filnik
Date: 2008-10-10 14:33:40 +0000 (Fri, 10 Oct 2008)
Log Message:
-----------
Fixing again smartdetection, commons testing phase successful, let's see if there's anything else to add..
Modified Paths:
--------------
trunk/pywikipedia/checkimages.py
Modified: trunk/pywikipedia/checkimages.py
===================================================================
--- trunk/pywikipedia/checkimages.py 2008-10-10 14:14:04 UTC (rev 5949)
+++ trunk/pywikipedia/checkimages.py 2008-10-10 14:33:40 UTC (rev 5950)
@@ -342,7 +342,7 @@
'ta':[u'information'],
'zh':[u'information'],
}
-
+# A page where there's a list of template to skip.
PageWithHiddenTemplates = {
'commons': u'User:Filbot/White_templates#White_templates',
'en':None,
@@ -350,6 +350,14 @@
'ko': u'User:Kwjbot_IV/whitetemplates/list',
}
+# A page where there's a list of template to consider as licenses.
+PageWithAllowedTemplates = {
+ 'commons': u'User:Filbot/Allowed templates',
+ 'en':None,
+ 'it':u'Progetto:Coordinamento/Immagini/Bot/AllowedTemplates',
+ 'ko': u'User:Kwjbot_IV/whitetemplates/list',
+ }
+
# Template added when the bot finds only an hidden template and nothing else.
# Note: every __botnick__ will be repleaced with your bot's nickname (feel free not to use if you don't need it)
HiddenTemplateNotification = {
@@ -497,6 +505,7 @@
self.com = wikipedia.translate(self.site, comm10)
self.hiddentemplate = wikipedia.translate(self.site, HiddenTemplate)
self.pageHidden = wikipedia.translate(self.site, PageWithHiddenTemplates)
+ self.pageAllowed = wikipedia.translate(self.site, PageWithAllowedTemplates)
# Commento = Summary in italian
self.commento = wikipedia.translate(self.site, comm)
# Adding the bot's nickname at the notification text if needed.
@@ -992,6 +1001,20 @@
gen = pagegenerators.CategorizedPageGenerator(cat)
pages = [page for page in gen]
list_licenses.extend(pages)
+
+ # Add the licenses set in the default page as licenses
+ # to check
+ if self.pageAllowed != None:
+ try:
+ pageAllowedText = wikipedia.Page(self.site, self.pageAllowed).get()
+ except (wikipedia.NoPage, wikipedia.IsRedirectPage):
+ pageAllowedText = ''
+ for nameLicense in self.load(pageAllowedText):
+ if not 'template:' in nameLicense.lower():
+ nameLicense = 'Template:%s' % nameLicense
+ pageLicense = wikipedia.Page(self.site, nameLicense)
+ if pageLicense not in list_licenses:
+ list_licenses.append(pageLicense) # the list has wiki-pages
return list_licenses
def smartDetection(self, image_text):
@@ -1000,6 +1023,7 @@
regex_find_licenses = re.compile(r'\{\{(?:[Tt]emplate:|)(.*?)(?:[|\n].*?|)\}\}', re.DOTALL)
licenses_found = regex_find_licenses.findall(image_text)
second_round = False
+
exit_cicle = False # howTo exit from both the for and the while cicle
while 1:
if exit_cicle: # howTo exit from the while
@@ -1033,6 +1057,8 @@
seems_ok = False # Empty template (maybe deleted while the script's running)
exit_cicle = True
break
+ regex_noinclude = re.compile(r'<noinclude>(.*?)</noinclude>', re.DOTALL)
+ template_text = regex_noinclude.sub('', template_text)
if second_round == False:
licenses_found = regex_find_licenses.findall(template_text)
second_round = True