Patches item #1840253, was opened at 2007-11-28 14:13
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1840253&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: André Malafaya Baptista (malafaya)
Assigned to: Nobody/Anonymous (nobody)
Summary: #REDIRECT alias for Javanese (jv)
Initial Comment:
Please apply attached patch which includes an alias for the #Redirect magic word to language jv - Javanese. Thanks.
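Magic-word aliases like this let a wiki in a given language recognize a localized form of #REDIRECT in addition to the English one. The attached patch (the actual jv alias text) is not reproduced in this message, so the alias below is purely hypothetical; this is only a sketch of how an alias table is consulted, not the MediaWiki implementation:

```python
import re

# Illustrative alias table. '#TURUT' is a HYPOTHETICAL placeholder for the
# real Javanese alias, which lives in the attached (unreproduced) patch.
REDIRECT_ALIASES = {
    'en': ['#REDIRECT'],
    'jv': ['#REDIRECT', '#TURUT'],
}

def is_redirect(lang, wikitext):
    """Check whether wikitext starts with a redirect magic word for lang.

    Falls back to the English '#REDIRECT' for languages without aliases;
    matching is case-insensitive, as in MediaWiki.
    """
    aliases = REDIRECT_ALIASES.get(lang, ['#REDIRECT'])
    pattern = r'^(?:%s)\s*\[\[' % '|'.join(re.escape(a) for a in aliases)
    return re.match(pattern, wikitext, re.I) is not None
```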
----------------------------------------------------------------------
Patches item #1840148, was opened at 2007-11-28 11:50
Message generated for change (Comment added) made by rotemliss
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1840148&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Alex S.H. Lin (lin4h)
Assigned to: Nobody/Anonymous (nobody)
Summary: update clean_sandbox.py Info for jawiki
Initial Comment:
as title.
----------------------------------------------------------------------
Comment By: Rotem Liss (rotemliss)
Date: 2007-11-28 12:00
Message:
Logged In: YES
user_id=1327030
Originator: NO
Added in r4607.
----------------------------------------------------------------------
Bugs item #1829405, was opened at 2007-11-09 23:47
Message generated for change (Comment added) made by sf-robot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1829405&group_…
Category: None
Group: None
>Status: Closed
Resolution: Invalid
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Leonardo Gregianin (leogregianin)
Summary: redirect.py says pages don't exist
Initial Comment:
When running "python redirect.py double", 120 double redirects are found and each is opened individually, but instead of fixing them the script reports "[[PAGENAME]] doesn't exist."
See the attached screenshot.
anotherpeteparker(a)gmail.com
----------------------------------------------------------------------
>Comment By: SourceForge Robot (sf-robot)
Date: 2007-11-27 19:20
Message:
Logged In: YES
user_id=1312539
Originator: NO
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).
----------------------------------------------------------------------
Comment By: Leonardo Gregianin (leogregianin)
Date: 2007-11-13 08:47
Message:
Logged In: YES
user_id=1136737
Originator: NO
The list of double redirects on Wikipedia is cached; the article in question was already deleted, but the Special:DoubleRedirects list is not yet up to date.
----------------------------------------------------------------------
Revision: 4605
Author: filnik
Date: 2007-11-27 16:23:45 +0000 (Tue, 27 Nov 2007)
Log Message:
-----------
Bugfix and adding hu language
Modified Paths:
--------------
trunk/pywikipedia/checkimages.py
Modified: trunk/pywikipedia/checkimages.py
===================================================================
--- trunk/pywikipedia/checkimages.py 2007-11-27 16:20:41 UTC (rev 4604)
+++ trunk/pywikipedia/checkimages.py 2007-11-27 16:23:45 UTC (rev 4605)
@@ -82,12 +82,14 @@
'commons':'\n{{subst:nld}}',
'en' :'\n{{subst:nld}}',
'it' :'\n{{subst:unverdata}}',
+ 'hu' :u'\n{{nincslicenc|~~~~~}}',
}
txt_find = {
'commons':['{{no license', '{{nld'],
'en':['{{nld', '{{no license'],
'it':['{{unverdata', '{{unverified'],
+ 'hu':[u'{{nincsforrás',u'{{nincslicenc'],
}
# Summary for when the will add the no source
@@ -95,6 +97,7 @@
'commons':'Bot: Marking newly uploaded untagged file',
'en' :'Bot: Marking newly uploaded untagged file',
'it' :"Bot: Aggiungo unverified",
+ 'hu' :'Robot: Frissen feltöltött licencsablon nélküli fájl megjelölése',
}
# Summary that the bot use when it notify the problem with the image's license
@@ -102,6 +105,7 @@
'commons':"Bot: Requesting source information." ,
'en' :"Bot: Requesting source information." ,
'it' :"Bot: Notifico l'unverified",
+ 'hu' :'Robot: Forrásinformáció kérése',
}
# When the Bot find that the usertalk is empty is not pretty to put only the no source without the welcome, isn't it?
@@ -109,6 +113,7 @@
'commons':'{{subst:welcome}}\n~~~~\n',
'en' :'{{welcome}}\n~~~~\n',
'it' :'{{benvenuto}}\n~~~~\n',
+ 'hu' :u'{{subst:Üdvözlet|~~~~}}\n',
}
# General summary
@@ -116,6 +121,7 @@
'commons':'Bot: no source',
'en' :'Bot: no source',
'it' :'Bot: Unverified!',
+ 'hu' :'Robot: nincs forrás',
}
# if the file has an unknown extension it will be tagged with this template.
@@ -124,6 +130,7 @@
'commons':"{{db-meta|The file has .%s as extension.}}",
'en' :"{{db-meta|The file has .%s as extension.}}",
'it' :'{{cancella subito|motivo=Il file ha come estensione ".%s"}}',
+ 'hu' :u'{{azonnali|A fájlnak .%s a kiterjesztése}}',
}
# The header of the Unknown extension's message.
@@ -131,6 +138,7 @@
'commons':"\n== Unknown extension! ==\n",
'en' :"\n== Unknown extension! ==\n",
'it' :'\n== File non specificato ==\n',
+ 'hu' :u'\n== Ismeretlen kiterjesztésű fájl ==\n',
}
# Text that will be add if the bot find a unknown extension.
@@ -138,12 +146,14 @@
'commons':'The [[:Image:%s]] file has a wrong extension, please check. ~~~~',
'en' :'The [[:Image:%s]] file has a wrong extension, please check. ~~~~',
'it' :'{{subst:Utente:Filbot/Ext|%s}}',
+ 'hu' :u'A [[:Kép:%s]] fájlnak rossz a kiterjesztése, kérlek ellenőrízd. ~~~~',
}
# Summary of the delate immediately. (f.e: Adding {{db-meta|The file has .%s as extension.}})
del_comm = {
'commons':'Bot: Adding %s',
'en' :'Bot: Adding %s',
'it' :'Bot: Aggiungo %s',
+ 'hu' :u'Robot:"%s" hozzáadása',
}
# This is the most important header, because it will be used a lot. That's the header that the bot
@@ -152,12 +162,14 @@
'commons':"",# Nothing, the template has already the header inside.
'en' :"\n== Image without license ==\n",
'it' :"\n== Immagine senza licenza ==\n",
+ 'hu' :u"\n== Licenc nélküli kép ==\n",
}
# That's the text that the bot will add if it doesn't find the license.
nothing_notification = {
'commons':"{{subst:User:Filnik/untagged|Image:%s}}Image:%s}}\n\n''This message was '''added automatically by [[User:Filbot|Filbot]]''', if you need some help about it, ask [[User:Filnik|its master]] or go to the [[Commons:Help desk]]''. --~~~~",
'en' :"{{subst:image source|Image:%s}} --~~~~",
'it' :"{{subst:Utente:Filbot/Senza licenza|%s}} --~~~~",
+ 'hu' :u"{{subst:adjforrást|Kép:%s}} \n Ezt az üzenetet ~~~ automatikusan helyezte el a vitalapodon, kérdéseddel fordulj a gazdájához, vagy a [[WP:KF|Kocsmafalhoz]]. --~~~~",
}
# This is a list of what bots used this script in your project.
# NOTE: YOUR Botnick is automatically added. It's not required to add it twice.
@@ -172,12 +184,14 @@
'commons':None,
'en': None,
'it':'{{subst:Utente:Filbot/Senza licenza2|%s}} --~~~~',
+ 'hu':u'\nSzia! Úgy tűnik a [[:Kép:%s]] képpel is hasonló a probléma, mint az előbbivel. Kérlek olvasd el a [[WP:KÉPLIC|feltölthető képek]]ről szóló oldalunk, és segítségért fordulj a [[WP:KF-JO|Jogi kocsmafalhoz]]. Köszönöm --~~~~',
}
# You can add some settings to wikipedia. In this way, you can change them without touch the code.
# That's useful if you are running the bot on Toolserver.
page_with_settings = {
'commons':None,
'en':None,
+ 'hu':None,
'it':'Utente:Nikbot/Settings#Settings',
}
# The bot can report some images (like the images that have the same name of an image on commons)
@@ -186,6 +200,7 @@
'commons':'User:Filbot/Report',
'en' :'User:Filnik/Report',
'it' :'Utente:Nikbot/Report',
+ 'hu' :'User:Bdamokos/Report',
}
# Adding the date after the signature.
timeselected = u' ~~~~~'
@@ -194,12 +209,14 @@
'commons':"\n*[[:Image:%s]] " + timeselected,
'en':"\n*[[:Image:%s]] " + timeselected,
'it':"\n*[[:Immagine:%s]] " + timeselected,
+ 'hu':u"\n*[[:Kép:%s]] " + timeselected,
}
# The summary of the report
comm10 = {
'commons':'Bot: Updating the log',
'en':'Bot: Updating the log',
'it':'Bot: Aggiorno il log',
+ 'hu': 'Robot: A napló frissítése',
}
# If a template isn't a license but it's included on a lot of images, that can be skipped to
@@ -208,10 +225,11 @@
'commons':['{{information'],
'en':['{{information'],
'it':['{{edp', '{{informazioni file', '{{information'],
+ 'hu':[u'{{információ','{{enwiki', '{{azonnali'],
}
# Add your project (in alphabetical order) if you want that the bot start
-project_inserted = ['commons', 'en', 'it']
+project_inserted = ['commons', 'en','hu', 'it']
# Ok, that's all. What is below, is the rest of code, now the code is fixed and it will run correctly in your project.
#########################################################################################################################
@@ -464,37 +482,40 @@
def takesettings(self, settings):
pos = 0
- x = wikipedia.Page(self.site, settings)
- lista = list()
- try:
- testo = x.get()
- rxp = "<------- ------->\n\*[Nn]ame=['\"](.*?)['\"]\n\*([Ff]ind|[Ff]indonly)=(.*?)\n\*[Ii]magechanges=(.*?)\n\*[Ss]ummary=['\"](.*?)['\"]\n\*[Hh]ead=['\"](.*?)['\"]\n\*[Tt]ext ?= ?['\"](.*?)['\"]\n\*[Mm]ex ?= ?['\"]?(.*?)['\"]?$"
- r = re.compile(rxp, re.UNICODE|re.M)
- number = 1
- while 1:
- m = r.search(testo, pos)
- if m == None:
- if lista == list():
- wikipedia.output(u"You've set wrongly your settings, please take a look to the relative page. (run without them)")
- lista = None
- else:
- break
- else:
- pos = m.end()
- name = str(m.group(1))
- find_tipe = str(m.group(2))
- find = str(m.group(3))
- imagechanges = str(m.group(4))
- summary = str(m.group(5))
- head = str(m.group(6))
- text = str(m.group(7))
- mexcatched = str(m.group(8))
- tupla = [number, name, find_tipe, find, imagechanges, summary, head, text, mexcatched]
- lista += [tupla]
- number += 1
- except wikipedia.NoPage:
- lista = None
- return lista
+ if settings != None:
+ x = wikipedia.Page(self.site, settings)
+ lista = list()
+ try:
+ testo = x.get()
+ rxp = "<------- ------->\n\*[Nn]ame=['\"](.*?)['\"]\n\*([Ff]ind|[Ff]indonly)=(.*?)\n\*[Ii]magechanges=(.*?)\n\*[Ss]ummary=['\"](.*?)['\"]\n\*[Hh]ead=['\"](.*?)['\"]\n\*[Tt]ext ?= ?['\"](.*?)['\"]\n\*[Mm]ex ?= ?['\"]?(.*?)['\"]?$"
+ r = re.compile(rxp, re.UNICODE|re.M)
+ number = 1
+ while 1:
+ m = r.search(testo, pos)
+ if m == None:
+ if lista == list():
+ wikipedia.output(u"You've set wrongly your settings, please take a look to the relative page. (run without them)")
+ lista = None
+ else:
+ break
+ else:
+ pos = m.end()
+ name = str(m.group(1))
+ find_tipe = str(m.group(2))
+ find = str(m.group(3))
+ imagechanges = str(m.group(4))
+ summary = str(m.group(5))
+ head = str(m.group(6))
+ text = str(m.group(7))
+ mexcatched = str(m.group(8))
+ tupla = [number, name, find_tipe, find, imagechanges, summary, head, text, mexcatched]
+ lista += [tupla]
+ number += 1
+ except wikipedia.NoPage:
+ lista = None
+ return lista
+ else:
+ return []
def load(self, raw):
list_loaded = list()
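The takesettings change in r4605 guards against projects that have no settings page (the new page_with_settings['hu'] entry is None), returning an empty list instead of trying to fetch a None page. A minimal standalone sketch of that guard, using a simplified two-field settings format (the real regex parses many more fields):

```python
import re

# Simplified settings format, modeled loosely on the block that
# checkimages.py parses from the on-wiki settings page.
SETTINGS_RX = re.compile(
    r"\*[Nn]ame=['\"](.*?)['\"]\n\*[Ss]ummary=['\"](.*?)['\"]", re.M)

def take_settings(raw_text):
    """Parse numbered settings entries from wikitext.

    Returns [] when raw_text is None, mirroring the r4605 guard for
    projects whose page_with_settings entry is None.
    """
    if raw_text is None:
        return []
    entries = []
    for number, match in enumerate(SETTINGS_RX.finditer(raw_text), start=1):
        entries.append([number, match.group(1), match.group(2)])
    return entries
```

With this guard, enabling the bot on a new project only requires an explicit None entry in page_with_settings rather than creating a settings page.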
Revision: 4604
Author: siebrand
Date: 2007-11-27 16:20:41 +0000 (Tue, 27 Nov 2007)
Log Message:
-----------
Rename lower case
Added Paths:
-----------
trunk/pywikipedia/add_text.py
Removed Paths:
-------------
trunk/pywikipedia/AddText.py
Deleted: trunk/pywikipedia/AddText.py
===================================================================
--- trunk/pywikipedia/AddText.py 2007-11-27 16:11:27 UTC (rev 4603)
+++ trunk/pywikipedia/AddText.py 2007-11-27 16:20:41 UTC (rev 4604)
@@ -1,284 +0,0 @@
-#!/usr/bin/python
-# -*- coding: utf-8 -*-
-"""
-This is a Bot written by Filnik to add a text in a given category.
-
---- GenFactory Generator is used ---
--start Define from which page should the Bot start
--ref Use the ref as generator
--cat Use a category as generator
--filelinks Use all the links to an image as generator
--unusedfiles
--unwatched
--withoutinterwiki
--interwiki
--file
--uncatfiles
--uncatcat
--uncat
--subcat
--transcludes Use all the page that transclude a certain page as generator
--weblink Use the pages with a certain web link as generator
--links Use the links from a certain page as generator
--regex Only work on pages whose titles match the given regex
-
---- Other parameters ---
--page Use a page as generator
--text Define which text add
--summary Define the summary to use
--except Use a regex to understand if the template is already in the page
--excepturl Use the html page as text where you want to see if there's the text, not the wiki-page.
--newimages Add text in the new images
--untagged Add text in the images that doesn't have any license template
--always If used, the bot won't asked if it should add the text specified
-"""
-
-#
-# (C) Filnik, 2007
-#
-# Distributed under the terms of the MIT license.
-#
-__version__ = '$Id: AddText.py,v 1.0 2007/11/27 17:08:30 filnik Exp$'
-#
-
-import re, pagegenerators, urllib2, urllib
-import wikipedia, catlib
-
-class NoEnoughData(wikipedia.Error):
- """ Error class for when the user doesn't specified all the data needed """
-
-class NothingFound(wikipedia.Error):
- """ An exception indicating that a regex has return [] instead of results."""
-
-def pageText(url):
- try:
- request = urllib2.Request(url)
- user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7'
- request.add_header("User-Agent", user_agent)
- response = urllib2.urlopen(request)
- text = response.read()
- response.close()
- # When you load to many users, urllib2 can give this error.
- except urllib2.HTTPError:
- wikipedia.output(u"Server error. Pausing for 10 seconds... " + time.strftime("%d %b %Y %H:%M:%S (UTC)", time.gmtime()) )
- time.sleep(10)
- request = urllib2.Request(url)
- user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7'
- request.add_header("User-Agent", user_agent)
- response = urllib2.urlopen(request)
- text = response.read()
- response.close()
- return text
-
-def untaggedGenerator(untaggedProject, limit = 500):
- lang = untaggedProject.split('.', 1)[0]
- project = '.' + untaggedProject.split('.', 1)[1]
- if lang == 'commons':
- link = 'http://tools.wikimedia.de/~daniel/WikiSense/UntaggedImages.php?wikifam=comm…'
- else:
- link = 'http://tools.wikimedia.de/~daniel/WikiSense/UntaggedImages.php?wikilang=' + lang + '&wikifam=' + project + '&order=img_timestamp&max=' + str(limit) + '&ofs=0&max=' + str(limit)
- text = pageText(link)
- #print text
- regexp = r"""<td valign='top' title='Name'><a href='http://.*?\..*?\.org/w/index\.php\?title=(.*?)'>.*?</a></td>"""
- results = re.findall(regexp, text)
- if results == []:
- print link
- raise NothingFound('Nothing found! Try to use the tool by yourself to be sure that it works!')
- else:
- for result in results:
- yield wikipedia.Page(self.site, result)
-
-def newImages(limit):
- # Search regular expression to find links like this (and the class attribute is optional too)
- # class="new" title="Immagine:Soldatino2.jpg">Immagine:Soldatino2.jpg</a>" <span class="comment">
- url = "/w/index.php?title=Special:Log&type=upload&user=&page=&pattern=&limit=%d&offset=0" % int(limit)
- site = wikipedia.getSite()
- textrun = site.getUrl(url)
- image_namespace = site.image_namespace() + ":"
- regexp = r'(class=\"new\" |)title=\"' + image_namespace + '(.*?)\.(\w\w\w|jpeg)\">.*?</a>\".*?<span class=\"comment\">'
- pos = 0
- done = list()
- ext_list = list()
- r = re.compile(regexp, re.UNICODE)
- while 1:
- m = r.search(textrun, pos)
- if m == None:
- wikipedia.output(u"\t\t>> All images checked. <<")
- break
- pos = m.end()
- new = m.group(1)
- im = m.group(2)
- ext = m.group(3)
- # This prevent pages with strange characters. They will be loaded without problem.
- image = im + "." + ext
- if new != '':
- wikipedia.output(u"Skipping %s because it has been deleted." % image)
- done.append(image)
- if image not in done:
- done.append(image)
- yield wikipedia.Page(site, 'Image:%s' % image)
-
-def main():
- starsList = ['link[ _]fa', 'link[ _]adq', 'enllaç[ _]ad',
- 'link[ _]ua', 'legătură[ _]af', 'destacado',
- 'ua', 'liên k[ _]t[ _]chọn[ _]lọc']
- summary = None
- addText = None
- regexSkip = None
- always = False
- exceptUrl = False
- genFactory = pagegenerators.GeneratorFactory()
- errorCount = 0
-
- for arg in wikipedia.handleArgs():
- if arg.startswith('-text'):
- if len(arg) == 5:
- addText = wikipedia.input(u'What text do you want to add?')
- else:
- addText = arg[6:]
- elif arg.startswith('-summary'):
- if len(arg) == 8:
- summary = wikipedia.input(u'What summary do you want to use?')
- else:
- summary = arg[9:]
- elif arg.startswith('-page'):
- if len(arg) == 5:
- generator = list(wikipedia.input(u'What page do you want to use?'))
- else:
- generator = listr(arg[6:])
- elif arg.startswith('-excepturl'):
- exceptUrl = True
- if len(arg) == 10:
- regexSkip = wikipedia.input(u'What text should I skip?')
- else:
- regexSkip = arg[11:]
- elif arg.startswith('-except'):
- if len(arg) == 7:
- regexSkip = wikipedia.input(u'What text should I skip?')
- else:
- regexSkip = arg[8:]
- elif arg.startswith('-untagged'):
- if len(arg) == 9:
- untaggedProject = wikipedia.input(u'What project do you want to use?')
- else:
- untaggedProject = arg[10:]
- generator = untaggedGenerator(untaggedProject)
- elif arg.startswith('-newimages'):
- if len(arg) == 10:
- limit = wikipedia.input(u'How many images do you want to check?')
- else:
- limit = arg[11:]
- generator = newImages(limit)
- elif arg == '-always':
- always = True
- else:
- generator = genFactory.handleArg(arg)
-
- site = wikipedia.getSite()
- pathWiki = site.family.nicepath(site.lang)
- if not generator:
- raise NoEnoughData('You have to specify the generator you want to use for the script!')
- if not addText:
- raise NoEnoughData('You have to specify what text you want to add!')
- if not summary:
- summary = 'Bot: Adding %s' % addText
- for page in generator:
- wikipedia.output(u'Loading %s...' % page.title())
- try:
- text = page.get()
- except wikipedia.NoPage:
- wikipedia.output(u"%s doesn't exist, skip!" % page.title())
- continue
- except wikipedia.IsRedirectPage:
- wikipedia.output(u"%s is a redirect, skip!" % page.title())
- continue
- if regexSkip and exceptUrl:
- url = '%s%s' % (pathWiki, page.urlname())
- result = re.findall(regexSkip, site.getUrl(url))
- elif regexSkip:
- result = re.findall(regexSkip, text)
- else:
- result = []
- if result != []:
- wikipedia.output(u'Exception! regex (or word) use with -except, is in the page. Skip!')
- continue
- newtext = text
- categoryNamespace = site.namespace(14)
- regexpCat = re.compile(r'\[\[((?:category|%s):.*?)\]\]' % categoryNamespace.lower(), re.I)
- categorieInside = regexpCat.findall(text)
- newtext = wikipedia.removeCategoryLinks(newtext, site)
- interwikiInside = page.interwiki()
- interwikiList = list()
- for paginetta in interwikiInside:
- nome = str(paginetta).split('[[')[1].split(']]')[0]
- interwikiList.append(nome)
- lang = nome.split(':')[0]
- newtext = wikipedia.removeLanguageLinks(newtext, site)
- interwikiList.sort()
- newtext += "\n%s" % addText
- for paginetta in categorieInside:
- try:
- newtext += '\n[[%s]]' % paginetta.decode('utf-8')
- except UnicodeEncodeError:
- try:
- newtext += '\n[[%s]]' % paginetta.decode('Latin-1')
- except UnicodeEncodeError:
- newtext += '\n[[%s]]' % paginetta
- newtext += '\n'
- starsListInPage = list()
- for star in starsList:
- regex = re.compile('(\{\{(?:template:|)%s\|.*?\}\}\n)' % star, re.I)
- risultato = regex.findall(newtext)
- if risultato != []:
- newtext = regex.sub('', newtext)
- for element in risultato:
- newtext += '\n%s' % element
- for paginetta in interwikiList:
- try:
- newtext += '\n[[%s]]' % paginetta.decode('utf-8')
- except UnicodeEncodeError:
- try:
- newtext += '\n[[%s]]' % paginetta.decode('Latin-1')
- except UnicodeEncodeError:
- newtext += '\n[[%s]]' % paginetta
- wikipedia.output(u"\n\n>>> \03{lightpurple}%s\03{default} <<<" % page.title())
- wikipedia.showDiff(text, newtext)
- while 1:
- if not always:
- choice = wikipedia.inputChoice(u'Do you want to accept these changes?', ['Yes', 'No', 'All'], ['y', 'N', 'a'], 'N')
- if choice.lower() in ['a', 'all']:
- always = True
- if choice.lower() in ['n', 'no']:
- break
- if choice.lower() in ['y', 'yes'] or always:
- try:
- page.put(newtext, summary)
- except wikipedia.EditConflict:
- wikipedia.output(u'Edit conflict! skip!')
- break
- except wikipedia.ServerError:
- errorCount += 1
- if errorCount < 5:
- wikipedia.output(u'Server Error! Wait..')
- time.sleep(3)
- continue
- else:
- raise wikipedia.ServerError(u'Fifth Server Error!')
- except wikipedia.SpamfilterError, e:
- wikipedia.output(u'Cannot change %s because of blacklist entry %s' % (page.title(), e.url))
- break
- except wikipedia.PageNotSaved, error:
- wikipedia.output(u'Error putting page: %s' % (error.args,))
- break
- except wikipedia.LockedPage:
- wikipedia.output(u'Skipping %s (locked page)' % (page.title(),))
- break
- else:
- # Break only if the errors are one after the other...
- errorCount = 0
- break
-if __name__ == "__main__":
- try:
- main()
- finally:
- wikipedia.stopme()
Copied: trunk/pywikipedia/add_text.py (from rev 4603, trunk/pywikipedia/AddText.py)
===================================================================
--- trunk/pywikipedia/add_text.py (rev 0)
+++ trunk/pywikipedia/add_text.py 2007-11-27 16:20:41 UTC (rev 4604)
@@ -0,0 +1,284 @@
+#!/usr/bin/python
+# -*- coding: utf-8 -*-
+"""
+This is a Bot written by Filnik to add a text in a given category.
+
+--- GenFactory Generator is used ---
+-start Define from which page should the Bot start
+-ref Use the ref as generator
+-cat Use a category as generator
+-filelinks Use all the links to an image as generator
+-unusedfiles
+-unwatched
+-withoutinterwiki
+-interwiki
+-file
+-uncatfiles
+-uncatcat
+-uncat
+-subcat
+-transcludes Use all the page that transclude a certain page as generator
+-weblink Use the pages with a certain web link as generator
+-links Use the links from a certain page as generator
+-regex Only work on pages whose titles match the given regex
+
+--- Other parameters ---
+-page Use a page as generator
+-text Define which text add
+-summary Define the summary to use
+-except Use a regex to understand if the template is already in the page
+-excepturl Use the html page as text where you want to see if there's the text, not the wiki-page.
+-newimages Add text in the new images
+-untagged Add text in the images that doesn't have any license template
+-always If used, the bot won't asked if it should add the text specified
+"""
+
+#
+# (C) Filnik, 2007
+#
+# Distributed under the terms of the MIT license.
+#
+__version__ = '$Id: AddText.py,v 1.0 2007/11/27 17:08:30 filnik Exp$'
+#
+
+import re, pagegenerators, urllib2, urllib
+import wikipedia, catlib
+
+class NoEnoughData(wikipedia.Error):
+ """ Error class for when the user doesn't specified all the data needed """
+
+class NothingFound(wikipedia.Error):
+ """ An exception indicating that a regex has return [] instead of results."""
+
+def pageText(url):
+ try:
+ request = urllib2.Request(url)
+ user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7'
+ request.add_header("User-Agent", user_agent)
+ response = urllib2.urlopen(request)
+ text = response.read()
+ response.close()
+ # When you load to many users, urllib2 can give this error.
+ except urllib2.HTTPError:
+ wikipedia.output(u"Server error. Pausing for 10 seconds... " + time.strftime("%d %b %Y %H:%M:%S (UTC)", time.gmtime()) )
+ time.sleep(10)
+ request = urllib2.Request(url)
+ user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7'
+ request.add_header("User-Agent", user_agent)
+ response = urllib2.urlopen(request)
+ text = response.read()
+ response.close()
+ return text
+
+def untaggedGenerator(untaggedProject, limit = 500):
+ lang = untaggedProject.split('.', 1)[0]
+ project = '.' + untaggedProject.split('.', 1)[1]
+ if lang == 'commons':
+ link = 'http://tools.wikimedia.de/~daniel/WikiSense/UntaggedImages.php?wikifam=comm…'
+ else:
+ link = 'http://tools.wikimedia.de/~daniel/WikiSense/UntaggedImages.php?wikilang=' + lang + '&wikifam=' + project + '&order=img_timestamp&max=' + str(limit) + '&ofs=0&max=' + str(limit)
+ text = pageText(link)
+ #print text
+ regexp = r"""<td valign='top' title='Name'><a href='http://.*?\..*?\.org/w/index\.php\?title=(.*?)'>.*?</a></td>"""
+ results = re.findall(regexp, text)
+ if results == []:
+ print link
+ raise NothingFound('Nothing found! Try to use the tool by yourself to be sure that it works!')
+ else:
+ for result in results:
+ yield wikipedia.Page(self.site, result)
+
+def newImages(limit):
+ # Search regular expression to find links like this (and the class attribute is optional too)
+ # class="new" title="Immagine:Soldatino2.jpg">Immagine:Soldatino2.jpg</a>" <span class="comment">
+ url = "/w/index.php?title=Special:Log&type=upload&user=&page=&pattern=&limit=%d&offset=0" % int(limit)
+ site = wikipedia.getSite()
+ textrun = site.getUrl(url)
+ image_namespace = site.image_namespace() + ":"
+ regexp = r'(class=\"new\" |)title=\"' + image_namespace + '(.*?)\.(\w\w\w|jpeg)\">.*?</a>\".*?<span class=\"comment\">'
+ pos = 0
+ done = list()
+ ext_list = list()
+ r = re.compile(regexp, re.UNICODE)
+ while 1:
+ m = r.search(textrun, pos)
+ if m == None:
+ wikipedia.output(u"\t\t>> All images checked. <<")
+ break
+ pos = m.end()
+ new = m.group(1)
+ im = m.group(2)
+ ext = m.group(3)
+ # This prevent pages with strange characters. They will be loaded without problem.
+ image = im + "." + ext
+ if new != '':
+ wikipedia.output(u"Skipping %s because it has been deleted." % image)
+ done.append(image)
+ if image not in done:
+ done.append(image)
+ yield wikipedia.Page(site, 'Image:%s' % image)
+
+def main():
+ starsList = ['link[ _]fa', 'link[ _]adq', 'enllaç[ _]ad',
+ 'link[ _]ua', 'legătură[ _]af', 'destacado',
+ 'ua', 'liên k[ _]t[ _]chọn[ _]lọc']
+ summary = None
+ addText = None
+ regexSkip = None
+ always = False
+ exceptUrl = False
+ genFactory = pagegenerators.GeneratorFactory()
+ errorCount = 0
+
+ for arg in wikipedia.handleArgs():
+ if arg.startswith('-text'):
+ if len(arg) == 5:
+ addText = wikipedia.input(u'What text do you want to add?')
+ else:
+ addText = arg[6:]
+ elif arg.startswith('-summary'):
+ if len(arg) == 8:
+ summary = wikipedia.input(u'What summary do you want to use?')
+ else:
+ summary = arg[9:]
+ elif arg.startswith('-page'):
+ if len(arg) == 5:
+ generator = list(wikipedia.input(u'What page do you want to use?'))
+ else:
+ generator = listr(arg[6:])
+ elif arg.startswith('-excepturl'):
+ exceptUrl = True
+ if len(arg) == 10:
+ regexSkip = wikipedia.input(u'What text should I skip?')
+ else:
+ regexSkip = arg[11:]
+ elif arg.startswith('-except'):
+ if len(arg) == 7:
+ regexSkip = wikipedia.input(u'What text should I skip?')
+ else:
+ regexSkip = arg[8:]
+ elif arg.startswith('-untagged'):
+ if len(arg) == 9:
+ untaggedProject = wikipedia.input(u'What project do you want to use?')
+ else:
+ untaggedProject = arg[10:]
+ generator = untaggedGenerator(untaggedProject)
+ elif arg.startswith('-newimages'):
+ if len(arg) == 10:
+ limit = wikipedia.input(u'How many images do you want to check?')
+ else:
+ limit = arg[11:]
+ generator = newImages(limit)
+ elif arg == '-always':
+ always = True
+ else:
+ generator = genFactory.handleArg(arg)
+
+ site = wikipedia.getSite()
+ pathWiki = site.family.nicepath(site.lang)
+ if not generator:
+ raise NoEnoughData('You have to specify the generator you want to use for the script!')
+ if not addText:
+ raise NoEnoughData('You have to specify what text you want to add!')
+ if not summary:
+ summary = 'Bot: Adding %s' % addText
+ for page in generator:
+ wikipedia.output(u'Loading %s...' % page.title())
+ try:
+ text = page.get()
+ except wikipedia.NoPage:
+ wikipedia.output(u"%s doesn't exist, skip!" % page.title())
+ continue
+ except wikipedia.IsRedirectPage:
+ wikipedia.output(u"%s is a redirect, skip!" % page.title())
+ continue
+ if regexSkip and exceptUrl:
+ url = '%s%s' % (pathWiki, page.urlname())
+ result = re.findall(regexSkip, site.getUrl(url))
+ elif regexSkip:
+ result = re.findall(regexSkip, text)
+ else:
+ result = []
+ if result != []:
+ wikipedia.output(u'Exception! regex (or word) use with -except, is in the page. Skip!')
+ continue
+ newtext = text
+ categoryNamespace = site.namespace(14)
+ regexpCat = re.compile(r'\[\[((?:category|%s):.*?)\]\]' % categoryNamespace.lower(), re.I)
+ categorieInside = regexpCat.findall(text)
+ newtext = wikipedia.removeCategoryLinks(newtext, site)
+ interwikiInside = page.interwiki()
+ interwikiList = list()
+ for paginetta in interwikiInside:
+ nome = str(paginetta).split('[[')[1].split(']]')[0]
+ interwikiList.append(nome)
+ lang = nome.split(':')[0]
+ newtext = wikipedia.removeLanguageLinks(newtext, site)
+ interwikiList.sort()
+ newtext += "\n%s" % addText
+ for paginetta in categorieInside:
+ try:
+ newtext += '\n[[%s]]' % paginetta.decode('utf-8')
+ except UnicodeEncodeError:
+ try:
+ newtext += '\n[[%s]]' % paginetta.decode('Latin-1')
+ except UnicodeEncodeError:
+ newtext += '\n[[%s]]' % paginetta
+ newtext += '\n'
+ starsListInPage = list()
+ for star in starsList:
+ regex = re.compile('(\{\{(?:template:|)%s\|.*?\}\}\n)' % star, re.I)
+ risultato = regex.findall(newtext)
+ if risultato != []:
+ newtext = regex.sub('', newtext)
+ for element in risultato:
+ newtext += '\n%s' % element
+ for paginetta in interwikiList:
+ try:
+ newtext += '\n[[%s]]' % paginetta.decode('utf-8')
+ except UnicodeEncodeError:
+ try:
+ newtext += '\n[[%s]]' % paginetta.decode('Latin-1')
+ except UnicodeEncodeError:
+ newtext += '\n[[%s]]' % paginetta
+ wikipedia.output(u"\n\n>>> \03{lightpurple}%s\03{default} <<<" % page.title())
+ wikipedia.showDiff(text, newtext)
+ while 1:
+ if not always:
+ choice = wikipedia.inputChoice(u'Do you want to accept these changes?', ['Yes', 'No', 'All'], ['y', 'N', 'a'], 'N')
+ if choice.lower() in ['a', 'all']:
+ always = True
+ if choice.lower() in ['n', 'no']:
+ break
+ if choice.lower() in ['y', 'yes'] or always:
+ try:
+ page.put(newtext, summary)
+ except wikipedia.EditConflict:
+ wikipedia.output(u'Edit conflict! skip!')
+ break
+ except wikipedia.ServerError:
+ errorCount += 1
+ if errorCount < 5:
+ wikipedia.output(u'Server Error! Wait..')
+ time.sleep(3)
+ continue
+ else:
+ raise wikipedia.ServerError(u'Fifth Server Error!')
+ except wikipedia.SpamfilterError, e:
+ wikipedia.output(u'Cannot change %s because of blacklist entry %s' % (page.title(), e.url))
+ break
+ except wikipedia.PageNotSaved, error:
+ wikipedia.output(u'Error putting page: %s' % (error.args,))
+ break
+ except wikipedia.LockedPage:
+ wikipedia.output(u'Skipping %s (locked page)' % (page.title(),))
+ break
+ else:
+ # Break only if the errors are one after the other...
+ errorCount = 0
+ break
+if __name__ == "__main__":
+ try:
+ main()
+ finally:
+ wikipedia.stopme()
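The renamed add_text.py strips category and interwiki links from the page, appends the new text, then restores the links at the bottom (interwikis sorted). A self-contained sketch of that reordering, with simplified regexes standing in for wikipedia.removeCategoryLinks / removeLanguageLinks (the namespace names here are illustrative, not exhaustive):

```python
import re

# Simplified link matchers; the real script asks the site object for its
# localized category namespace and uses pywikipedia's link parsers.
CAT_RX = re.compile(r'\[\[(?:Category|Categoria):.*?\]\]\n?', re.I)
IW_RX = re.compile(r'\[\[[a-z]{2,3}:[^\]]+\]\]\n?')

def add_text_bottom(text, addition):
    """Insert `addition` above the category and interwiki links.

    Collect the links, strip them from the body, append the new text,
    then re-append categories and (sorted) interwikis at the bottom.
    """
    cats = CAT_RX.findall(text)
    iws = IW_RX.findall(text)
    body = IW_RX.sub('', CAT_RX.sub('', text)).rstrip()
    body += '\n' + addition
    for link in cats:
        body += '\n' + link.strip()
    for link in sorted(iws):
        body += '\n' + link.strip()
    return body
```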