Revision: 6256
Author: russblau
Date: 2009-01-14 15:17:47 +0000 (Wed, 14 Jan 2009)
Log Message:
-----------
Branch for port to rewrite
Added Paths:
-----------
branches/rewrite/pywikibot/scripts/solve_disambiguation.py
Copied: branches/rewrite/pywikibot/scripts/solve_disambiguation.py (from rev 6255, trunk/pywikipedia/solve_disambiguation.py)
===================================================================
--- branches/rewrite/pywikibot/scripts/solve_disambiguation.py (rev 0)
+++ branches/rewrite/pywikibot/scripts/solve_disambiguation.py 2009-01-14 15:17:47 UTC (rev 6256)
@@ -0,0 +1,1012 @@
+#!/usr/bin/python
+# -*- coding: utf-8 -*-
+"""
+Script to help a human solve disambiguations by presenting a set of options.
+
+Specify the disambiguation page on the command line, or enter it at the
+prompt after starting the program. (If the disambiguation page title starts
+with a '-', you cannot name it on the command line, but you can enter it at
+the prompt.) The program will pick up the page, look for all
+alternative links, and show them with a number adjacent to them. It will
+then automatically loop over all pages referring to the disambiguation page,
+and show 60 characters of context on each side of the reference to help you
+decide between the alternatives. It will ask you to type the number of the
+appropriate replacement, and perform the change.
+
+It is possible to choose to replace only the link (just type the number) or
+replace both link and link-text (type 'r' followed by the number).
+
+Multiple references on one page will be scanned in order, but typing 'n'
+(next) on any one of them will leave the complete page unchanged. To leave
+only a single reference unchanged, use the 's' (skip) option.
+
+Command line options:
+
+ -pos:XXXX    adds XXXX as an alternative disambiguation
+
+ -just        only use the alternatives given on the command line, do not
+              read the page for other possibilities
+
+ -primary     "primary topic" disambiguation (Begriffsklärung nach Modell 2).
+              That is, titles where one topic is much more important than
+              the others: the disambiguation page is stored under another
+              title and the important topic gets the plain name.
+
+ -primary:XY  like the above, but use XY as the only alternative, instead
+              of searching for alternatives in [[Keyword (disambiguation)]].
+              Note: this is the same as -primary -just -pos:XY
+
+ -file:XYZ    reads a list of pages from a text file. XYZ is the name of
+              the file from which the list is taken. If XYZ is not given,
+              the user is asked for a filename. Page titles should be
+              inside [[double brackets]]. The -pos parameter won't work if
+              -file is used.
+
+ -always:XY   instead of asking the user what to do, always perform the
+              same action. For example, XY can be "r0", "u" or "2". Be
+              careful with this option, and check the changes made by the
+              bot. Note that some choices for XY don't make sense and will
+              result in a loop, e.g. "l" or "m".
+
+ -main        only check pages in the main namespace, not in the talk,
+              wikipedia, user, etc. namespaces.
+
+ -start:XY    goes through all disambiguation pages in the category on
+              your wiki that is defined (to the bot) as the category
+              containing disambiguation pages, starting at XY. If only
+              '-start' or '-start:' is given, it starts at the beginning.
+
+ -min:XX      (XX being a number) only work on disambiguation pages for
+              which at least XX pages are to be worked on.
+
+To complete a move of a page, one can use:
+
+ python solve_disambiguation.py -just -pos:New_Name Old_Name
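+
+A run over many disambiguation pages can combine the options described
+above; for example (illustrative), to work through the wiki's whole
+disambiguation category, restricted to the main namespace and to pages
+with at least five referring pages:
+
+    python solve_disambiguation.py -start -main -min:5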
+"""
+#
+# (C) Rob W.W. Hooft, 2003
+# (C) Daniel Herding, 2004
+# (C) Andre Engels, 2003-2004
+# (C) WikiWichtel, 2004
+#
+# Distributed under the terms of the MIT license.
+#
+__version__='$Id$'
+#
+# Standard library imports
+import re, sys, codecs
+
+# Application specific imports
+import wikipedia, pagegenerators, editarticle
+
+# Summary message when working on disambiguation pages
+msg = {
+ 'ar': u'توضيح بمساعدة روبوت: %s - غير الوصلة أو الوصلات إلى %s',
+ 'cs': u'Odstranění linku na rozcestník [[%s]] s použitím robota - Změněn(y) odkaz(y) na %s',
+ 'en': u'Robot-assisted disambiguation: %s - Changed link(s) to %s',
+ 'es': u'Bot:Desambiguación asistida: %s - Cambiando enlace(s) para %s',
+ 'da': u'Retter flertydigt link til: %s - Ændrede link(s) til %s',
+ 'de': u'Bot-unterstützte Begriffsklärung: %s - Link(s) ersetzt durch %s',
+ 'fi': u'Täsmennystä botin avulla: %s korvattiin link(e)illä %s',
+ 'fr': u'Homonymie résolue à l\'aide du robot: %s - Modifications du (des) lien(s) pour %s',
+ 'he': u'תיקון קישור לדף פירושונים באמצעות בוט: %s',
+ 'hu': u'Bottal végzett egyértelműsítés: %s –> %s',
+ 'ia': u'Disambiguation assistite per robot: %s - Changed link(s) to %s',
+ 'it': u'Sistemazione automatica della disambigua: %s - Inversione di redirect %s',
+ 'lt': u'Nuorodų į nukrepiamąjį straipsnį keitimas: %s - Pakeistos nuorodos į %s',
+ 'kk': u'Айрықты мағыналарды бот көмегімен шешу: %s - Changed link(s) to %s',
+ 'ko': u'로봇의 도움을 받아 동음이의 처리 : [[%s]] - %s 문서로 링크 걸음',
+ 'nl': u'Robot-geholpen doorverwijzing: [[%s]] - Link(s) veranderd naar %s',
+ 'no': u'bot: Retter lenke til peker: %s - Endret lenke(r) til %s',
+ 'pl': u'Wspomagane przez robota ujednoznacznienie: %s - Zmieniono link(i) %s',
+ 'pt': u'Desambiguação assistida por bot: %s link(s) mudado(s) para %s',
+ 'ru': u'Разрешение значений с помощью бота: %s - Changed link(s) to %s',
+ 'sr': u'Решавање вишезначних одредница помоћу бота: %s - Changed link(s) to %s',
+ 'sv': u'Länkar direkt till rätt artikel för: %s - Bytte länk(ar) till %s',
+ }
+
+# Summary message when working on disambiguation pages and the link is removed
+msg_unlink = {
+ 'ar': u'توضيح بمساعدة روبوت: %s - أزال الوصلة أو الوصلات.',
+ 'cs': u'Odstranění linku na rozcestník [[%s]] s použitím robota - Odstraněn(y) odkaz(y)',
+ 'en': u'Robot-assisted disambiguation: %s - Removed link(s).',
+ 'da': u'Retter flertydigt link til: %s - Fjernede link(s)',
+ 'de': u'Bot-unterstützte Begriffsklärung: %s - Link(s) entfernt',
+ 'fi': u'Täsmennystä botin avulla: %s - poistettiin linkkejä.',
+ 'fr': u'Homonymie résolue à l\'aide du robot: %s - Retrait du (des) lien(s)',
+ 'he': u'הסרת קישור לדף פירושונים באמצעות בוט: %s',
+ 'hu': u'Bottal végzett egyértelműsítés: %s – hivatkozások eltávolítása',
+ 'ia': u'Disambiguation assistite per robot: %s - Removed link(s).',
+ 'it': u'Sistemazione automatica della disambigua: %s - Collegamenti rimossi',
+ 'lt': u'Nuorodų į nukrepiamąjį straipsnį keitimas: %s - Pašalintos nuorodos',
+ 'kk': u'Айрықты мағыналарды бот көмегімен шешу: %s - Removed link(s).',
+ 'ko': u'로봇의 도움을 받아 동음이의 처리: [[%s]] - 링크 제거',
+ 'nl': u'Robot-geholpen doorverwijzing: [[%s]] - Link(s) weggehaald.',
+ 'no': u'bot: Retter lenke til peker: %s - Fjernet lenke(r)',
+ 'pl': u'Wspomagane przez robota ujednoznacznienie: %s - Usunięto link(i)',
+ 'pt': u'Desambiguação assistida por bot: %s link(s) removido(s)',
+ 'ru': u'Разрешение значений с помощью бота: %s - Removed link(s)',
+ 'sr': u'Решавање вишезначних одредница помоћу бота: %s - Removed link(s)',
+ 'sv': u'Länkar direkt till rätt artikel för: %s - Tog bort länk(ar)',
+ }
+
+# Summary message when working on redirects
+msg_redir = {
+ 'ar': u'توضيح بمساعدة روبوت: %s - غير الوصلة أو الوصلات إلى %s',
+ 'cs': u'Robot opravil přesměrování na %s - Změněn(y) odkaz(y) na %s',
+ 'en': u'Robot-assisted disambiguation: %s - Changed link(s) to %s',
+ 'da': u'Retter flertydigt link til: %s - Ændrede link(s) til %s',
+ 'de': u'Bot-unterstützte Redirectauflösung: %s - Link(s) ersetzt durch %s',
+ 'fi': u'Täsmennystä botin avulla: %s korvattiin link(e)illä %s',
+ 'fr': u'Correction de lien vers redirect: %s - Modifications du (des) lien(s) pour %s',
+ 'he': u'תיקון קישור לדף פירושונים באמצעות בוט: %s שונה ל%s',
+ 'hu': u'Bottal végzett egyértelműsítés: %s –> %s',
+ 'ia': u'Resolution de redirectiones assistite per robot: %s - Changed link(s) to %s',
+ 'it': u'Sistemazione automatica del redirect: %s - Inversione di redirect %s',
+ 'lt': u'Nuorodų į peradresavimo straipsnį keitimas: %s - Pakeistos nuorodos į %s',
+ 'kk': u'Айрықты мағыналарды бот көмегімен шешу: %s - Changed link(s) to %s',
+ 'ko': u'로봇의 도움을 받아 동음이의 처리: [[%s]] - %s 문서로 링크 걸음',
+ 'nl': u'Robot-geholpen redirect-oplossing: [[%s]] - Link(s) veranderd naar %s',
+ 'no': u'bot: Endrer omdirigeringslenke: %s - Endret lenke(r) til %s',
+ 'pl': u'Wspomagane przez robota ujednoznacznienie: %s - Zmieniono link(i) %s',
+ 'pt': u'Desambiguação assistida por bot: %s link(s) mudados para %s',
+ 'ru': u'Разрешение значений с помощью бота: %s - Changed link(s) to %s',
+ 'sr': u'Решавање вишезначних одредница помоћу бота: %s - Changed link(s) to %s',
+ 'sv': u'Länkar direkt till rätt artikel för: %s - Bytte länk(ar) till %s',
+ }
+
+# Summary message when working on redirects and the link is removed
+msg_redir_unlink = {
+ 'ar': u'توضيح بمساعدة روبوت: %s - أزال الوصلة أو الوصلات',
+ 'cs': u'Robot opravil přesměrování na %s - Odstraněn(y) odkaz(y)',
+ 'en': u'Robot-assisted disambiguation: %s - Removed link(s)',
+ 'da': u'Retter flertydigt link til: %s - Fjernede link(s)',
+ 'de': u'Bot-unterstützte Redirectauflösung: %s - Link(s) entfernt',
+ 'fr': u'Correction de lien vers redirect: %s - Retrait du (des) lien(s)',
+ 'fi': u'Täsmennystä botin avulla: %s - poistettiin linkkejä',
+ 'he': u'הסרת קישור לדף פירושונים באמצעות בוט: %s',
+ 'hu': u'Bottal támogatott egyértelműsítés: %s – hivatkozások eltávolítása',
+ 'ia': u'Resolution de redirectiones assistite per robot: %s - Removed link(s).',
+ 'it': u'Sistemazione automatica del redirect: %s - Collegamenti rimossi',
+ 'lt': u'Nuorodų į peradresavimo straipsnį keitimas: %s - Pašalintos nuorodos',
+ 'kk': u'Айрықты мағыналарды бот көмегімен шешу: %s - Removed link(s).',
+ 'ko': u'로봇의 도움을 받아 동음이의 처리: [[%s]] - 링크 제거',
+ 'nl': u'Robot-geholpen redirect-oplossing: [[%s]] - Link(s) weggehaald',
+ 'no': u'bot: Endrer omdirigeringslenke: %s - Fjernet lenke(r)',
+ 'pl': u'Wspomagane przez robota ujednoznacznienie: %s - Usunięto link(i)',
+ 'pt': u'Desambiguação assistida por bot: %s link(s) removidos',
+ 'ru': u'Разрешение значений с помощью бота: %s - Removed link(s)',
+ 'sr': u'Решавање вишезначних одредница помоћу бота: %s - Removed link(s)',
+ 'sv': u'Länkar direkt till rätt artikel för: %s - Tog bort länk(ar)',
+ }
+
+# Text inserted into the summary message when the new target is unknown
+unknown_msg = {
+ 'ar' : u'(غير معروف)',
+ 'en' : u'(unknown)',
+ 'fi' : u'(tuntematon)',
+ 'hu' : u'(ismeretlen)',
+ 'pt' : u'(desconhecido)',
+ }
+
+# disambiguation page name format for "primary topic" disambiguations
+# (Begriffsklärungen nach Modell 2)
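+# e.g. (illustrative): primary_topic_format['en'] % u'Python' gives
+# u'Python_(disambiguation)'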
+primary_topic_format = {
+ 'ar': u'%s_(توضيح)',
+ 'cs': u'%s_(rozcestník)',
+ 'de': u'%s_(Begriffsklärung)',
+ 'en': u'%s_(disambiguation)',
+ 'fi': u'%s_(täsmennyssivu)',
+ 'hu': u'%s_(egyértelműsítő lap)',
+ 'ia': u'%s_(disambiguation)',
+ 'it': u'%s_(disambigua)',
+ 'lt': u'%s_(reikšmės)',
+ 'kk': u'%s_(айрық)',
+ 'ko': u'%s_(동음이의)',
+ 'nl': u'%s_(doorverwijspagina)',
+ 'no': u'%s_(peker)',
+ 'pl': u'%s_(ujednoznacznienie)',
+ 'pt': u'%s_(desambiguação)',
+ 'he': u'%s_(פירושונים)',
+ 'ru': u'%s_(значения)',
+ 'sr': u'%s_(вишезначна одредница)',
+ 'sv': u'%s_(olika betydelser)',
+ }
+
+# List of pages that will be ignored even though they contain a link to a
+# disambiguation page. An example is a page listing disambiguation
+# articles. Special characters should be encoded with unicode (\x##) and
+# a space used instead of _
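+#
+# Each entry is a regular expression that is matched against page titles
+# with re.match(); e.g. (illustrative) the entry u'Benutzer Diskussion:.+'
+# makes the bot skip every user talk page on the German Wikipedia.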
+
+ignore_title = {
+ 'wikipedia': {
+ 'ar': [
+ u'تصنيف:صفحات توضيح',
+ ],
+ 'cs': [
+ u'Wikipedie:Chybějící interwiki/.+',
+ u'Wikipedie:Rozcestníky',
+ u'Wikipedie diskuse:Rozcestníky',
+ u'Wikipedie:Seznam nejvíce odkazovaných rozcestníků',
+ u'Wikipedie:Seznam rozcestníků/první typ',
+ u'Wikipedie:Seznam rozcestníků/druhý typ',
+ u'Wikipedista:Zirland/okres',
+ ],
+ 'da': [
+ u'Wikipedia:Links til sider med flertydige titler'
+ ],
+ 'de': [
+ u'Benutzer:Katharina/Begriffsklärungen',
+ u'Benutzer:Kirschblut/.+buchstabenkürzel',
+ u'Benutzer:Noisper/Dingliste/[A-Z]',
+ u'Benutzer:SirJective/.+',
+ u'Benutzer:SrbBot/Index/.+',
+ u'Benutzer Diskussion:.+',
+ u'GISLexikon \([A-Z]\)',
+ u'Lehnwort',
+ u'Liste griechischer Wortstämme in deutschen Fremdwörtern',
+ u'Liste von Gräzismen',
+ u'Portal:Abkürzungen/.+',
+ u'Wikipedia:Archiv:.+',
+ u'Wikipedia:Artikelwünsche/Ding-Liste/[A-Z]',
+ u'Wikipedia:Begriffsklärung.*',
+ u'Wikipedia:Dreibuchstabenkürzel von [A-Z][A-Z][A-Z] bis [A-Z][A-Z][A-Z]',
+ u'Wikipedia:Interwiki-Konflikte',
+ u'Wikipedia:Kurze Artikel',
+ u'Wikipedia:Liste aller 2-Buchstaben-Kombinationen',
+ u'Wikipedia:Liste mathematischer Themen/BKS',
+ u'Wikipedia:Liste mathematischer Themen/Redirects',
+ u'Wikipedia:Löschkandidaten/.+',
+ u'Wikipedia:Qualitätsoffensive/UNO', #requested by Benutzer:Addicted
+ u'Wikipedia:WikiProjekt Altertumswissenschaft/.+',
+ u'Wikipedia:WikiProjekt Verwaiste Seiten/Begriffsklärungen',
+ ],
+ 'en': [
+ u'Wikipedia:Links to disambiguating pages',
+ u'Wikipedia:Disambiguation pages with links',
+ u'Wikipedia:Multiple-place names \([A-Z]\)',
+ u'Wikipedia:Non-unique personal name',
+ u"User:Jerzy/Disambiguation Pages i've Editted",
+ u'User:Gareth Owen/inprogress',
+ u'TLAs from [A-Z][A-Z][A-Z] to [A-Z][A-Z][A-Z]',
+ u'List of all two-letter combinations',
+ u'User:Daniel Quinlan/redirects.+',
+ u'User:Oliver Pereira/stuff',
+ u'Wikipedia:French Wikipedia language links',
+ u'Wikipedia:Polish language links',
+ u'Wikipedia:Undisambiguated abbreviations/.+',
+ u'List of acronyms and initialisms',
+ u'Wikipedia:Usemod article histories',
+ u'User:Pizza Puzzle/stuff',
+ u'List of generic names of political parties',
+ u'Talk:List of initialisms/marked',
+ u'Talk:List of initialisms/sorted',
+ u'Talk:Programming language',
+ u'Talk:SAMPA/To do',
+ u"Wikipedia:Outline of Roget's Thesaurus",
+ u'User:Wik/Articles',
+ u'User:Egil/Sandbox',
+ u'Wikipedia talk:Make only links relevant to the context',
+ u'Wikipedia:Common words, searching for which is not possible',
+ ],
+ 'fi': [
+ u'Wikipedia:Luettelo täsmennyssivuista',
+ u'Wikipedia:Luettelo (täsmennyssivuista)',
+ u'Wikipedia:Täsmennyssivu',
+ ],
+ 'fr': [
+ u'Wikipédia:Liens aux pages d\'homonymie',
+ u'Wikipédia:Homonymie',
+ u'Wikipédia:Homonymie/Homonymes dynastiques',
+ u'Wikipédia:Prise de décision, noms des membres de dynasties/liste des dynastiens',
+ u'Liste de toutes les combinaisons de deux lettres',
+ u'Wikipédia:Log d\'upload/.*',
+ u'Sigles de trois lettres de [A-Z]AA à [A-Z]ZZ',
+ u'Wikipédia:Pages sans interwiki,.'
+ ],
+ 'fy': [
+ u'Wikipedy:Fangnet',
+ ],
+ 'ia': [
+ u'Categoria:Disambiguation',
+ u'Wikipedia:.+',
+ u'Usator:.+',
+ u'Discussion Usator:.+',
+ ],
+ 'it': [
+ u'Aiuto:Disambigua/Disorfanamento',
+ u'Discussioni utente:.+',
+ u'Utente:Civvì/disorfanamento',
+ ],
+ 'kk': [
+ u'Санат:Айрықты бет',
+ ],
+ 'ko': [
+ u'위키백과:(동음이의) 문서의 목록',
+ u'위키백과:동음이의어 문서의 목록',
+ ],
+ 'lt': [
+ u'Wikipedia:Rodomi nukreipiamieji straipsniai',
+ ],
+ 'nl': [
+ u"Gebruiker:.*",
+ u"Overleg gebruiker:.+[aA]rchief.*",
+ u"Overleg gebruiker:Pven",
+ u"Portaal:.+[aA]rchief.*",
+ u"Wikipedia:Humor en onzin.*",
+ u"Wikipedia:Links naar doorverwijspagina's/Winkeldochters.*",
+ u"Wikipedia:Project aanmelding bij startpagina's",
+ u"Wikipedia:Wikiproject Roemeense gemeenten/Doorverwijspagina's",
+ u'Categorie:Doorverwijspagina',
+ u'Lijst van Nederlandse namen van pausen',
+ u'Overleg Wikipedia:Discussie spelling 2005',
+ u'Overleg Wikipedia:Doorverwijspagina',
+ u'Overleg Wikipedia:Logboek.*',
+ u'Wikipedia:Logboek.*',
+ u'Overleg gebruiker:Sybren/test.*',
+ u'Overleg gebruiker:[0-9][0-9]?[0-9]?\.[0-9][0-9]?[0-9]?\.[0-9][0-9]?[0-9]?\.[0-9][0-9]?[0-9]?',
+ u'Overleg:Lage Landen (staatkunde)',
+ u'Wikipedia:.*[aA]rchief.*',
+ u'Wikipedia:Doorverwijspagina',
+ u'Wikipedia:Lijst van alle tweeletter-combinaties',
+ u'Wikipedia:Onderhoudspagina',
+ u'Wikipedia:Ongelijke redirects',
+ u'Wikipedia:Protection log',
+ u'Wikipedia:Te verwijderen.*',
+ u'Wikipedia:Top 1000 van meest bekeken artikelen',
+ u'Wikipedia:Wikipedianen met een encyclopedisch artikel',
+ u'Wikipedia:Woorden die niet als zoekterm gebruikt kunnen worden',
+ ],
+ 'pl': [
+ u'Wikipedysta:.+',
+ u'Dyskusja.+:.+',
+ ],
+ 'pt': [
+ u'Usuário:.+',
+ u'Usuário Discussão:.+',
+ u'Discussão:.+',
+ u'Lista de combinações de duas letras',
+ u'Wikipedia:Lista de páginas de desambiguação.+',
+ u'Wikipedia:Páginas para eliminar/.+',
+ ],
+ 'ru': [
+ u'Категория:Disambig',
+ u'Википедия:Страницы разрешения неоднозначностей',
+ u'Википедия:Вики-уборка/Статьи без языковых ссылок',
+ u'Википедия:Страницы с пометкой «(значения)»',
+ u'Список общерусских фамилий',
+ ],
+ },
+ 'memoryalpha': {
+ 'en': [
+ u'Memory Alpha:Links to disambiguating pages'
+ ],
+ 'de': [
+ u'Memory Alpha:Liste der Wortklärungsseiten'
+ ],
+ },
+}
+
+def firstcap(string):
+ return string[0].upper()+string[1:]
+
+def correctcap(link, text):
+    # If the text contains a link to the page whose title starts lowercase,
+    # return the lowercase title; otherwise return the capitalized title.
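+    # e.g. (illustrative): for a Page titled u'Linux', return u'linux' if
+    # the calling page's text contains "[[linux]]" or "[[linux|",
+    # otherwise return u'Linux'.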
+ linkupper = link.title()
+ linklower = linkupper[0].lower() + linkupper[1:]
+ if text.find("[[%s]]"%linklower) > -1 or text.find("[[%s|"%linklower) > -1:
+ return linklower
+ else:
+ return linkupper
+
+class ReferringPageGeneratorWithIgnore:
+ def __init__(self, disambPage, primary=False, minimum = 0):
+ self.disambPage = disambPage
+ # if run with the -primary argument, enable the ignore manager
+ self.primaryIgnoreManager = PrimaryIgnoreManager(disambPage,
+ enabled=primary)
+ self.minimum = minimum
+
+ def __iter__(self):
+ # TODO: start yielding before all referring pages have been found
+ refs = [page for page in self.disambPage.getReferences(follow_redirects = False, withTemplateInclusion = False)]
+ wikipedia.output(u"Found %d references." % len(refs))
+ # Remove ignorables
+ if ignore_title.has_key(self.disambPage.site().family.name) and ignore_title[self.disambPage.site().family.name].has_key(self.disambPage.site().lang):
+ for ig in ignore_title[self.disambPage.site().family.name][self.disambPage.site().lang]:
+ for i in range(len(refs)-1, -1, -1):
+ if re.match(ig, refs[i].title()):
+ if wikipedia.verbose:
+ wikipedia.output('Ignoring page %s'
+ % refs[i].title())
+ del refs[i]
+ elif self.primaryIgnoreManager.isIgnored(refs[i]):
+ #wikipedia.output('Ignoring page %s because it was skipped before' % refs[i].title())
+ del refs[i]
+ if len(refs) < self.minimum:
+ wikipedia.output(u"Found only %d pages to work on; skipping." % len(refs))
+ return
+ wikipedia.output(u"Will work on %d pages." % len(refs))
+ for ref in refs:
+ yield ref
+
+class PrimaryIgnoreManager(object):
+ '''
+ If run with the -primary argument, reads from a file which pages should
+ not be worked on; these are the ones where the user pressed n last time.
+ If run without the -primary argument, doesn't ignore any pages.
+ '''
+ def __init__(self, disambPage, enabled = False):
+ self.disambPage = disambPage
+ self.enabled = enabled
+
+ self.ignorelist = []
+ filename = wikipedia.config.datafilepath('disambiguations',
+ self.disambPage.titleForFilename() + '.txt')
+ try:
+            # The file is stored in the disambiguations/ subdir. Create if necessary.
+            f = codecs.open(filename, 'r', 'utf-8')
+            for line in f.readlines():
+                # remove trailing newlines and carriage returns
+                line = line.rstrip('\r\n')
+                # skip empty lines
+                if line != '':
+                    self.ignorelist.append(line)
+ f.close()
+ except IOError:
+ pass
+
+ def isIgnored(self, refPage):
+ return self.enabled and refPage.urlname() in self.ignorelist
+
+ def ignore(self, refPage):
+ if self.enabled:
+            # Skip this occurrence next time.
+ filename = wikipedia.config.datafilepath('disambiguations',
+ self.disambPage.urlname() + '.txt')
+ try:
+ # Open file for appending. If none exists yet, create a new one.
+                # The file is stored in the disambiguations/ subdir. Create if necessary.
+ f = codecs.open(filename, 'a', 'utf-8')
+ f.write(refPage.urlname() + '\n')
+ f.close()
+ except IOError:
+ pass
+
+
+class DisambiguationRobot(object):
+ ignore_contents = {
+ 'de':(u'{{[Ii]nuse}}',
+ u'{{[Ll]öschen}}',
+ ),
+ 'fi':(u'{{[Tt]yöstetään}}',
+ ),
+ 'kk':(u'{{[Ii]nuse}}',
+ u'{{[Pp]rocessing}}',
+ ),
+ 'nl':(u'{{wiu2}}',
+ u'{{nuweg}}',
+ ),
+ 'ru':(u'{{[Ii]nuse}}',
+ u'{{[Pp]rocessing}}',
+ ),
+ }
+
+ primary_redir_template = {
+ # Page.templates() format, first letter uppercase
+ 'hu': u'Egyért-redir',
+ }
+
+ def __init__(self, always, alternatives, getAlternatives, generator, primary, main_only, minimum = 0):
+ self.always = always
+ self.alternatives = alternatives
+ self.getAlternatives = getAlternatives
+ self.generator = generator
+ self.primary = primary
+ self.main_only = main_only
+ self.minimum = minimum
+
+ self.mysite = wikipedia.getSite()
+ self.mylang = self.mysite.language()
+ self.comment = None
+
+ self.setupRegexes()
+
+ def checkContents(self, text):
+        '''
+        For a given text, returns None if none of the regular
+        expressions given in the dictionary at the top of this class
+        matches a substring of the text.
+        Otherwise returns the substring which is matched by one of
+        the regular expressions.
+        '''
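+        # e.g. (illustrative): on the German wiki, a page whose text
+        # contains u'{{Löschen}}' makes this method return u'{{Löschen}}',
+        # and the caller skips that page.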
+ for ig in self.ignore_contents_regexes:
+ match = ig.search(text)
+ if match:
+ return match.group()
+ return None
+
+ def makeAlternativesUnique(self):
+ # remove duplicate entries
+ result={}
+ for i in self.alternatives:
+ result[i]=None
+ self.alternatives = result.keys()
+
+    def listAlternatives(self):
+        text = u'\n'
+        for i in range(len(self.alternatives)):
+            text += (u"%3i - %s\n" % (i, self.alternatives[i]))
+        wikipedia.output(text)
+
+ def setupRegexes(self):
+ # compile regular expressions
+ self.ignore_contents_regexes = []
+ if self.ignore_contents.has_key(self.mylang):
+ for ig in self.ignore_contents[self.mylang]:
+ self.ignore_contents_regexes.append(re.compile(ig))
+
+ linktrail = self.mysite.linktrail()
+ self.trailR = re.compile(linktrail)
+ # The regular expression which finds links. Results consist of four groups:
+ # group title is the target page title, that is, everything before | or ].
+ # group section is the page section. It'll include the # to make life easier for us.
+ # group label is the alternative link title, that's everything between | and ].
+ # group linktrail is the link trail, that's letters after ]] which are part of the word.
+ # note that the definition of 'letter' varies from language to language.
+ self.linkR = re.compile(r'\[\[(?P<title>[^\]\|#]*)(?P<section>#[^\]\|]*)?(\|(?P<label>[^\]]*))?\]\](?P<linktrail>' + linktrail + ')')
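+        # e.g. (illustrative): in u"[[Foo#Bar|baz]]es" the regex yields
+        # title=u'Foo', section=u'#Bar', label=u'baz' and linktrail=u'es'
+        # (assuming the site's link trail matches lowercase letters).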
+
+ def treat(self, refPage, disambPage):
+ """
+ Parameters:
+            disambPage - The disambiguation page or redirect we don't want
+                         anything to link to
+ refPage - A page linking to disambPage
+ Returns False if the user pressed q to completely quit the program.
+ Otherwise, returns True.
+ """
+ # TODO: break this function up into subroutines!
+
+ include = False
+ unlink = False
+ new_targets = []
+ try:
+ text=refPage.get(throttle=False)
+ ignoreReason = self.checkContents(text)
+ if ignoreReason:
+ wikipedia.output('\n\nSkipping %s because it contains %s.\n\n' % (refPage.title(), ignoreReason))
+ else:
+ include = True
+ except wikipedia.IsRedirectPage:
+ wikipedia.output(u'%s is a redirect to %s' % (refPage.title(), disambPage.title()))
+ if disambPage.isRedirectPage():
+ target = self.alternatives[0]
+ choice = wikipedia.inputChoice(u'Do you want to make redirect %s point to %s?' % (refPage.title(), target), ['yes', 'no'], ['y', 'N'], 'N')
+ if choice == 'y':
+ redir_text = '#%s [[%s]]' % (self.mysite.redirect(default=True), target)
+ try:
+ refPage.put_async(redir_text,comment=self.comment)
+ except wikipedia.PageNotSaved, error:
+ wikipedia.output(u'Page not saved: %s' % error.args)
+ else:
+ choice = wikipedia.inputChoice(u'Do you want to work on pages linking to %s?' % refPage.title(), ['yes', 'no', 'change redirect'], ['y', 'N', 'c'], 'N')
+ if choice == 'y':
+ gen = ReferringPageGeneratorWithIgnore(refPage, self.primary)
+ preloadingGen = pagegenerators.PreloadingGenerator(gen)
+ for refPage2 in preloadingGen:
+ # run until the user selected 'quit'
+ if not self.treat(refPage2, refPage):
+ break
+ elif choice == 'c':
+ text=refPage.get(throttle=False,get_redirect=True)
+ include = "redirect"
+ except wikipedia.NoPage:
+ wikipedia.output(u'Page [[%s]] does not seem to exist?! Skipping.' % refPage.title())
+ include = False
+ if include in (True, "redirect"):
+ # make a backup of the original text so we can show the changes later
+ original_text = text
+ n = 0
+ curpos = 0
+ edited = False
+ # This loop will run until we have finished the current page
+ while True:
+ m = self.linkR.search(text, pos = curpos)
+ if not m:
+ if n == 0:
+ wikipedia.output(u"No changes necessary in %s" % refPage.title())
+ return True
+ else:
+ # stop loop and save page
+ break
+ # Make sure that next time around we will not find this same hit.
+ curpos = m.start() + 1
+ # ignore interwiki links and links to sections of the same page
+ if m.group('title') == '' or self.mysite.isInterwikiLink(m.group('title')):
+ continue
+ else:
+ try:
+ linkPage = wikipedia.Page(disambPage.site(), m.group('title'))
+ # Check whether the link found is to disambPage.
+ except wikipedia.InvalidTitle:
+ continue
+ if linkPage != disambPage:
+ continue
+
+ n += 1
+ # how many bytes should be displayed around the current link
+ context = 60
+ # This loop will run while the user doesn't choose an option
+ # that will actually change the page
+ while True:
+ # Show the title of the page where the link was found.
+ # Highlight the title in purple.
+ wikipedia.output(u"\n\n>>> \03{lightpurple}%s\03{default} <<<" % refPage.title())
+
+ # at the beginning of the link, start red color.
+ # at the end of the link, reset the color to default
+ wikipedia.output(text[max(0, m.start() - context) : m.start()] + '\03{lightred}' + text[m.start() : m.end()] + '\03{default}' + text[m.end() : m.end() + context])
+
+ if not self.always:
+ if edited:
+ choice = wikipedia.input(u"Option (#, r#, s=skip link, e=edit page, n=next page, u=unlink, q=quit\n"
+ " m=more context, l=list, a=add new, x=save in this form):")
+ else:
+ choice = wikipedia.input(u"Option (#, r#, s=skip link, e=edit page, n=next page, u=unlink, q=quit\n"
+ " m=more context, d=show disambiguation page, l=list, a=add new):")
+ else:
+ choice = self.always
+ if choice in ['a', 'A']:
+ newAlternative = wikipedia.input(u'New alternative:')
+ self.alternatives.append(newAlternative)
+ self.listAlternatives()
+ elif choice in ['e', 'E']:
+ editor = editarticle.TextEditor()
+ newText = editor.edit(text, jumpIndex = m.start(), highlight = disambPage.title())
+ # if user didn't press Cancel
+ if newText and newText != text:
+ text = newText
+ break
+ elif choice in ['d', 'D']:
+ editor = editarticle.TextEditor()
+ if disambPage.isRedirectPage():
+ disambredir = disambPage.getRedirectTarget()
+ disambigText = editor.edit(disambredir.get(), jumpIndex = m.start(), highlight = disambredir.title())
+ else:
+ disambigText = editor.edit(disambPage.get(), jumpIndex = m.start(), highlight = disambPage.title())
+ elif choice in ['l', 'L']:
+ self.listAlternatives()
+ elif choice in ['m', 'M']:
+ # show more text around the link we're working on
+ context *= 2
+ else:
+ break
+
+ if choice in ['e', 'E']:
+ # user has edited the page and then pressed 'OK'
+ edited = True
+ curpos = 0
+ continue
+ elif choice in ['n', 'N']:
+ # skip this page
+ if self.primary:
+                        # If run with the -primary argument, skip this occurrence next time.
+ self.primaryIgnoreManager.ignore(refPage)
+ return True
+ elif choice in ['q', 'Q']:
+ # quit the program
+ return False
+ elif choice in ['s', 'S']:
+ # Next link on this page
+ n -= 1
+ continue
+ elif choice in ['x', 'X'] and edited:
+ # Save the page as is
+ break
+
+ # The link looks like this:
+ # [[page_title|link_text]]trailing_chars
+ page_title = m.group('title')
+ link_text = m.group('label')
+
+ if not link_text:
+ # or like this: [[page_title]]trailing_chars
+ link_text = page_title
+ if m.group('section') == None:
+ section = ''
+ else:
+ section = m.group('section')
+ trailing_chars = m.group('linktrail')
+ if trailing_chars:
+ link_text += trailing_chars
+
+ if choice in ['u', 'U']:
+ # unlink - we remove the section if there's any
+ text = text[:m.start()] + link_text + text[m.end():]
+ unlink = True
+ continue
+ else:
+ if len(choice)>0 and choice[0] == 'r':
+ # we want to throw away the original link text
+ replaceit = True
+ choice = choice[1:]
+ elif include == "redirect":
+ replaceit = True
+ else:
+ replaceit = False
+
+ try:
+ choice=int(choice)
+ except ValueError:
+ wikipedia.output(u"Unknown option")
+ # step back to ask the user again what to do with the current link
+ curpos -= 1
+ continue
+ if choice >= len(self.alternatives) or choice < 0:
+ wikipedia.output(u"Choice out of range. Please select a number between 0 and %i." % (len(self.alternatives) - 1))
+ # show list of possible choices
+ self.listAlternatives()
+ # step back to ask the user again what to do with the current link
+ curpos -= 1
+ continue
+ new_page_title = self.alternatives[choice]
+ repPl = wikipedia.Page(disambPage.site(), new_page_title)
+ if (new_page_title[0].isupper()) or (link_text[0].isupper()):
+ new_page_title = repPl.title()
+ else:
+ new_page_title = repPl.title()
+ new_page_title = new_page_title[0].lower() + new_page_title[1:]
+ if new_page_title not in new_targets:
+ new_targets.append(new_page_title)
+ if replaceit and trailing_chars:
+ newlink = "[[%s%s]]%s" % (new_page_title, section, trailing_chars)
+ elif replaceit or (new_page_title == link_text and not section):
+ newlink = "[[%s]]" % new_page_title
+ # check if we can create a link with trailing characters instead of a pipelink
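+                    # e.g. (illustrative): link text u'Dogs' with new title
+                    # u'Dog' becomes u'[[Dog]]s' instead of u'[[Dog|Dogs]]'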
+ elif len(new_page_title) <= len(link_text) and firstcap(link_text[:len(new_page_title)]) == firstcap(new_page_title) and re.sub(self.trailR, '', link_text[len(new_page_title):]) == '' and not section:
+ newlink = "[[%s]]%s" % (link_text[:len(new_page_title)], link_text[len(new_page_title):])
+ else:
+ newlink = "[[%s%s|%s]]" % (new_page_title, section, link_text)
+ text = text[:m.start()] + newlink + text[m.end():]
+ continue
+
+ wikipedia.output(text[max(0,m.start()-30):m.end()+30])
+ if text == original_text:
+ wikipedia.output(u'\nNo changes have been made:\n')
+ else:
+ wikipedia.output(u'\nThe following changes have been made:\n')
+ wikipedia.showDiff(original_text, text)
+ wikipedia.output(u'')
+ # save the page
+ self.setSummaryMessage(disambPage, new_targets, unlink)
+ try:
+ refPage.put_async(text,comment=self.comment)
+ except wikipedia.LockedPage:
+ wikipedia.output(u'Page not saved: page is locked')
+ except wikipedia.PageNotSaved, error:
+ wikipedia.output(u'Page not saved: %s' % error.args)
+ return True
+
+ def findAlternatives(self, disambPage):
+ if disambPage.isRedirectPage() and not self.primary:
+ if self.primary_redir_template.has_key(disambPage.site().lang) and self.primary_redir_template[disambPage.site().lang] in disambPage.templates(get_redirect = True):
+ baseTerm = disambPage.title()
+ for template in disambPage.templatesWithParams(get_redirect = True):
+ if template[0] == self.primary_redir_template[disambPage.site().lang] and len(template[1]) > 0:
+ baseTerm = template[1][1]
+ disambTitle = primary_topic_format[self.mylang] % baseTerm
+ try:
+ disambPage2 = wikipedia.Page(self.mysite, disambTitle)
+ links = disambPage2.linkedPages()
+ links = [correctcap(l,disambPage2.get()) for l in links]
+ except wikipedia.NoPage:
+ wikipedia.output(u"No page at %s, using redirect target." % disambTitle)
+ links = disambPage.linkedPages()[:1]
+ links = [correctcap(l,disambPage.get(get_redirect = True)) for l in links]
+ self.alternatives += links
+ else:
+ try:
+ target = disambPage.getRedirectTarget().title()
+ self.alternatives.append(target)
+ except wikipedia.NoPage:
+ wikipedia.output(u"The specified page was not found.")
+ user_input = wikipedia.input(u"""\
+Please enter the name of the page where the redirect should have pointed,
+or press enter to quit:""")
+ if user_input == "":
+ sys.exit(1)
+ else:
+ self.alternatives.append(user_input)
+ except wikipedia.IsNotRedirectPage:
+ wikipedia.output(
+ u"The specified page is not a redirect. Skipping.")
+ return False
+ elif self.getAlternatives:
+ try:
+ if self.primary:
+ try:
+ disambPage2 = wikipedia.Page(self.mysite,
+ primary_topic_format[self.mylang]
+ % disambPage.title()
+ )
+ links = disambPage2.linkedPages()
+ links = [correctcap(l,disambPage2.get()) for l in links]
+ except wikipedia.NoPage:
+ wikipedia.output(u"Page does not exist, using the first link in page %s." % disambPage.title())
+ links = disambPage.linkedPages()[:1]
+ links = [correctcap(l,disambPage.get()) for l in links]
+ else:
+ try:
+ links = disambPage.linkedPages()
+ links = [correctcap(l,disambPage.get()) for l in links]
+ except wikipedia.NoPage:
+ wikipedia.output(u"Page does not exist, skipping.")
+ return False
+ except wikipedia.IsRedirectPage:
+ wikipedia.output(u"Page is a redirect, skipping.")
+ return False
+ self.alternatives += links
+ return True
+
+ def setSummaryMessage(self, disambPage, new_targets = [], unlink = False):
+ # make list of new targets
+ targets = ''
+ for page_title in new_targets:
+ targets += u'[[%s]], ' % page_title
+ # remove last comma
+ targets = targets[:-2]
+
+ if not targets:
+ targets = wikipedia.translate(self.mysite, unknown_msg)
+
+ # first check whether user has customized the edit comment
+ if wikipedia.config.disambiguation_comment.has_key(self.mysite.family.name) and wikipedia.config.disambiguation_comment[self.mysite.family.name].has_key(self.mylang):
+ try:
+ self.comment = wikipedia.translate(self.mysite,
+ wikipedia.config.disambiguation_comment[
+ self.mysite.family.name]
+ ) % (disambPage.title(), targets)
+ #Backwards compatibility, type error probably caused by too many arguments for format string
+ except TypeError:
+ self.comment = wikipedia.translate(self.mysite,
+ wikipedia.config.disambiguation_comment[
+ self.mysite.family.name]
+ ) % disambPage.title()
+ elif disambPage.isRedirectPage():
+ # when working on redirects, there's another summary message
+ if unlink and not new_targets:
+ self.comment = wikipedia.translate(self.mysite, msg_redir_unlink) % disambPage.title()
+ else:
+ self.comment = wikipedia.translate(self.mysite, msg_redir) % (disambPage.title(), targets)
+ else:
+ if unlink and not new_targets:
+ self.comment = wikipedia.translate(self.mysite, msg_unlink) % disambPage.title()
+ else:
+ self.comment = wikipedia.translate(self.mysite, msg) % (disambPage.title(), targets)
+
+ def run(self):
+ if self.main_only:
+ if not ignore_title.has_key(self.mysite.family.name):
+ ignore_title[self.mysite.family.name] = {}
+ if not ignore_title[self.mysite.family.name].has_key(self.mylang):
+ ignore_title[self.mysite.family.name][self.mylang] = []
+ ignore_title[self.mysite.family.name][self.mylang] += [
+ u'%s:' % namespace for namespace in self.mysite.namespaces()]
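+            # e.g. (illustrative): adding u'Talk:', u'User:', ... to the
+            # ignore list makes ReferringPageGeneratorWithIgnore skip any
+            # referring page whose title starts with a namespace prefix.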
+
+ for disambPage in self.generator:
+ self.primaryIgnoreManager = PrimaryIgnoreManager(disambPage, enabled=self.primary)
+
+ if not self.findAlternatives(disambPage):
+ continue
+
+ self.makeAlternativesUnique()
+ # sort possible choices
+ if wikipedia.config.sort_ignore_case:
+ self.alternatives.sort(lambda x,y: cmp(x.lower(), y.lower()))
+ else:
+ self.alternatives.sort()
+ self.listAlternatives()
+
+ gen = ReferringPageGeneratorWithIgnore(disambPage, self.primary, minimum = self.minimum)
+ preloadingGen = pagegenerators.PreloadingGenerator(gen)
+ for refPage in preloadingGen:
+ if not self.primaryIgnoreManager.isIgnored(refPage):
+ # run until the user selected 'quit'
+ if not self.treat(refPage, disambPage):
+ break
+
+ # clear alternatives before working on next disambiguation page
+ self.alternatives = []
+
+def main():
+ # the option that's always selected when the bot wonders what to do with
+ # a link. If it's None, the user is prompted (default behaviour).
+ always = None
+ alternatives = []
+ getAlternatives = True
+    # The generator yields the disambiguation pages to work on; it is set
+    # by the -file or -start arguments, or by a page title given on the
+    # command line.
+ generator = None
+    # This temporary list collects the parts of the page title if a single
+    # page to work on is specified on the command line.
+ pageTitle = []
+ primary = False
+ main_only = False
+
+ # For sorting the linked pages, case can be ignored
+ ignoreCase = False
+ minimum = 0
+
+ for arg in wikipedia.handleArgs():
+ if arg.startswith('-primary:'):
+ primary = True
+ getAlternatives = False
+ alternatives.append(arg[9:])
+ elif arg == '-primary':
+ primary = True
+ elif arg.startswith('-always:'):
+ always = arg[8:]
+ elif arg.startswith('-file'):
+ if len(arg) == 5:
+ generator = pagegenerators.TextfilePageGenerator(filename = None)
+ else:
+ generator = pagegenerators.TextfilePageGenerator(filename = arg[6:])
+ elif arg.startswith('-pos:'):
+ if arg[5]!=':':
+ mysite = wikipedia.getSite()
+ page = wikipedia.Page(mysite, arg[5:])
+ if page.exists():
+ alternatives.append(page.title())
+ else:
+ answer = wikipedia.inputChoice(u'Possibility %s does not actually exist. Use it anyway?'
+ % page.title(), ['yes', 'no'], ['y', 'N'], 'N')
+ if answer == 'y':
+ alternatives.append(page.title())
+ else:
+ alternatives.append(arg[5:])
+ elif arg == '-just':
+ getAlternatives = False
+ elif arg == '-main':
+ main_only = True
+ elif arg.startswith('-min:'):
+ minimum = int(arg[5:])
+ elif arg.startswith('-start'):
+ try:
+ if len(arg) <= len('-start:'):
+ generator = pagegenerators.CategorizedPageGenerator(wikipedia.getSite().disambcategory())
+ else:
+ generator = pagegenerators.CategorizedPageGenerator(wikipedia.getSite().disambcategory(), start = arg[7:])
+ generator = pagegenerators.NamespaceFilterPageGenerator(generator, [0])
+ except wikipedia.NoPage:
+ print "Disambiguation category for your wiki is not known."
+ raise
+ elif arg.startswith("-"):
+ print "Unrecognized command line argument: %s" % arg
+ # show help text and exit
+ wikipedia.showHelp()
+ else:
+ pageTitle.append(arg)
+
+ # if the disambiguation page is given as a command line argument,
+ # connect the title's parts with spaces
+ if pageTitle != []:
+ pageTitle = ' '.join(pageTitle)
+ page = wikipedia.Page(wikipedia.getSite(), pageTitle)
+ generator = iter([page])
+
+    # if no disambiguation page was given as an argument, and none was
+ # read from a file, query the user
+ if not generator:
+ pageTitle = wikipedia.input(u'On which disambiguation page do you want to work?')
+ page = wikipedia.Page(wikipedia.getSite(), pageTitle)
+ generator = iter([page])
+
+ bot = DisambiguationRobot(always, alternatives, getAlternatives, generator, primary, main_only, minimum = minimum)
+ bot.run()
+
+
+
+if __name__ == "__main__":
+ try:
+ main()
+ finally:
+ wikipedia.stopme()