jenkins-bot has submitted this change and it was merged.
Change subject: Do not require login for pure logging purposes
......................................................................
Do not require login for pure logging purposes
If the user is not logged in, do not display their talk page status, as
doing so would require a login. This is annoying if one only needs
read-only access, or needs to read a wiki with a broken login (e.g. the
toolserver).
Change-Id: Ia46d1938221ef70aaf2a752f1b9a4cc5493d34dc
---
M pywikibot/bot.py
1 file changed, 8 insertions(+), 1 deletion(-)
Approvals:
Xqt: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/bot.py b/pywikibot/bot.py
index 13aaf41..2d478c5 100644
--- a/pywikibot/bot.py
+++ b/pywikibot/bot.py
@@ -272,7 +272,14 @@
log(u' %s' % ver)
# messages on bot discussion page?
- log(u'MESSAGES: %s' % ('unanswered' if site.messages() else 'none'))
+ if site.logged_in():
+ if site.messages():
+ messagestate = 'unanswered'
+ else:
+ messagestate = 'none'
+ else:
+ messagestate = 'unknown (not logged in)'
+ log(u'MESSAGES: %s' % messagestate)
log(u'=== ' * 14)
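
For reference, a minimal sketch of the resulting check (a sketch only;
site is the pywikibot Site instance already used throughout bot.py):

# Sketch of the logic introduced above; no login is forced for the
# read-only / broken-login case.
if site.logged_in():
    messagestate = 'unanswered' if site.messages() else 'none'
else:
    messagestate = 'unknown (not logged in)'
log(u'MESSAGES: %s' % messagestate)
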
--
To view, visit https://gerrit.wikimedia.org/r/104962
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia46d1938221ef70aaf2a752f1b9a4cc5493d34dc
Gerrit-PatchSet: 2
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
jenkins-bot has submitted this change and it was merged.
Change subject: Weblib: add docs, replace string concat with urlencode
......................................................................
Weblib: add docs, replace string concat with urlencode
Change-Id: I18c8b7b4c47aba68cffd3435be7fdf4056e3620d
---
M pywikibot/weblib.py
1 file changed, 39 insertions(+), 19 deletions(-)
Approvals:
Xqt: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/weblib.py b/pywikibot/weblib.py
index d068925..c2ad86e 100644
--- a/pywikibot/weblib.py
+++ b/pywikibot/weblib.py
@@ -11,21 +11,31 @@
#
__version__ = '$Id$'
-import pywikibot
+import urllib
from pywikibot.comms import http
def getInternetArchiveURL(url, timestamp=None):
- """Return archived URL by Internet Archive."""
- # See [[:mw:Archived Pages]] and http://archive.org/help/wayback_api.php
+ """Return archived URL by Internet Archive.
+
+ Parameters:
+ url - url to search an archived version for
+ timestamp - requested archive date. The version closest to that moment
+ is returned. Format: YYYYMMDDhhmmss or part thereof.
+
+ See [[:mw:Archived Pages]] and http://archive.org/help/wayback_api.php
+ for more details.
+ """
import json
- query = u'http://archive.org/wayback/available?'
- query += u'url='
- query += url
- if not timestamp is None:
- query += u'&timestamp='
- query += timestamp
- jsontext = http.request(uri=query, site=None)
+ uri = u'http://archive.org/wayback/available?'
+
+ query = {'url': url}
+
+ if timestamp is not None:
+ query['timestamp'] = timestamp
+
+ uri = uri + urllib.urlencode(query)
+ jsontext = http.request(uri=uri, site=None)
if "closest" in jsontext:
data = json.loads(jsontext)
return data['archived_snapshots']['closest']['url']
@@ -34,17 +44,27 @@
def getWebCitationURL(url, timestamp=None):
- """Return archived URL by Web Citation."""
- # See http://www.webcitation.org/doc/WebCiteBestPracticesGuide.pdf
+ """Return archived URL by Web Citation.
+
+ Parameters:
+ url - url to search an archived version for
+ timestamp - requested archive date. The version closest to that moment
+ is returned. Format: YYYYMMDDhhmmss or part thereof.
+
+ See http://www.webcitation.org/doc/WebCiteBestPracticesGuide.pdf
+ for more details
+ """
import xml.etree.ElementTree as ET
- query = u'http://www.webcitation.org/query?'
- query += u'returnxml=true'
- query += u'&url='
- query += url
+ uri = u'http://www.webcitation.org/query?'
+
+ query = {'returnxml': 'true',
+ 'url': url}
+
if not timestamp is None:
- query += u'&date='
- query += timestamp
- xmltext = http.request(uri=query, site=None)
+ query['date'] = timestamp
+
+ uri = uri + urllib.urlencode(query)
+ xmltext = http.request(uri=uri, site=None)
if "success" in xmltext:
data = ET.fromstring(xmltext)
return data.find('.//webcite_url').text
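
To illustrate why urlencode is preferable to string concatenation here,
a small sketch (the example URL is made up; key order in the output may
vary, as it comes from a dict):

# Python 2, matching the urllib import used above.
import urllib

params = {'url': 'http://example.org/a page?x=1&y=2',
          'timestamp': '20130101'}
print(urllib.urlencode(params))
# e.g. url=http%3A%2F%2Fexample.org%2Fa+page%3Fx%3D1%26y%3D2&timestamp=20130101
# Manual concatenation would leave '&', '?' and the space unescaped,
# corrupting the query string sent to archive.org / webcitation.org.
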
--
To view, visit https://gerrit.wikimedia.org/r/104804
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I18c8b7b4c47aba68cffd3435be7fdf4056e3620d
Gerrit-PatchSet: 2
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
jenkins-bot has submitted this change and it was merged.
Change subject: Prevent Page.change_category from readding newCat.
......................................................................
Prevent Page.change_category from readding newCat.
Currently change_category also adds newCat even if
it is already present on the target page. Callers
such as category.py had to work around this
manually, which caused extra code and complexity.
Change-Id: I95ba291e78c2f187f4d4a881b39fa44096cca9b6
---
M pywikibot/page.py
1 file changed, 4 insertions(+), 0 deletions(-)
Approvals:
Merlijn van Deen: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/page.py b/pywikibot/page.py
index f0150cd..2cda41e 100644
--- a/pywikibot/page.py
+++ b/pywikibot/page.py
@@ -1479,6 +1479,10 @@
% (self.title(asLink=True), oldCat.title()))
return
+ # This prevents the bot from adding newCat if it is already present.
+ if newCat in cats:
+ newCat = None
+
if inPlace or self.namespace() == 10:
oldtext = self.get(get_redirect=True)
newtext = pywikibot.replaceCategoryInPlace(oldtext, oldCat, newCat)
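
With the guard above, a call like the following (hypothetical page and
category titles) now only removes the old category instead of inserting
a duplicate of the new one:

import pywikibot

site = pywikibot.getSite()
page = pywikibot.Page(site, u'Foo')   # assume it already carries [[Category:New]]
oldCat = pywikibot.Category(site, u'Category:Old')
newCat = pywikibot.Category(site, u'Category:New')

# newCat is dropped inside change_category because it is already
# present on the page, so only [[Category:Old]] is removed.
page.change_category(oldCat, newCat)
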
--
To view, visit https://gerrit.wikimedia.org/r/104812
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I95ba291e78c2f187f4d4a881b39fa44096cca9b6
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Pyfisch <pyfisch(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
jenkins-bot has submitted this change and it was merged.
Change subject: [BUGFIX] change Site.lang to Site.code
......................................................................
[BUGFIX] change Site.lang to Site.code
The i18n files use WMF language codes instead of ISO 639 language
codes. This means we also have to use them in our translations.
For example, the site code of the Alemannic Wikipedia is 'als',
whereas its language code is 'gsw'. The i18n files use 'als',
while we currently try to look up 'gsw', which does not exist.
Change-Id: I3bd186c06ef3b0506411f944f36f1b999fb35dfe
---
M pywikibot/i18n.py
1 file changed, 8 insertions(+), 8 deletions(-)
Approvals:
Merlijn van Deen: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/i18n.py b/pywikibot/i18n.py
index bb2b26d..40835c0 100644
--- a/pywikibot/i18n.py
+++ b/pywikibot/i18n.py
@@ -259,9 +259,9 @@
family = pywikibot.config.family
# If a site is given instead of a code, use its language
- if hasattr(code, 'lang'):
+ if hasattr(code, 'code'):
family = code.family.name
- code = code.lang
+ code = code.code
# Check whether xdict has multiple projects
if type(xdict) == dict:
@@ -336,8 +336,8 @@
code_needed = False
# If a site is given instead of a code, use its language
- if hasattr(code, 'lang'):
- lang = code.lang
+ if hasattr(code, 'code'):
+ lang = code.code
# check whether we need the language code back
elif type(code) == list:
lang = code.pop()
@@ -432,8 +432,8 @@
if type(parameters) == dict:
param = parameters
# If a site is given instead of a code, use its language
- if hasattr(code, 'lang'):
- code = code.lang
+ if hasattr(code, 'code'):
+ code = code.code
# we send the code via list and get the alternate code back
code = [code]
trans = twtranslate(code, twtitle, None)
@@ -484,8 +484,8 @@
package = twtitle.split("-")[0]
transdict = getattr(__import__("i18n", fromlist=[package]), package).msg
# If a site is given instead of a code, use its language
- if hasattr(code, 'lang'):
- code = code.lang
+ if hasattr(code, 'code'):
+ code = code.code
return code in transdict and twtitle in transdict[code]
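
The distinction the fix relies on, using the example from the commit
message (a sketch; the attribute values are as described above):

import pywikibot

site = pywikibot.Site('als', 'wikipedia')  # Alemannic Wikipedia
print(site.code)   # 'als' -- WMF site code, which the i18n files use
print(site.lang)   # 'gsw' -- ISO 639 code, absent from the i18n files
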
--
To view, visit https://gerrit.wikimedia.org/r/104800
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I3bd186c06ef3b0506411f944f36f1b999fb35dfe
Gerrit-PatchSet: 2
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: Russell Blau <russblau(a)imapmail.org>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
jenkins-bot has submitted this change and it was merged.
Change subject: weblinkchecker.py : XML and archived URL
......................................................................
weblinkchecker.py : XML and archived URL
Same as the following changes for compat:
* I7ba4f460897316ae1f5cbcca0080f8c3262d9abf : read XML dump
* I46c1737aea471691cd90f9ec21e3592ce0c69fde : Internet Archive and Web Citation
Bug: 55039
Bug: 58815
Change-Id: I7279da01b0527c974ea53dc1f234a9268dbc8d43
---
A pywikibot/weblib.py
M scripts/weblinkchecker.py
2 files changed, 123 insertions(+), 29 deletions(-)
Approvals:
Merlijn van Deen: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/weblib.py b/pywikibot/weblib.py
new file mode 100644
index 0000000..d068925
--- /dev/null
+++ b/pywikibot/weblib.py
@@ -0,0 +1,52 @@
+# -*- coding: utf-8 -*-
+"""
+Functions for manipulating external links
+or querying third-party sites.
+
+"""
+#
+# (C) Pywikibot team, 2013
+#
+# Distributed under the terms of the MIT license.
+#
+__version__ = '$Id$'
+
+import pywikibot
+from pywikibot.comms import http
+
+
+def getInternetArchiveURL(url, timestamp=None):
+ """Return archived URL by Internet Archive."""
+ # See [[:mw:Archived Pages]] and http://archive.org/help/wayback_api.php
+ import json
+ query = u'http://archive.org/wayback/available?'
+ query += u'url='
+ query += url
+ if not timestamp is None:
+ query += u'&timestamp='
+ query += timestamp
+ jsontext = http.request(uri=query, site=None)
+ if "closest" in jsontext:
+ data = json.loads(jsontext)
+ return data['archived_snapshots']['closest']['url']
+ else:
+ return None
+
+
+def getWebCitationURL(url, timestamp=None):
+ """Return archived URL by Web Citation."""
+ # See http://www.webcitation.org/doc/WebCiteBestPracticesGuide.pdf
+ import xml.etree.ElementTree as ET
+ query = u'http://www.webcitation.org/query?'
+ query += u'returnxml=true'
+ query += u'&url='
+ query += url
+ if not timestamp is None:
+ query += u'&date='
+ query += timestamp
+ xmltext = http.request(uri=query, site=None)
+ if "success" in xmltext:
+ data = ET.fromstring(xmltext)
+ return data.find('.//webcite_url').text
+ else:
+ return None
diff --git a/scripts/weblinkchecker.py b/scripts/weblinkchecker.py
index fe138c7..40f283a 100644
--- a/scripts/weblinkchecker.py
+++ b/scripts/weblinkchecker.py
@@ -36,6 +36,11 @@
-namespace Only process templates in the namespace with the given number or
name. This parameter may be used multiple times.
+-xml Should be used instead of a simple page fetching method from
+ pagegenerators.py for performance and load issues
+
+-xmlstart Page to start with when using an XML dump
+
-ignore HTTP return codes to ignore. Can be provided several times :
-ignore:401 -ignore:500
@@ -112,6 +117,8 @@
from pywikibot import i18n
from pywikibot import config
from pywikibot import pagegenerators
+from pywikibot import xmlreader
+from pywikibot import weblib
docuReplacements = {
'&params;': pagegenerators.parameterHelp
@@ -177,29 +184,45 @@
yield m.group('urlb')
-class InternetArchiveConsulter:
- def __init__(self, url):
- self.url = url
+class XmlDumpPageGenerator:
+ """Xml generator that yields pages containing a web link"""
- def getArchiveURL(self):
- pywikibot.output(u'Consulting the Internet Archive for %s' % self.url)
- archiveURL = 'http://web.archive.org/web/*/%s' % self.url
+ def __init__(self, xmlFilename, xmlStart, namespaces):
+ self.xmlStart = xmlStart
+ self.namespaces = namespaces
+ self.skipping = bool(xmlStart)
+ self.site = pywikibot.getSite()
+
+ dump = xmlreader.XmlDump(xmlFilename)
+ self.parser = dump.parse()
+
+ def __iter__(self):
+ return self
+
+ def next(self):
try:
- f = urllib2.urlopen(archiveURL)
- except urllib2.HTTPError:
- # The Internet Archive yields a 403 error when the site was not
- # archived due to robots.txt restrictions.
- return
- except UnicodeEncodeError:
- return
- data = f.read()
- if f.headers.get('content-encoding', None) == 'gzip':
- # Since 2008, the Internet Archive returns pages in GZIPed
- # compression format. Unfortunatelly urllib2 doesn't handle
- # the decompression for us, so we have to do it ourselves.
- data = gzip.GzipFile(fileobj=StringIO.StringIO(data)).read()
- if "Search Results for " in data:
- return archiveURL
+ for entry in self.parser:
+ if self.skipping:
+ if entry.title != self.xmlStart:
+ continue
+ self.skipping = False
+ page = pywikibot.Page(self.site, entry.title)
+ if not self.namespaces == []:
+ if page.namespace() not in self.namespaces:
+ continue
+ found = False
+ for url in weblinksIn(entry.text):
+ found = True
+ if found:
+ return page
+ except KeyboardInterrupt:
+ try:
+ if not self.skipping:
+ pywikibot.output(
+ u'To resume, use "-xmlstart:%s" on the command line.'
+ % entry.title)
+ except NameError:
+ pass
class LinkChecker(object):
@@ -509,10 +532,10 @@
def __init__(self, reportThread):
self.reportThread = reportThread
- site = pywikibot.getSite()
+ self.site = pywikibot.getSite()
self.semaphore = threading.Semaphore()
self.datfilename = pywikibot.config.datafilepath(
- 'deadlinks', 'deadlinks-%s-%s.dat' % (site.family.name, site.code))
+ 'deadlinks', 'deadlinks-%s-%s.dat' % (self.site.family.name, self.site.code))
# Count the number of logged links, so that we can insert captions
# from time to time
self.logCount = 0
@@ -528,7 +551,6 @@
"""
Logs an error report to a text file in the deadlinks subdirectory.
"""
- site = pywikibot.getSite()
if archiveURL:
errorReport = u'* %s ([%s archive])\n' % (url, archiveURL)
else:
@@ -541,8 +563,8 @@
pywikibot.output(u"** Logging link for deletion.")
txtfilename = pywikibot.config.datafilepath('deadlinks',
'results-%s-%s.txt'
- % (site.family.name,
- site.lang))
+ % (self.site.family.name,
+ self.site.lang))
txtfile = codecs.open(txtfilename, 'a', 'utf-8')
self.logCount += 1
if self.logCount % 30 == 0:
@@ -573,8 +595,9 @@
# We'll list it in a file so that it can be removed manually.
if timeSinceFirstFound > 60 * 60 * 24 * day:
# search for archived page
- iac = InternetArchiveConsulter(url)
- archiveURL = iac.getArchiveURL()
+ archiveURL = pywikibot.weblib.getInternetArchiveURL(url)
+ if archiveURL is None:
+ archiveURL = pywikibot.weblib.getWebCitationURL(url)
self.log(url, error, page, archiveURL)
else:
self.historyDict[url] = [(page.title(), now, error)]
@@ -781,6 +804,7 @@
def main():
gen = None
singlePageTitle = []
+ xmlFilename = None
# Which namespaces should be processed?
# default to [] which means all namespaces will be processed
namespaces = []
@@ -807,6 +831,17 @@
HTTPignore.append(int(arg[8:]))
elif arg.startswith('-day:'):
day = int(arg[5:])
+ elif arg.startswith('-xmlstart'):
+ if len(arg) == 9:
+ xmlStart = pywikibot.input(
+ u'Please enter the dumped article to start with:')
+ else:
+ xmlStart = arg[10:]
+ elif arg.startswith('-xml'):
+ if len(arg) == 4:
+ xmlFilename = i18n.input('pywikibot-enter-xml-filename')
+ else:
+ xmlFilename = arg[5:]
else:
if not genFactory.handleArg(arg):
singlePageTitle.append(arg)
@@ -816,6 +851,13 @@
page = pywikibot.Page(pywikibot.getSite(), singlePageTitle)
gen = iter([page])
+ if xmlFilename:
+ try:
+ xmlStart
+ except NameError:
+ xmlStart = None
+ gen = XmlDumpPageGenerator(xmlFilename, xmlStart, namespaces)
+
if not gen:
gen = genFactory.getCombinedGenerator()
if gen:
@@ -824,7 +866,7 @@
# fetch at least 240 pages simultaneously from the wiki, but more if
# a high thread number is set.
pageNumber = max(240, config.max_external_links * 2)
- gen = pagegenerators.PreloadingGenerator(gen, pageNumber=pageNumber)
+ gen = pagegenerators.PreloadingGenerator(gen, step=pageNumber)
gen = pagegenerators.RedirectFilterPageGenerator(gen)
bot = WeblinkCheckerRobot(gen, HTTPignore)
try:
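
A minimal usage sketch of the new weblib helpers, mirroring the
Wayback-then-WebCite fallback used above (the URL and timestamp are
arbitrary examples):

from pywikibot import weblib

url = 'http://www.example.org/some/dead/page'
archiveURL = weblib.getInternetArchiveURL(url, timestamp='20130101000000')
if archiveURL is None:
    # Fall back to WebCite when the Wayback Machine has no snapshot.
    archiveURL = weblib.getWebCitationURL(url, timestamp='20130101000000')
print(archiveURL)  # archived URL, or None if neither service has a copy
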
--
To view, visit https://gerrit.wikimedia.org/r/104015
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I7279da01b0527c974ea53dc1f234a9268dbc8d43
Gerrit-PatchSet: 5
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Beta16 <l.rabinelli(a)gmail.com>
Gerrit-Reviewer: Beta16 <l.rabinelli(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Legoktm <legoktm.wikipedia(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot