jenkins-bot has submitted this change and it was merged.
Change subject: Report malformed URLs
......................................................................
Report malformed URLs
Don't throw URL exception in the
checker thread if the URL cannot be
parsed.
Introduce NotAnURLError exception
to allow information about malformed URLs
to be passed to the reporting facility.
Change-Id: I93d45db6dec10210ff760154111853f53a042755
---
M weblinkchecker.py
1 file changed, 11 insertions(+), 0 deletions(-)
Approvals:
John Vandenberg: Looks good to me, approved
saper: Looks good to me, but someone else must approve
jenkins-bot: Verified
diff --git a/weblinkchecker.py b/weblinkchecker.py
index e7f2a90..8b54517 100644
--- a/weblinkchecker.py
+++ b/weblinkchecker.py
@@ -218,6 +218,10 @@
     pass
 
 
+class NotAnURLError(BaseException):
+    pass
+
+
 class LinkChecker(object):
     """
     Given a HTTP URL, tries to load the page from the Internet and checks if it
@@ -259,6 +263,8 @@
             return httplib.HTTPConnection(self.host)
         elif self.scheme == 'https':
             return httplib.HTTPSConnection(self.host)
+        else:
+            raise NotAnURLError(self.url)
 
     def getEncodingUsedByServer(self):
         if not self.serverEncoding:
@@ -489,6 +495,11 @@
         linkChecker = LinkChecker(self.url, HTTPignore=self.HTTPignore)
         try:
             ok, message = linkChecker.check()
+        except NotAnURLError as e:
+            ok, message = False, i18n.twtranslate(pywikibot.getSite(),
+                                                  'weblinkchecker-badurl_msg',
+                                                  {'URL': self.url})
+
         except:
             pywikibot.output('Exception while processing URL %s in page %s'
                              % (self.url, self.page.title()))
--
To view, visit https://gerrit.wikimedia.org/r/175638
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I93d45db6dec10210ff760154111853f53a042755
Gerrit-PatchSet: 3
Gerrit-Project: pywikibot/compat
Gerrit-Branch: master
Gerrit-Owner: saper <saper(a)saper.info>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: jenkins-bot <>
Gerrit-Reviewer: saper <saper(a)saper.info>
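
The pattern in the change above (raise a dedicated exception for an unusable URL, then turn it into an (ok, message) result in the caller) can be sketched on its own. This is a minimal illustration, not the merged code: it assumes Python 3 and urllib.parse rather than httplib, derives the exception from Exception rather than BaseException, and the helper names connection_kind and check are hypothetical.

```python
from urllib.parse import urlsplit


class NotAnURLError(Exception):
    """Raised when a string cannot be handled as an HTTP(S) URL."""


def connection_kind(url):
    """Return 'http' or 'https', or raise NotAnURLError for anything else."""
    scheme = urlsplit(url).scheme
    if scheme in ('http', 'https'):
        return scheme
    raise NotAnURLError(url)


def check(url):
    """Return an (ok, message) pair instead of letting the error escape."""
    try:
        kind = connection_kind(url)
    except NotAnURLError as error:
        # The malformed URL travels inside the exception to the report.
        return False, 'malformed URL: %s' % error
    return True, 'would open an %s connection' % kind
```

Placing the specific except clause before any catch-all handler is what lets the malformed URL reach the report as an ordinary failed check.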
jenkins-bot has submitted this change and it was merged.
Change subject: api.py: improve QueryGenerator efficiency
......................................................................
api.py: improve QueryGenerator efficiency
Limit management is currently done only for the first limited module, while
the other limited modules are left with their default limits.
This change sets the limit for those other modules to the maximum possible
value, in order to reduce the number of requests and make queries faster.
Change-Id: I1c9d96b7bfb121a1b58bd6361dee69691ec5703c
---
M pywikibot/data/api.py
1 file changed, 15 insertions(+), 0 deletions(-)
Approvals:
John Vandenberg: Looks good to me, approved
XZise: Looks good to me, but someone else must approve
jenkins-bot: Verified
diff --git a/pywikibot/data/api.py b/pywikibot/data/api.py
index 01d36d9..98217c3 100644
--- a/pywikibot/data/api.py
+++ b/pywikibot/data/api.py
@@ -1230,21 +1230,36 @@
limited_modules = (
set(self.modules) & self.site._paraminfo.query_modules_with_limits
)
+
if not limited_modules:
self.limited_module = None
elif len(limited_modules) == 1:
self.limited_module = limited_modules.pop()
else:
# Select the first limited module in the request.
+ # Query will continue as needed until limit (if any) for this module
+ # is reached.
for module in self.modules:
if module in self.site._paraminfo.query_modules_with_limits:
self.limited_module = module
+ limited_modules.remove(module)
break
pywikibot.log('%s: multiple requested query modules support limits'
"; using the first such module '%s' of %r"
% (self.__class__.__name__, self.limited_module,
self.modules))
+ # Set limits for all remaining limited modules to max value.
+ # Default values will only cause more requests and make the query
+ # slower.
+ for module in limited_modules:
+ param = self.site._paraminfo.parameter(module, 'limit')
+ prefix = self.site._paraminfo[module]['prefix']
+ if self.site.logged_in() and self.site.has_right('apihighlimits'):
+ self.request[prefix + 'limit'] = int(param['highmax'])
+ else:
+ self.request[prefix + 'limit'] = int(param["max"])
+
self.api_limit = None
if self.limited_module:
--
To view, visit https://gerrit.wikimedia.org/r/173630
Gerrit-MessageType: merged
Gerrit-Change-Id: I1c9d96b7bfb121a1b58bd6361dee69691ec5703c
Gerrit-PatchSet: 2
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Mpaa <mpaa.wiki(a)gmail.com>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: Mpaa <mpaa.wiki(a)gmail.com>
Gerrit-Reviewer: XZise <CommodoreFabianus(a)gmx.de>
Gerrit-Reviewer: jenkins-bot <>
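
The limit handling described in this change can be approximated without a live site. The sketch below is an assumption-laden illustration: PARAMINFO is a hypothetical stand-in for the site's parameter information, fill_remaining_limits is an invented helper, and the numeric limits are placeholder values rather than figures taken from any wiki.

```python
# Hypothetical parameter information; values are placeholders.
PARAMINFO = {
    'links': {'prefix': 'pl', 'limit': {'max': '500', 'highmax': '5000'}},
    'categories': {'prefix': 'cl', 'limit': {'max': '500', 'highmax': '5000'}},
    'info': {'prefix': 'in'},  # no limit parameter at all
}


def fill_remaining_limits(modules, paraminfo, high_limits):
    """Return request parameters that maximise limits for secondary modules."""
    limited = [m for m in modules if 'limit' in paraminfo[m]]
    request = {}
    # The first limited module keeps its caller-controlled limit handling,
    # so only the remaining limited modules are set to their maximum.
    for module in limited[1:]:
        info = paraminfo[module]
        key = 'highmax' if high_limits else 'max'
        request[info['prefix'] + 'limit'] = int(info['limit'][key])
    return request
```

With modules ['links', 'info', 'categories'], only 'categories' receives an explicit limit, since 'links' is the first limited module and 'info' has no limit parameter.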
Build Update for jayvdb/pywikibot-core
-------------------------------------
Build: #104
Status: Errored
Duration: 46 minutes and 50 seconds
Commit: c7e63a1 (review/mpaa/api-buffering-11)
Author: Mpaa
Message: api.py: buffer data in QueryGenerator
QueryGenerator yields all items in resultdata, but there is no
guarantee that all data for an item arrived in one response.
Some data will appear in the following response, etc.
(see commit 3d2ca97aaac17a28e177fa89fbdd3364e7f53c0c)
This patch buffers results until all data for an item are fetched.
This is based on the fact that the API, when query-continuing, keeps on
repeating the same pages until all requested data are fetched.
Change-Id: Iccb3a96b0248fdab0650edfda23d05ecec0dadbd
View the changeset: https://github.com/jayvdb/pywikibot-core/commit/c7e63a1601ed
View the full build log and details: https://travis-ci.org/jayvdb/pywikibot-core/builds/42273762
--
You can configure recipients for build notifications in your .travis.yml file. See http://docs.travis-ci.com/user/notifications
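
The buffering idea in the commit message above can be illustrated with plain dictionaries. This is only a sketch of the stated principle (a page keeps reappearing in continued responses until its data are complete), not the patch under test: it assumes each response is a dict mapping a page id to partial page data, and buffered_pages is a hypothetical name.

```python
def buffered_pages(responses):
    """Yield merged page dicts once a page stops appearing in responses."""
    pending = {}
    for pages in responses:
        # Pages absent from this response are assumed to be complete.
        for pageid in [p for p in pending if p not in pages]:
            yield pending.pop(pageid)
        for pageid, data in pages.items():
            merged = pending.setdefault(pageid, {})
            for key, value in data.items():
                if isinstance(value, list):
                    # List-valued properties accumulate across responses.
                    merged.setdefault(key, []).extend(value)
                else:
                    merged[key] = value
    # Whatever is still buffered after the last response is complete too.
    for pageid in list(pending):
        yield pending.pop(pageid)
```

A page is therefore yielded exactly once, with the list-valued properties from all of its partial responses concatenated.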
Build Update for jayvdb/pywikibot-core
-------------------------------------
Build: #103
Status: Still Failing
Duration: 1 hour, 1 minute, and 3 seconds
Commit: b8ac8d3 (master)
Author: Fabian Neundorf
Message: [FIX] Replace: Switch 'use_regex' and 'flags'
In Replacement.compile the two parameters were declared in the opposite
order to the one used by the rest of the script, which also meant that one
call could produce erroneous results.
Change-Id: Ifa133e3685baa4baffd9f5437f4a720832f49c4f
View the changeset: https://github.com/jayvdb/pywikibot-core/compare/7d1d1c761b69...b8ac8d3e7561
View the full build log and details: https://travis-ci.org/jayvdb/pywikibot-core/builds/42264995
Build Update for jayvdb/pywikibot-core
-------------------------------------
Build: #102
Status: Failed
Duration: 48 minutes and 23 seconds
Commit: 826f592 (review/mpaa/api-buffering-10)
Author: Mpaa
Message: api.py: buffer data in QueryGenerator
QueryGenerator yields all items in resultdata, but there is no
guarantee that all data for an item arrived in one response.
Some data will appear in the following response, etc.
(see commit 3d2ca97aaac17a28e177fa89fbdd3364e7f53c0c)
This patch buffers results until all data for an item are fetched.
This is based on the fact that the API, when query-continuing, keeps on
repeating the same pages until all requested data are fetched.
Change-Id: Iccb3a96b0248fdab0650edfda23d05ecec0dadbd
View the changeset: https://github.com/jayvdb/pywikibot-core/commit/826f592b1ab4
View the full build log and details: https://travis-ci.org/jayvdb/pywikibot-core/builds/42264897
Build Update for wikimedia/pywikibot-core
-------------------------------------
Build: #1732
Status: Fixed
Duration: 52 minutes and 59 seconds
Commit: b8ac8d3 (master)
Author: Fabian Neundorf
Message: [FIX] Replace: Switch 'use_regex' and 'flags'
In Replacement.compile the two parameters were declared in the opposite
order to the one used by the rest of the script, which also meant that one
call could produce erroneous results.
Change-Id: Ifa133e3685baa4baffd9f5437f4a720832f49c4f
View the changeset: https://github.com/wikimedia/pywikibot-core/compare/5c9ec00ee45f...b8ac8d3e…
View the full build log and details: https://travis-ci.org/wikimedia/pywikibot-core/builds/42262827
jenkins-bot has submitted this change and it was merged.
Change subject: [FIX] Replace: Switch 'use_regex' and 'flags'
......................................................................
[FIX] Replace: Switch 'use_regex' and 'flags'
In Replacement.compile the two parameters were declared in the opposite
order to the one used by the rest of the script, which also meant that one
call could produce erroneous results.
Change-Id: Ifa133e3685baa4baffd9f5437f4a720832f49c4f
---
M scripts/replace.py
1 file changed, 1 insertion(+), 1 deletion(-)
Approvals:
John Vandenberg: Looks good to me, approved
jenkins-bot: Verified
diff --git a/scripts/replace.py b/scripts/replace.py
index 014e1dd..4de4424 100755
--- a/scripts/replace.py
+++ b/scripts/replace.py
@@ -178,7 +178,7 @@
         self.edit_summary = edit_summary
         self.default_summary = default_summary
 
-    def compile(self, flags, use_regex):
+    def compile(self, use_regex, flags):
         # Set the regular aexpression flags
         flags |= re.UNICODE
 
--
To view, visit https://gerrit.wikimedia.org/r/176059
Gerrit-MessageType: merged
Gerrit-Change-Id: Ifa133e3685baa4baffd9f5437f4a720832f49c4f
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: XZise <CommodoreFabianus(a)gmx.de>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: jenkins-bot <>
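
A parameter-order mismatch like the one fixed here is easy to miss because both arguments are accepted positionally. The following sketch shows one way callers might guard against it by passing keyword arguments; compile_pattern is a hypothetical stand-alone function, not the method from scripts/replace.py.

```python
import re


def compile_pattern(old, use_regex, flags):
    """Compile 'old' as a regex, escaping it first when use_regex is false."""
    flags |= re.UNICODE
    if not use_regex:
        # Treat the text literally, so '.' matches only a dot.
        old = re.escape(old)
    return re.compile(old, flags)


# Keyword arguments make each call independent of the parameter order.
literal = compile_pattern('a.c', use_regex=False, flags=re.IGNORECASE)
pattern = compile_pattern('a.c', use_regex=True, flags=0)
```

Here literal matches the text 'A.C' but not 'abc', while pattern treats the dot as a wildcard and matches 'abc'.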
jenkins-bot has submitted this change and it was merged.
Change subject: TestDryPageGenerator re-used QueryGenerator
......................................................................
TestDryPageGenerator re-used QueryGenerator
api_tests.TestDryPageGenerator pushed emulated query response data
directly into QueryGenerator as a dict, which also allowed the method
test_limits to reuse the generator. This breaks the definition of a
'generator', and relied on QueryGenerator not fetching data if 'data'
already existed.
Fixed the tests so that the insertion of the mock data doesn't depend
on those quirks of QueryGenerator, which should be removed.
Change-Id: Ic4e92dc6eefe578fa12cab2307e45e7e85f96d40
---
M tests/api_tests.py
M tests/aspects.py
2 files changed, 35 insertions(+), 26 deletions(-)
Approvals:
XZise: Looks good to me, approved
jenkins-bot: Verified
diff --git a/tests/api_tests.py b/tests/api_tests.py
index 7c6c273..9746229 100644
--- a/tests/api_tests.py
+++ b/tests/api_tests.py
@@ -8,7 +8,8 @@
__version__ = '$Id$'
import datetime
-import pywikibot
+import types
+
import pywikibot.data.api as api
from pywikibot.tools import MediaWikiVersion
@@ -248,14 +249,21 @@
dry = True
+ # api.py sorts 'pages' using the string key, which is not a
+ # numeric comparison.
+ titles = ("Broadcaster (definition)", "Wiktionary", "Broadcaster.com",
+ "Wikipedia:Disambiguation")
+
def setUp(self):
super(TestDryPageGenerator, self).setUp()
mysite = self.get_site()
self.gen = api.PageGenerator(site=mysite,
generator="links",
titles="User:R'n'B")
- # following test data is copied from an actual api.php response
- self.gen.data = {
+ # following test data is copied from an actual api.php response,
+ # but that query no longer matches this dataset.
+ # http://en.wikipedia.org/w/api.php?action=query&generator=links&titles=User:…
+ self.gen.request.submit = types.MethodType(lambda self: {
"query": {"pages": {"296589": {"pageid": 296589,
"ns": 0,
"title": "Broadcaster.com"
@@ -274,58 +282,55 @@
}
}
}
- }
+ }, self.gen.request)
# On a dry site, the namespace objects only have canonical names.
# Add custom_name for this site namespace, to match the live site.
if 'Wikipedia' not in self.site._namespaces:
self.site._namespaces[4].custom_name = 'Wikipedia'
- def testGeneratorResults(self):
+ def test_results(self):
"""Test that PageGenerator yields pages with expected attributes."""
- titles = ["Broadcaster.com", "Broadcaster (definition)",
- "Wiktionary", "Wikipedia:Disambiguation"]
- mysite = self.get_site()
- results = [p for p in self.gen]
- self.assertEqual(len(results), 4)
- for page in results:
- self.assertEqual(type(page), pywikibot.Page)
- self.assertEqual(page.site, mysite)
- self.assertIn(page.title(), titles)
+ self.assertPagelistTitles(self.gen, self.titles)
def test_initial_limit(self):
self.assertEqual(self.gen.limit, None) # limit is initally None
- def test_limit_as_number(self):
+ def test_set_limit_as_number(self):
for i in range(-2, 4):
self.gen.set_maximum_items(i)
self.assertEqual(self.gen.limit, i)
- def test_limit_as_string(self):
+ def test_set_limit_as_string(self):
for i in range(-2, 4):
self.gen.set_maximum_items(str(i))
self.assertEqual(self.gen.limit, i)
- def test_wrong_limit_setting(self):
+ def test_set_limit_not_number(self):
with self.assertRaisesRegex(
ValueError,
"invalid literal for int\(\) with base 10: 'test'"):
self.gen.set_maximum_items('test')
- def test_limits(self):
+ def test_limit_equal_total(self):
"""Test that PageGenerator yields the requested amount of pages."""
- for i in range(4, 0, -1):
- self.gen.set_maximum_items(i) # set total amount of pages
- results = [p for p in self.gen]
- self.assertEqual(len(results), i)
+ self.gen.set_maximum_items(4)
+ self.assertPagelistTitles(self.gen, self.titles)
+ def test_limit_one(self):
+ """Test that PageGenerator yields the requested amount of pages."""
+ self.gen.set_maximum_items(1)
+ self.assertPagelistTitles(self.gen, self.titles[0:1])
+
+ def test_limit_zero(self):
+ """Test that a limit of zero is the same as limit None."""
self.gen.set_maximum_items(0)
- results = [p for p in self.gen]
- self.assertEqual(len(results), 4) # total=0 but 4 expected (really?)
+ self.assertPagelistTitles(self.gen, self.titles)
+ def test_limit_omit(self):
+ """Test that limit omitted is the same as limit None."""
self.gen.set_maximum_items(-1)
- results = [p for p in self.gen]
- self.assertEqual(len(results), 4) # total=-1 but 4 expected
+ self.assertPagelistTitles(self.gen, self.titles)
class TestPropertyGenerator(TestCase):
diff --git a/tests/aspects.py b/tests/aspects.py
index 4aad55e..4eadc78 100644
--- a/tests/aspects.py
+++ b/tests/aspects.py
@@ -169,6 +169,10 @@
working_set = collections.deque(titles)
for page in gen:
+ self.assertIsInstance(page, pywikibot.Page)
+ if site:
+ self.assertEqual(page.site, site)
+
title = page.title()
self.assertIn(title, titles)
if is_tuple:
--
To view, visit https://gerrit.wikimedia.org/r/175954
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic4e92dc6eefe578fa12cab2307e45e7e85f96d40
Gerrit-PatchSet: 4
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: XZise <CommodoreFabianus(a)gmx.de>
Gerrit-Reviewer: jenkins-bot <>
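
The test change above replaces request.submit with a bound lambda via types.MethodType, so the generator fetches its data through the normal path. A reduced sketch of that technique follows; the Request and Generator classes, the page ids and the titles are all hypothetical stand-ins, and the string-keyed sort mirrors the comment in the diff about non-numeric ordering.

```python
import types


class Request(object):
    def submit(self):
        raise RuntimeError('network access is not available in this sketch')


class Generator(object):
    def __init__(self, request):
        self.request = request

    def __iter__(self):
        data = self.request.submit()
        # Sorting string keys is not a numeric comparison: '1000' < '999'.
        for pageid in sorted(data['query']['pages']):
            yield data['query']['pages'][pageid]['title']


request = Request()
# Bind a replacement 'submit' to this one instance only.
request.submit = types.MethodType(
    lambda self: {'query': {'pages': {
        '999': {'title': 'Second'},
        '1000': {'title': 'First'},
    }}},
    request)
titles = list(Generator(request))
```

Because only the instance attribute is replaced, other Request objects keep the original submit method.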
jenkins-bot has submitted this change and it was merged.
Change subject: mediawiki_messages reuses QueryGenerator
......................................................................
mediawiki_messages reuses QueryGenerator
When mediawiki_messages is called with a specific set of keys requested,
it inefficiently and incorrectly iterates the generator for each key.
This breaks a fundamental assumption about a 'generator': that it cannot
be re-used. It also assumes that QueryGenerator was able to fetch all
the requested messages without continuation, as only the last fetch
will exist in QueryGenerator.data after the first iteration is complete.
Change-Id: Ia50ec73d33caa548227dfcbd52119117720787dc
---
M pywikibot/site.py
1 file changed, 9 insertions(+), 12 deletions(-)
Approvals:
XZise: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/site.py b/pywikibot/site.py
index d83a098..bca8612 100644
--- a/pywikibot/site.py
+++ b/pywikibot/site.py
@@ -1831,22 +1831,19 @@
amlang=self.lang,
)
+ for msg in msg_query:
+ if 'missing' not in msg:
+ self._msgcache[msg['name']] = msg['*']
+
# Return all messages
if keys == u'*' or keys == [u'*']:
- for msg in msg_query:
- if 'missing' not in msg:
- self._msgcache[msg['name']] = msg['*']
return self._msgcache
- # Return only given keys
else:
- for _key in keys:
- for msg in msg_query:
- if msg['name'] == _key and 'missing' not in msg:
- self._msgcache[_key] = msg['*']
- break
- else:
- raise KeyError("Site %(self)s has no message '%(_key)s'"
- % locals())
+ # Check requested keys
+ for key in keys:
+ if key not in self._msgcache:
+ raise KeyError("Site %s has no message '%s'"
+ % (self, key))
return dict((_key, self._msgcache[_key]) for _key in keys)
--
To view, visit https://gerrit.wikimedia.org/r/176007
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia50ec73d33caa548227dfcbd52119117720787dc
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: XZise <CommodoreFabianus(a)gmx.de>
Gerrit-Reviewer: jenkins-bot <>
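
The restructuring in this change (iterate the query once into a cache, then validate the requested keys against the cache) can be shown with an ordinary Python generator. This is a sketch under assumptions: fetch_messages and fake_query are hypothetical names, and the message entries are invented sample data shaped like those in the diff.

```python
def fetch_messages(keys, responses):
    """Return the requested messages, iterating the response stream once."""
    cache = {}
    for msg in responses:
        if 'missing' not in msg:
            cache[msg['name']] = msg['*']
    # Check requested keys against the cache, not against the generator.
    for key in keys:
        if key not in cache:
            raise KeyError("no message '%s'" % key)
    return dict((key, cache[key]) for key in keys)


def fake_query():
    """Stand-in for a query generator; it can be consumed only once."""
    yield {'name': 'edit', '*': 'Edit'}
    yield {'name': 'nosuch', 'missing': ''}
    yield {'name': 'talk', '*': 'Discussion'}
```

A second pass over the same generator object yields nothing, which is why the per-key loop over the generator in the old code was unreliable.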