jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/439643 )
Change subject: Fix handling of API continuation in PropertyGenerator
......................................................................
Fix handling of API continuation in PropertyGenerator
test_preload_langlinks_count was marked as allowed_failure. It relies
on site.preloadpages, which relies on PropertyGenerator, which had
a bug in it.
The issue was that the default implementation of QueryGenerator
returns incomplete results: sometimes the API splits long pageprops
across several API requests, but QueryGenerator yields each result as
soon as it arrives, without waiting for it to be completed by the
next request. In other words, PropertyGenerator used to yield more
items than expected, but each yielded item had fewer fields than
expected. Therefore site.preloadpages returned more pages than
expected, and the preloaded data on each page could be incomplete.
Many other functions rely on PropertyGenerator and might be affected
by the bug, but I have not run any tests to confirm that.
I don't know whether other generators have a similar bug. This
patch only fixes PropertyGenerator. If such issues are found
in other generators as well, then it might make sense to move some of
the newly introduced PropertyGenerator methods into QueryGenerator.
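The buffering idea behind the fix can be sketched independently of the
actual api.py code. This is a simplified illustration, not the merged
implementation: `iter_complete` and its batch dicts (keyed by 'title')
are hypothetical stand-ins for PropertyGenerator and its resultdata.

```python
def iter_complete(batches):
    """Yield merged per-title dicts, holding partial ones back.

    `batches` is an iterable of API result batches, each a list of
    dicts with a 'title' key. An item counts as fully retrieved only
    once its title no longer appears in the next batch.
    """
    pending = {}
    for batch in batches:
        titles = {item['title'] for item in batch}
        # Buffered titles absent from this batch are complete: flush.
        for title in list(pending):
            if title not in titles:
                yield pending.pop(title)
        # Merge this batch into the buffer.
        for item in batch:
            old = pending.setdefault(item['title'], item)
            if old is not item:
                for key, value in item.items():
                    if key not in old:
                        old[key] = value
                    elif isinstance(value, list):
                        old[key].extend(value)
    # The last batch leaves its items buffered; flush the remainder.
    for item in pending.values():
        yield item
```

The real patch does the same thing split across `_extract_results`,
`_fully_retrieved_data_dicts` and `_update_old_result_dict`.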
api.py:
- QueryGenerator:
Refactor by moving a small part of `__iter__` into a separate
method (_extract_results), so that it can be overridden and
extended in the PropertyGenerator subclass.
- PropertyGenerator:
Override `__iter__` and `_extract_results` to prevent yielding
premature results. A result is considered fully fetched once it
no longer appears in the results of the next request.
site_tests.py:
- Remove the allowed_failure decorator and re-enable the test.
- Silence the output.
Bug: T196876
Change-Id: I4ed74ab3af6d320242beaef1a55f20dc489fe29b
---
M pywikibot/data/api.py
M tests/site_tests.py
2 files changed, 83 insertions(+), 36 deletions(-)
Approvals:
Xqt: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/data/api.py b/pywikibot/data/api.py
index 4e677cb..de17ab5 100644
--- a/pywikibot/data/api.py
+++ b/pywikibot/data/api.py
@@ -2750,7 +2750,7 @@
value = str(value)
self.request[key] = value
- def _handle_query_limit(self, prev_limit, new_limit, count, had_data):
+ def _handle_query_limit(self, prev_limit, new_limit, had_data):
"""Handle query limit."""
if self.query_limit is None:
return prev_limit, new_limit
@@ -2761,7 +2761,7 @@
elif self.limit > 0:
if had_data:
# self.resultkey in data in last request.submit()
- new_limit = min(self.query_limit, self.limit - count)
+ new_limit = min(self.query_limit, self.limit - self._count)
else:
# only "(query-)continue" returned. See Bug T74209.
# increase new_limit to advance faster until new
@@ -2791,7 +2791,7 @@
api=self.api_limit,
limit=self.limit,
new=new_limit,
- count=count,
+ count=self._count,
prefix=self.prefix,
value=self.request[self.prefix + 'limit']),
_logger)
@@ -2818,6 +2818,30 @@
_logger)
return resultdata
+ def _extract_results(self, resultdata):
+ """Extract results from resultdata."""
+ for item in resultdata:
+ result = self.result(item)
+ if self._namespaces:
+ if not self._check_result_namespace(result):
+ continue
+ yield result
+ if isinstance(item, dict) \
+ and set(self.continuekey) & set(item.keys()):
+ # if we need to count elements contained in items in
+ # self.data["query"]["pages"], we want to count
+ # item[self.continuekey] (e.g. 'revisions') and not
+ # self.resultkey (i.e. 'pages')
+ for key in set(self.continuekey) & set(item.keys()):
+ self._count += len(item[key])
+ # otherwise we proceed as usual
+ else:
+ self._count += 1
+ # note: self.limit could be -1
+ if self.limit and 0 < self.limit <= self._count:
+ raise RuntimeError(
+ 'QueryGenerator._extract_results reached the limit')
+
def __iter__(self):
"""Submit request and iterate the response based on self.resultkey.
@@ -2827,10 +2851,10 @@
previous_result_had_data = True
prev_limit = new_limit = None
- count = 0
+ self._count = 0
while True:
prev_limit, new_limit = self._handle_query_limit(
- prev_limit, new_limit, count, previous_result_had_data)
+ prev_limit, new_limit, previous_result_had_data)
if not hasattr(self, "data"):
self.data = self.request.submit()
if not self.data or not isinstance(self.data, dict):
@@ -2847,26 +2871,11 @@
for item in self.data['query']['normalized']}
else:
self.normalized = {}
- for item in resultdata:
- result = self.result(item)
- if self._namespaces:
- if not self._check_result_namespace(result):
- continue
- yield result
- if isinstance(item, dict) \
- and set(self.continuekey) & set(item.keys()):
- # if we need to count elements contained in items in
- # self.data["query"]["pages"], we want to count
- # item[self.continuekey] (e.g. 'revisions') and not
- # self.resultkey (i.e. 'pages')
- for key in set(self.continuekey) & set(item.keys()):
- count += len(item[key])
- # otherwise we proceed as usual
- else:
- count += 1
- # note: self.limit could be -1
- if self.limit and self.limit > 0 and count >= self.limit:
- return
+ try:
+ for result in self._extract_results(resultdata):
+ yield result
+ except RuntimeError:
+ return
# self.resultkey in data in last request.submit()
previous_result_had_data = True
else:
@@ -3020,6 +3029,46 @@
"""The requested property names."""
return self._props
+ def __iter__(self):
+ """Yield results."""
+ self._previous_dicts = {}
+ for result in super(PropertyGenerator, self).__iter__():
+ yield result
+ for result in self._previous_dicts.values():
+ yield result
+
+ def _extract_results(self, resultdata):
+ """Yield completed page_data of consecutive API requests."""
+ for d in self._fully_retrieved_data_dicts(resultdata):
+ yield d
+ for data_dict in super(PropertyGenerator, self)._extract_results(
+ resultdata
+ ):
+ d = self._previous_dicts.setdefault(data_dict['title'], data_dict)
+ if d is not data_dict:
+ self._update_old_result_dict(d, data_dict)
+
+ def _fully_retrieved_data_dicts(self, resultdata):
+ """Yield items of self._previous_dicts that are not in resultdata."""
+ resuldata_titles = {d['title'] for d in resultdata}
+ for prev_title, prev_dict in self._previous_dicts.copy().items():
+ if prev_title not in resuldata_titles:
+ yield prev_dict
+ del self._previous_dicts[prev_title]
+
+ @staticmethod
+ def _update_old_result_dict(old_dict, new_dict):
+ """Update old result dict with new_dict."""
+ for k, v in new_dict.items():
+ if k not in old_dict:
+ old_dict[k] = v
+ continue
+ if isinstance(v, list):
+ old_dict[k].extend(v)
+ continue
+ assert isinstance(v, (UnicodeType, int)), (
+ 'continued API result had an unexpected type: %s' % type(v))
+
class ListGenerator(QueryGenerator):
diff --git a/tests/site_tests.py b/tests/site_tests.py
index cfaa47e..64db605 100644
--- a/tests/site_tests.py
+++ b/tests/site_tests.py
@@ -31,7 +31,7 @@
UnicodeType as unicode,
)
-from tests import unittest_print
+from tests import patch, unittest_print
from tests.aspects import (
unittest, TestCase, DeprecationTestCase,
TestCaseBase,
@@ -3060,16 +3060,14 @@
if count >= 6:
break
- @allowed_failure
- def test_preload_langlinks_count(self):
+ @patch.object(pywikibot, 'output')
+ def test_preload_langlinks_count(self, output_mock):
"""Test preloading continuation works."""
- # FIXME: test fails
mysite = self.get_site()
mainpage = self.get_mainpage()
- count = 0
- links = mysite.pagelinks(mainpage, total=20)
- pages = list(mysite.preloadpages(links, groupsize=5,
- langlinks=True))
+ links = list(mysite.pagelinks(mainpage, total=20))
+ pages = list(mysite.preloadpages(links, groupsize=5, langlinks=True))
+ self.assertEqual(len(links), len(pages))
for page in pages:
self.assertIsInstance(page, pywikibot.Page)
self.assertIsInstance(page.exists(), bool)
@@ -3077,9 +3075,9 @@
self.assertEqual(len(page._revisions), 1)
self.assertIsNotNone(page._revisions[page._revid].text)
self.assertFalse(hasattr(page, '_pageprops'))
- count += 1
-
- self.assertEqual(len(list(links)), count)
+ if pages:
+ self.assertRegex(
+ output_mock.call_args[0][0], r'Retrieving \d pages from ')
def _test_preload_langlinks_long(self):
"""Test preloading continuation works."""
--
To view, visit https://gerrit.wikimedia.org/r/439643
To unsubscribe, or for help writing mail filters, visit https://gerrit.wikimedia.org/r/settings
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4ed74ab3af6d320242beaef1a55f20dc489fe29b
Gerrit-Change-Number: 439643
Gerrit-PatchSet: 13
Gerrit-Owner: Dalba <dalba.wiki(a)gmail.com>
Gerrit-Reviewer: Dalba <dalba.wiki(a)gmail.com>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: Zoranzoki21 <zorandori4444(a)gmail.com>
Gerrit-Reviewer: jenkins-bot
jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/439622 )
Change subject: [tox] Ignore presence of implicit parameters errors
......................................................................
[tox] Ignore presence of implicit parameters errors
Implicit parameters are allowed with Python 2.7+. After dropping
support for Python 2.6, these error codes should be ignored by
flake8-string-format.
Change-Id: I208b75d1711e2244aab29e90bfad5860387d6af0
---
M tox.ini
1 file changed, 5 insertions(+), 6 deletions(-)
Approvals:
Dalba: Looks good to me, approved
jenkins-bot: Verified
diff --git a/tox.ini b/tox.ini
index 6517474..bef79b1 100644
--- a/tox.ini
+++ b/tox.ini
@@ -136,6 +136,8 @@
# H301: Do not import more than one module per line; Pywikibot uses H306 (Alphabetically order your imports by the full module path)
# W503: line break before binary operator; against current PEP 8 recommendation
# P101: format string does contain unindexed parameters
+# P102: docstring does contain unindexed parameters
+# P103: other string does contain unindexed parameters
# The following are to be fixed
# D102: Missing docstring in public method
@@ -143,14 +145,13 @@
# N802: function name should be lowercase
# N803: argument name should be lowercase
# N806: variable in function should be lowercase
-# P102,P103: string does contain unindexed parameters; see I36355923
# Errors occured after upgrade to pydocstyle 2.0.0 (T164142)
# D401: First line should be in imperative mood; try rephrasing
# D413: Missing blank line after last section
# D412: No blank lines allowed between a section header and its content
-ignore = D105,D211,FI10,FI12,FI13,FI15,FI16,FI17,FI5,H101,H236,H301,H404,H405,H903,N802,D401,D413,D103,D412,P101,W503
+ignore = D105,D211,FI10,FI12,FI13,FI15,FI16,FI17,FI5,H101,H236,H301,H404,H405,H903,N802,D401,D413,D103,D412,P101,P102,P103,W503
exclude = .tox,.git,./*.egg,ez_setup.py,build,externals,user-config.py,./scripts/i18n/*,scripts/userscripts/*
min-version = 2.7
max_line_length = 100
@@ -160,7 +161,7 @@
generate_family_file.py : T001
pwb.py : T001
# pydocstyle cannot handle multiple __all__ variables
- pywikibot/__init__.py : P103, D999, N806
+ pywikibot/__init__.py : D999, N806
pywikibot/comms/http.py : T001
pywikibot/compat/catlib.py : N803
pywikibot/config2.py : N806
@@ -213,7 +214,7 @@
scripts/imagecopy_self.py : N801, N803, N806
scripts/imagerecat.py : N803, N806
scripts/imagetransfer.py : E241, N803, N806
- scripts/interwiki.py : P102, N803, N806
+ scripts/interwiki.py : N803, N806
scripts/isbn.py : N803, N806
scripts/maintenance/* : T001
scripts/makecat.py : D103
@@ -237,8 +238,6 @@
tests/* : N813
tests/page_tests.py : E241
tests/pwb/* : T001
- # invalidly detected as {} format string:
- tests/textlib_tests.py : P103
tests/ui_tests.py : D102, D103, N801
[pep8]
--
To view, visit https://gerrit.wikimedia.org/r/439622
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I208b75d1711e2244aab29e90bfad5860387d6af0
Gerrit-Change-Number: 439622
Gerrit-PatchSet: 1
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: Dalba <dalba.wiki(a)gmail.com>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: Zoranzoki21 <zorandori4444(a)gmail.com>
Gerrit-Reviewer: jenkins-bot
jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/416370 )
Change subject: mysql.py: add PyMySql as pure-Python MySQL client library
......................................................................
mysql.py: add PyMySql as pure-Python MySQL client library
Add PyMySQL as a pure-Python MySQL client library.
PyMySQL will be the preferred library to import.
Fallback libraries:
- MySQLdb: as today
- oursql: discontinued
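The preference order described above is an ordinary import-fallback
chain. A simplified sketch follows; the final `None` branch is an
addition for this sketch only, so it runs even on an interpreter with
neither client installed (the real module has no such branch):

```python
# Import-preference sketch: try the pure-Python client first, then
# the legacy C-based client.
try:
    import pymysql as mysqldb
except ImportError:
    try:
        import MySQLdb as mysqldb
    except ImportError:
        mysqldb = None  # no MySQL client library available
else:
    # Make `import MySQLdb` elsewhere transparently resolve to PyMySQL.
    mysqldb.install_as_MySQLdb()
```

Putting `install_as_MySQLdb()` in the `else` branch keeps third-party
code that imports MySQLdb by name working on top of PyMySQL.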
Bug: T142021
Bug: T89976
Change-Id: I28725fbe6ea81900c06ca3ccbf02cdc8704fd66a
---
M pywikibot/data/mysql.py
M requirements.txt
M tox.ini
3 files changed, 56 insertions(+), 36 deletions(-)
Approvals:
Zhuyifei1999: Looks good to me, but someone else must approve
Dvorapa: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/data/mysql.py b/pywikibot/data/mysql.py
index 02dfd7b..c3757ce 100644
--- a/pywikibot/data/mysql.py
+++ b/pywikibot/data/mysql.py
@@ -1,27 +1,32 @@
# -*- coding: utf-8 -*-
"""Miscellaneous helper functions for mysql queries."""
#
-# (C) Pywikibot team, 2016-2017
+# (C) Pywikibot team, 2016-2018
#
# Distributed under the terms of the MIT license.
#
from __future__ import absolute_import, unicode_literals
-# Requires oursql <https://pythonhosted.org/oursql/> or
-# MySQLdb <https://sourceforge.net/projects/mysql-python/>
-try:
- import oursql as mysqldb
-except ImportError:
- import MySQLdb as mysqldb
-
import pywikibot
+# Requires PyMySql as first choice or
+# MySQLdb <https://sourceforge.net/projects/mysql-python/>
+try:
+ import pymysql as mysqldb
+except ImportError:
+ import MySQLdb as mysqldb
+ pywikibot.warning('PyMySql not found.')
+ pywikibot.warning('MySQLdb is deprecated. Use PyMySql instead.')
+else:
+ mysqldb.install_as_MySQLdb()
+
from pywikibot import config2 as config
+from pywikibot.tools import deprecated_args, UnicodeType
-def mysql_query(query, params=(), dbname=None, encoding='utf-8', verbose=None):
- """
- Yield rows from a MySQL query.
+@deprecated_args(encoding=None)
+def mysql_query(query, params=None, dbname=None, verbose=None):
+ """Yield rows from a MySQL query.
An example query that yields all ns0 pages might look like::
@@ -31,44 +36,59 @@
FROM page
WHERE page_namespace = 0;
+ From MediaWiki 1.5, all projects use Unicode (UTF-8) character encoding.
+ Cursor charset is utf8.
+
@param query: MySQL query to execute
- @type query: str
+ @type query: str (unicode in py2)
@param params: input parametes for the query, if needed
- @type params: tuple
+ if list or tuple, %s shall be used as placeholder in the query string.
+ if a dict, %(key)s shall be used as placeholder in the query string.
+ @type params: tuple, list or dict of str (unicode in py2)
@param dbname: db name
@type dbname: str
- @param encoding: encoding used by the database
- @type encoding: str
@param verbose: if True, print query to be executed;
if None, config.verbose_output will be used.
@type verbose: None or bool
@return: generator which yield tuples
"""
+ CHARSET = 'utf8'
+
+ # These are specified in config2.py or user-config.py
if verbose is None:
verbose = config.verbose_output
if config.db_connect_file is None:
- conn = mysqldb.connect(config.db_hostname,
- db=config.db_name_format.format(dbname),
- user=config.db_username,
- passwd=config.db_password,
- port=config.db_port)
+ credentials = {'user': config.db_username,
+ 'passwd': config.db_password}
else:
- conn = mysqldb.connect(config.db_hostname,
- db=config.db_name_format.format(dbname),
- read_default_file=config.db_connect_file,
- port=config.db_port)
+ credentials = {'read_default_file': config.db_connect_file}
+
+ conn = mysqldb.connect(config.db_hostname,
+ db=config.db_name_format.format(dbname),
+ port=config.db_port,
+ charset=CHARSET,
+ **credentials)
cursor = conn.cursor()
- if verbose:
- pywikibot.output('Executing query:\n%s' % query)
- query = query.encode(encoding)
- params = tuple(p.encode(encoding) for p in params)
- if params:
- cursor.execute(query, params)
- else:
- cursor.execute(query)
+ if verbose:
+ try:
+ _query = cursor.mogrify(query, params)
+ except AttributeError: # if MySQLdb is used.
+ # Not exactly the same encoding handling as cursor.execute()
+ # Here it is just for the sake of verbose.
+ _query = query
+ if params is not None:
+ _query = query.format(params)
+
+ if not isinstance(_query, UnicodeType):
+ _query = UnicodeType(_query, encoding='utf-8')
+ _query = _query.strip()
+ _query = '\n'.join(' {0}'.format(l) for l in _query.splitlines())
+ pywikibot.output('Executing query:\n%s' % _query)
+
+ cursor.execute(query, params)
for row in cursor:
yield row
diff --git a/requirements.txt b/requirements.txt
index e44dee9..75c17c1 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -66,10 +66,9 @@
# textlib.py and patrol.py
mwparserfromhell>=0.3.3
-# The mysql generator in pagegenerators depends on either oursql or MySQLdb
-# pywikibot prefers oursql. Both are Python 2 only; T89976.
-oursql ; python_version < '3'
-mysqlclient ; python_version >= '3'
+# The mysql generator in pagegenerators depends on either PyMySQL or MySQLdb
+# pywikibot prefers PyMySQL over MySQLdb (Python 2 only)
+PyMySQL
# scripts/script_wui.py depends on Lua, which is not available using pip
# but can be obtained from: https://github.com/bastibe/lunatic-python
diff --git a/tox.ini b/tox.ini
index ef0b533..6517474 100644
--- a/tox.ini
+++ b/tox.ini
@@ -166,6 +166,7 @@
pywikibot/config2.py : N806
pywikibot/cosmetic_changes.py : N803, N806
pywikibot/data/api.py : N803, N806
+ pywikibot/data/mysql.py : N806
pywikibot/date.py : E241, N803, N806
pywikibot/diff.py : N806
pywikibot/editor.py : N803, N806
--
To view, visit https://gerrit.wikimedia.org/r/416370
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I28725fbe6ea81900c06ca3ccbf02cdc8704fd66a
Gerrit-Change-Number: 416370
Gerrit-PatchSet: 15
Gerrit-Owner: Mpaa <mpaa.wiki(a)gmail.com>
Gerrit-Reviewer: Dvorapa <dvorapa(a)seznam.cz>
Gerrit-Reviewer: Framawiki <framawiki(a)tools.wmflabs.org>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Mpaa <mpaa.wiki(a)gmail.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: Zhuyifei1999 <zhuyifei1999(a)gmail.com>
Gerrit-Reviewer: jenkins-bot
jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/439526 )
Change subject: [bugfix] Keep category.py compatible for Python 2 and Python 3
......................................................................
[bugfix] Keep category.py compatible for Python 2 and Python 3
Division behaves differently between py2 and py3:
5 / 2 gives 2 in py2 but 2.5 in py3, and
math.ceil returns a float in py2 but an int in py3.
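The difference can be illustrated with a small snippet (the `count`
value is hypothetical; this is not part of the patch):

```python
import math

count = 5
# Python 2: 5 / 2 == 2 (floor), so math.ceil(5 / 2) == 2.0
# Python 3: 5 / 2 == 2.5,       so math.ceil(5 / 2) == 3
# Dividing by 2.0 forces float division on both interpreters, and
# int() normalizes the py2 float result to an int:
new_column = int(math.ceil(count / 2.0))
print(new_column)  # → 3 on both Python 2 and Python 3
```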
Bug: T196865
Change-Id: Ic688ce719ebf686ff9496a8b74377e0b04f27e32
---
M scripts/category.py
1 file changed, 1 insertion(+), 1 deletion(-)
Approvals:
Dvorapa: Looks good to me, approved
jenkins-bot: Verified
diff --git a/scripts/category.py b/scripts/category.py
index efaeee6..f301bd7 100755
--- a/scripts/category.py
+++ b/scripts/category.py
@@ -1055,7 +1055,7 @@
# can we can output in two columns?
count = len(cat_list)
if count > 1 and len(max(cat_list, key=len)) <= 31:
- new_column = math.ceil(count / 2)
+ new_column = int(math.ceil(count / 2.0))
else:
new_column = 0
--
To view, visit https://gerrit.wikimedia.org/r/439526
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic688ce719ebf686ff9496a8b74377e0b04f27e32
Gerrit-Change-Number: 439526
Gerrit-PatchSet: 2
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: Dvorapa <dvorapa(a)seznam.cz>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Zoranzoki21 <zorandori4444(a)gmail.com>
Gerrit-Reviewer: jenkins-bot