jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/407590 )
Change subject: diff_checker.py: Decode tokenizer strings using 'utf-8' encoding on Python 2
......................................................................
diff_checker.py: Decode tokenizer strings using 'utf-8' encoding on Python 2
The tokenizer on Python 3 has an internal mechanism to detect the correct source encoding and returns unicode (str) objects.[1] The tokenizer on Python 2, however, returns byte strings that need to be decoded explicitly; otherwise the default encoding (often 'ascii') is used, which causes a UnicodeDecodeError.
[1] See: https://docs.python.org/3/library/tokenize.html#tokenize.detect_encoding
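The Python 3 behaviour referenced above can be seen directly: `tokenize.tokenize()` reads the coding cookie or BOM (defaulting to UTF-8) and yields already-decoded str tokens, so no manual `.decode()` is needed. A minimal sketch (the sample source string is an illustration):

```python
import io
import tokenize

# A UTF-8 encoded source buffer with a non-ASCII string literal.
source = "s = 'héllo'\n".encode('utf-8')

# detect_encoding() is the mechanism tokenize() uses internally on Python 3.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)  # 'utf-8'

# STRING tokens come back as already-decoded str objects.
strings = [tok.string
           for tok in tokenize.tokenize(io.BytesIO(source).readline)
           if tok.type == tokenize.STRING]
print(strings[0])  # "'héllo'"
```

On Python 2, `tokenize.generate_tokens()` performs no such detection and yields raw byte strings, which is why the patch below decodes them explicitly.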
Bug: T186301
Change-Id: I029ae20145bb634c72e2f7f24b8c749d5885fb25
---
M scripts/maintenance/diff_checker.py
1 file changed, 4 insertions(+), 0 deletions(-)
Approvals:
  jenkins-bot: Verified
  Xqt: Looks good to me, approved
diff --git a/scripts/maintenance/diff_checker.py b/scripts/maintenance/diff_checker.py
index 734befd..a055143 100644
--- a/scripts/maintenance/diff_checker.py
+++ b/scripts/maintenance/diff_checker.py
@@ -30,8 +30,10 @@
 from subprocess import check_output
 from sys import version_info
 if version_info.major == 3:
+    PY2 = False
     from tokenize import tokenize, STRING
 else:
+    PY2 = True
     from tokenize import generate_tokens as tokenize, STRING
 
 from unidiff import PatchSet
@@ -72,6 +74,8 @@
                 break
         if start[0] not in line_nos or type_ != STRING:
             continue
+        if PY2:
+            string = string.decode('utf-8')
         match = STRING_MATCH(string)
         if match.group('unicode_literal'):
             error = True
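The hunk above decodes each STRING token before passing it to a `STRING_MATCH` regex with a `unicode_literal` named group. A minimal sketch of that flow on Python 3; the regex here is an illustrative stand-in, not the script's actual pattern:

```python
import io
import re
import tokenize

# Hypothetical pattern: captures an explicit u/U prefix on a string literal.
STRING_MATCH = re.compile(r"(?P<unicode_literal>[uU])?[rRbB]*['\"]").match

source = b"a = u'text'\nb = 'plain'\n"
flagged = []
for tok in tokenize.tokenize(io.BytesIO(source).readline):
    if tok.type != tokenize.STRING:
        continue
    # On Python 3 tok.string is already str; on Python 2 it would be a
    # byte string and need .decode('utf-8') first, as in the patch.
    match = STRING_MATCH(tok.string)
    if match.group('unicode_literal'):
        flagged.append(tok.string)

print(flagged)  # ["u'text'"]
```

Only literals with an explicit `u` prefix are flagged; plain string literals pass through, which matches the check's purpose of rejecting redundant unicode literals.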
pywikibot-commits@lists.wikimedia.org