jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/407590 )
Change subject: diff_checker.py: Decode tokenizer strings using 'utf-8' encoding on Python 2
......................................................................
diff_checker.py: Decode tokenizer strings using 'utf-8' encoding on Python 2
The tokenizer on Python 3 has an internal mechanism to detect the correct source encoding and returns unicode (str) objects.[1] The tokenizer on Python 2, however, returns byte strings that need to be decoded explicitly; otherwise the default encoding (often 'ascii') is used, which causes a UnicodeDecodeError.
[1] See: https://docs.python.org/3/library/tokenize.html#tokenize.detect_encoding
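The Python 3 behaviour referenced above can be seen directly: `tokenize.tokenize()` reads the coding cookie or BOM (defaulting to UTF-8) and yields already-decoded str tokens, so no manual `.decode()` is needed. A minimal sketch (the sample source string is an illustration):

```python
import io
import tokenize

# A UTF-8 encoded source buffer with a non-ASCII string literal.
source = "s = 'héllo'\n".encode('utf-8')

# detect_encoding() is the mechanism tokenize() uses internally on Python 3.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)  # 'utf-8'

# STRING tokens come back as already-decoded str objects.
strings = [tok.string
           for tok in tokenize.tokenize(io.BytesIO(source).readline)
           if tok.type == tokenize.STRING]
print(strings[0])  # "'héllo'"
```

On Python 2, `tokenize.generate_tokens()` performs no such detection and yields raw byte strings, which is why the patch below decodes them explicitly.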
Bug: T186301
Change-Id: I029ae20145bb634c72e2f7f24b8c749d5885fb25
---
M scripts/maintenance/diff_checker.py
1 file changed, 4 insertions(+), 0 deletions(-)
Approvals:
  jenkins-bot: Verified
  Xqt: Looks good to me, approved
diff --git a/scripts/maintenance/diff_checker.py b/scripts/maintenance/diff_checker.py
index 734befd..a055143 100644
--- a/scripts/maintenance/diff_checker.py
+++ b/scripts/maintenance/diff_checker.py
@@ -30,8 +30,10 @@
 from subprocess import check_output
 from sys import version_info
 if version_info.major == 3:
+    PY2 = False
     from tokenize import tokenize, STRING
 else:
+    PY2 = True
     from tokenize import generate_tokens as tokenize, STRING
 
 from unidiff import PatchSet
@@ -72,6 +74,8 @@
                 break
         if start[0] not in line_nos or type_ != STRING:
             continue
+        if PY2:
+            string = string.decode('utf-8')
         match = STRING_MATCH(string)
         if match.group('unicode_literal'):
             error = True
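The hunk above decodes each STRING token before passing it to a `STRING_MATCH` regex with a `unicode_literal` named group. A minimal sketch of that flow on Python 3; the regex here is an illustrative stand-in, not the script's actual pattern:

```python
import io
import re
import tokenize

# Hypothetical pattern: captures an explicit u/U prefix on a string literal.
STRING_MATCH = re.compile(r"(?P<unicode_literal>[uU])?[rRbB]*['\"]").match

source = b"a = u'text'\nb = 'plain'\n"
flagged = []
for tok in tokenize.tokenize(io.BytesIO(source).readline):
    if tok.type != tokenize.STRING:
        continue
    # On Python 3 tok.string is already str; on Python 2 it would be a
    # byte string and need .decode('utf-8') first, as in the patch.
    match = STRING_MATCH(tok.string)
    if match.group('unicode_literal'):
        flagged.append(tok.string)

print(flagged)  # ["u'text'"]
```

Only literals with an explicit `u` prefix are flagged; plain string literals pass through, which matches the check's purpose of rejecting redundant unicode literals.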
pywikibot-commits@lists.wikimedia.org