jenkins-bot has submitted this change and it was merged.
Change subject: [FEAT] replace: Warn about format chars in string
......................................................................

[FEAT] replace: Warn about format chars in string

When a string contains formatting characters such as U+200E, they are not immediately visible and can prevent the string from matching the text. This change therefore warns about every character in the Cf (format) category as defined by the Unicode standard.

Change-Id: I38ffb9e63d6827de0a42ace39073105aa6761d2e
---
M scripts/replace.py
1 file changed, 12 insertions(+), 0 deletions(-)

Approvals:
  John Vandenberg: Looks good to me, approved
  jenkins-bot: Verified

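To illustrate the problem described in the commit message, here is a minimal sketch (not part of the change itself) showing how a single U+200E character makes two strings that look the same compare as different. The sample strings and variable names are made up for the illustration.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, an invisible format character

visible = "example"
with_mark = "exam" + LRM + "ple"  # usually renders the same as "example"

# The mark belongs to the Unicode general category Cf (format).
print(unicodedata.category(LRM))           # Cf

# The two strings look alike but are not equal, so a plain-text
# search for one will not find the other.
print(visible == with_mark)                # False
print(with_mark in "an example sentence")  # False
print(len(visible), len(with_mark))        # 7 8
```

This is why a replacement copied from a web page or wiki text can silently fail to match: the extra character is part of the search string even though it is not displayed.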
diff --git a/scripts/replace.py b/scripts/replace.py
index 19a7855..7cb7764 100755
--- a/scripts/replace.py
+++ b/scripts/replace.py
@@ -137,6 +137,7 @@
 import re
 import time
 import sys
+import unicodedata
 
 import pywikibot
 from pywikibot import i18n, textlib, pagegenerators, Bot
@@ -666,6 +667,11 @@
     return pattern
 
 
+def contains_format_characters(text):
+    """Return True when there are format characters (e.g. U+200E) in text."""
+    return any(unicodedata.category(char) == 'Cf' for char in text)
+
+
 def main(*args):
     """
     Process command line arguments and invoke bot.
@@ -875,6 +881,12 @@
                                       set_summary)
     for replacement in fix['replacements']:
         summary = None if len(replacement) < 3 else replacement[2]
+        if contains_format_characters(replacement[0]):
+            pywikibot.warning('The old string "{0}" contains formatting '
+                              'characters like U+200E'.format(replacement[0]))
+        if contains_format_characters(replacement[1]):
+            pywikibot.warning('The new string "{0}" contains formatting '
+                              'characters like U+200E'.format(replacement[1]))
         replacements.append(ReplacementListEntry(
             old=replacement[0],
             new=replacement[1],
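
The merged warning prints the affected string, but the offending character stays invisible in the output. As a possible companion, the sketch below uses the same Cf category test to report where each format character sits and what it is called. `find_format_characters` is a hypothetical helper written for this illustration; it is not part of `scripts/replace.py` or pywikibot.

```python
import unicodedata


def find_format_characters(text):
    """Return (index, code point, name) for each Cf character in text.

    Hypothetical helper: it applies the same category check as
    contains_format_characters() but keeps the details of every hit.
    """
    return [
        (index,
         "U+{0:04X}".format(ord(char)),
         unicodedata.name(char, "<unnamed>"))
        for index, char in enumerate(text)
        if unicodedata.category(char) == "Cf"
    ]


# A string with a LEFT-TO-RIGHT MARK hidden between "a" and "b".
print(find_format_characters("a\u200eb"))
# [(1, 'U+200E', 'LEFT-TO-RIGHT MARK')]

# Plain text yields an empty list.
print(find_format_characters("abc"))
# []
```

Reporting the code point rather than the raw character keeps the message readable in a terminal, where the character itself would not be shown.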