jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/423243 )
Change subject: textlib.py: Rewrite FILE_LINK_REGEX to use atomic groups
textlib.py: Rewrite FILE_LINK_REGEX to use atomic groups
Python's re engine does not support atomic grouping, so mimic the behaviour using positive lookaheads that capture into named groups, each followed by a backreference to that group.[1]
[1]: See http://www.rexegg.com/regex-tricks.html#pseudo-atomic-groups
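A minimal sketch of the pseudo-atomic trick described above, not taken from the change itself. The pattern names and the sample text are made up for illustration. The lookahead captures the greedy match, and the backreference then consumes exactly that text; because the engine does not backtrack into a completed lookahead, the group cannot give characters back.

```python
import re

# Ordinary group: "a+" may backtrack and hand one "a" back to the literal "ab".
backtracking = re.compile(r'(?:a+)ab')

# Pseudo-atomic group: the lookahead captures the longest run of "a" into the
# (hypothetical) group "run", and the backreference consumes that exact text.
# Once the lookahead has succeeded, the captured run is never shortened.
pseudo_atomic = re.compile(r'(?=(?P<run>a+))(?P=run)ab')

# Same construct followed by "b" only, to show it still matches normally.
pseudo_atomic_ok = re.compile(r'(?=(?P<run>a+))(?P=run)b')

text = 'aaab'
print(backtracking.search(text) is not None)   # True: "aa" + "ab"
print(pseudo_atomic.search(text) is not None)  # False: no "a" left for "ab"
print(pseudo_atomic_ok.search(text).group(0))  # aaab
```

The cost of the emulation is one extra capturing group per atomic group, which is why the rewritten regex uses named groups throughout.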
Bug: T191113
Change-Id: I2eba916d0a171487d14396a3e785b3b253b827f1
---
M pywikibot/textlib.py
1 file changed, 24 insertions(+), 7 deletions(-)
Approvals:
  Xqt: Looks good to me, approved
  jenkins-bot: Verified
diff --git a/pywikibot/textlib.py b/pywikibot/textlib.py
index 7a40a55..2c2b26c 100644
--- a/pywikibot/textlib.py
+++ b/pywikibot/textlib.py
@@ -105,13 +105,30 @@
 # The namespace names must be substituted into this regex.
 # e.g. FILE_LINK_REGEX % 'File' or FILE_LINK_REGEX % '|'.join(site.namespaces)
 FILE_LINK_REGEX = r"""
-\[\[\s*(?:%s)\s*:[^|]*?\s*
-    (\|
-        ( ( \[\[ .*? \]\] )? [^[]*?
-        | \[ [^]]*? \]
-        )*
-    )?
-\]\]
+    \[\[\s*
+    (?:%s)  # namespace aliases
+    \s*:
+    (?=(?P<filename>
+        [^]|]*
+    ))(?P=filename)
+    (
+        \|
+        (
+            (
+                (?=(?P<inner_link>
+                    \[\[.*?\]\]
+                ))(?P=inner_link)
+            )?
+            (?=(?P<other_chars>
+                [^\[\]]*
+            ))(?P=other_chars)
+        |
+            (?=(?P<not_wikilink>
+                \[[^\]]*\]
+            ))(?P=not_wikilink)
+        )*?
+    )??
+    \]\]
 """
 
 NON_LATIN_DIGITS = {
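A hedged usage sketch for the rewritten pattern, following the `FILE_LINK_REGEX % 'File'` hint in the comment above. The regex body is copied from the added lines of the diff with the bracket and pipe escapes written out. The `re.VERBOSE` flag, the `file_link` name, and the sample wikitext are assumptions made for illustration and are not part of this change.

```python
import re

# Copied from the "+" lines of the diff; whitespace and the "#" comment are
# ignored by the engine only when the pattern is compiled with re.VERBOSE.
FILE_LINK_REGEX = r"""
    \[\[\s*
    (?:%s)  # namespace aliases
    \s*:
    (?=(?P<filename>
        [^]|]*
    ))(?P=filename)
    (
        \|
        (
            (
                (?=(?P<inner_link>
                    \[\[.*?\]\]
                ))(?P=inner_link)
            )?
            (?=(?P<other_chars>
                [^\[\]]*
            ))(?P=other_chars)
        |
            (?=(?P<not_wikilink>
                \[[^\]]*\]
            ))(?P=not_wikilink)
        )*?
    )??
    \]\]
"""

# Assumption: a single namespace name; several aliases would be joined with "|".
file_link = re.compile(FILE_LINK_REGEX % 'File', flags=re.VERBOSE)

# Hypothetical wikitext with a nested wikilink inside the caption.
wikitext = 'x [[File:Example.jpg|thumb|A [[link]] caption]] y'
match = file_link.search(wikitext)
print(match.group(0))           # [[File:Example.jpg|thumb|A [[link]] caption]]
print(match.group('filename'))  # Example.jpg
```

The nested `[[link]]` is consumed by the `inner_link` branch, so the match ends at the closing brackets of the file link and not at those of the inner link.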
pywikibot-commits@lists.wikimedia.org