jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/423463 )
Change subject: Use the old FILE_LINK_REGEX regex for Python versions older than 2.7.4
......................................................................
Use the old FILE_LINK_REGEX regex for Python versions older than 2.7.4
This patch reverts c25f22164f88b1bf91dd329d0566afed685ce530 and limits the usage of the new regex introduced in 8ede851d8c0655b7eb021e69003e9 to Python 2.7.4+.
Although the old regex may get stuck in catastrophic backtracking, the chances of that happening are much lower, and we have been using it for a long time (since 2016-10-31).
Bug: T191161
Change-Id: I1567886eeacc82b3a393d818f98cdfb0216b1b31
---
M pywikibot/textlib.py
1 file changed, 35 insertions(+), 23 deletions(-)

Approvals:
  Xqt: Looks good to me, approved
  jenkins-bot: Verified
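The newer regex in this change wraps several sub-patterns in a lookahead followed by a named backreference, (?=(?P<name>...))(?P=name). As far as I can tell, that is the usual way to emulate an atomic group in Python's re module, which limits the backtracking the commit message mentions. The toy pattern below is only an illustrative sketch and is not taken from pywikibot; the group name "run" is made up for the example.

```python
import re

# Plain greedy quantifier: "a+" can give one "a" back so the trailing "a" matches.
backtracking = re.compile(r'a+a')

# Lookahead + named backreference: the lookahead captures the longest run of
# "a", and the backreference then consumes exactly that captured text.
# A lookahead is not re-entered after it succeeds, so nothing is given back.
atomic_like = re.compile(r'(?=(?P<run>a+))(?P=run)a')

plain_match = backtracking.search('aaa')
atomic_match = atomic_like.search('aaa')

print(plain_match.group(0) if plain_match else None)
print(atomic_match)
```

The plain pattern matches the whole string by backtracking once, while the atomic-like pattern finds no match because the captured run never shrinks. That refusal to retry shorter prefixes is what keeps the number of attempted paths small on long inputs.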
diff --git a/pywikibot/textlib.py b/pywikibot/textlib.py
index 694e781..8f3da8a 100644
--- a/pywikibot/textlib.py
+++ b/pywikibot/textlib.py
@@ -103,32 +103,44 @@
 # [[ or ]].
 # The namespace names must be substituted into this regex.
 # e.g. FILE_LINK_REGEX % 'File' or FILE_LINK_REGEX % '|'.join(site.namespaces)
-FILE_LINK_REGEX = r"""
-    \[\[\s*
-    (?:%s)  # namespace aliases
-    \s*:
-    ((?=(?P<filename>
-        [^]|]+  # * quantifier may crash on Python 2.7.2 (T191161)
-    ))(?P=filename))?
-    (
-        \|
+if sys.version_info[:3] >= (2, 7, 4):
+    FILE_LINK_REGEX = r"""
+        \[\[\s*
+        (?:%s)  # namespace aliases
+        \s*:
+        (?=(?P<filename>
+            [^]|]*
+        ))(?P=filename)
         (
+            \|
             (
-                (?=(?P<inner_link>
-                    \[\[.*?\]\]
-                ))(?P=inner_link)
-            )?
-            ((?=(?P<other_chars>
-                [^\[\]]+  # * quantifier may crash on Python 2.7.2 (T191161)
-            ))(?P=other_chars))?
-        |
-            (?=(?P<not_wikilink>
-                \[[^\]]*\]
-            ))(?P=not_wikilink)
-        )*?
-    )??
+                (
+                    (?=(?P<inner_link>
+                        \[\[.*?\]\]
+                    ))(?P=inner_link)
+                )?
+                (?=(?P<other_chars>
+                    [^\[\]]*
+                ))(?P=other_chars)
+            |
+                (?=(?P<not_wikilink>
+                    \[[^\]]*\]
+                ))(?P=not_wikilink)
+            )*?
+        )??
+        \]\]
+    """
+else:
+    # Python 2.7.2 and 2.7.3 re bug (T191161)
+    FILE_LINK_REGEX = r"""
+        \[\[\s*(?:%s)\s*:[^|]*?\s*
+        (\|
+        ( ( \[\[ .*? \]\] )? [^[]*?
+        | \[ [^]]*? \]
+        )*
+        )?
     \]\]
-"""
+    """
 
 NON_LATIN_DIGITS = {
     'ckb': u'٠١٢٣٤٥٦٧٨٩',
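The comment in the diff says the namespace names must be substituted into FILE_LINK_REGEX before use. The following is a minimal sketch of that substitution for the Python 2.7.4+ variant. The bracket and pipe escapes are written out explicitly, which is an assumption about the original source, and so are the re.VERBOSE flag, the alias list and the sample text.

```python
import re

# Transcribed from the 2.7.4+ branch of the diff; the backslash escapes on
# brackets and the pipe are an assumption about the original source.
FILE_LINK_REGEX = r"""
    \[\[\s*
    (?:%s)  # namespace aliases
    \s*:
    (?=(?P<filename>
        [^]|]*
    ))(?P=filename)
    (
        \|
        (
            (
                (?=(?P<inner_link>
                    \[\[.*?\]\]
                ))(?P=inner_link)
            )?
            (?=(?P<other_chars>
                [^\[\]]*
            ))(?P=other_chars)
        |
            (?=(?P<not_wikilink>
                \[[^\]]*\]
            ))(?P=not_wikilink)
        )*?
    )??
    \]\]
"""

# Hypothetical alias list; the diff's comment suggests joining site.namespaces.
aliases = ['File', 'Image']

# re.VERBOSE is assumed here because the pattern contains layout whitespace
# and an inline "#" comment.
file_link = re.compile(FILE_LINK_REGEX % '|'.join(aliases), re.VERBOSE)

text = 'Intro [[File:Example.jpg|thumb|A [[link]] here]] outro'
match = file_link.search(text)
print(match.group(0))
print(match.group('filename'))
```

In this sketch the match spans the entire file link, including the nested wikilink in the caption, and the filename group holds only the part before the first pipe.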
pywikibot-commits@lists.wikimedia.org