jenkins-bot merged this change.
textlib.py: Rewrite FILE_LINK_REGEX to use atomic groups
Python's re engine does not support atomic grouping, so mimic the
behaviour using positive lookaheads and backreferences to capturing
groups.[1]
[1]: See http://www.rexegg.com/regex-tricks.html#pseudo-atomic-groups
Bug: T191113
Change-Id: I2eba916d0a171487d14396a3e785b3b253b827f1
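
The pseudo-atomic trick described in the commit message can be sketched in isolation. This is a minimal illustration, not code from the change; the pattern and the group name `digits` are made up for the example. (Python 3.11 and later also accept `(?>...)` directly; the sketch sticks to the lookahead form used in the change.)

```python
import re

# Ordinary greedy match: \d+ first takes "12345", then backtracks one
# digit so that the trailing literal "5" can match.
backtracking = re.compile(r'\d+5')

# Pseudo-atomic version: the lookahead captures the longest run of digits
# and the backreference then consumes exactly that text. The engine cannot
# re-enter the lookahead to try a shorter capture, so no digit is given back.
pseudo_atomic = re.compile(r'(?=(?P<digits>\d+))(?P=digits)5')

print(backtracking.match('12345'))   # matches the whole string
print(pseudo_atomic.match('12345'))  # None: nothing is released for the "5"
```

Because the captured text is never re-split, a failed match gives up quickly instead of retrying every shorter prefix, which is the property the rewritten FILE_LINK_REGEX relies on.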
---
M pywikibot/textlib.py
1 file changed, 24 insertions(+), 7 deletions(-)
diff --git a/pywikibot/textlib.py b/pywikibot/textlib.py
index 7a40a55..2c2b26c 100644
--- a/pywikibot/textlib.py
+++ b/pywikibot/textlib.py
@@ -105,13 +105,30 @@
# The namespace names must be substituted into this regex.
# e.g. FILE_LINK_REGEX % 'File' or FILE_LINK_REGEX % '|'.join(site.namespaces)
FILE_LINK_REGEX = r"""
-\[\[\s*(?:%s)\s*:[^|]*?\s*
- (\|
- ( ( \[\[ .*? \]\] )? [^[]*?
- | \[ [^]]*? \]
- )*
- )?
-\]\]
+ \[\[\s*
+ (?:%s) # namespace aliases
+ \s*:
+ (?=(?P<filename>
+ [^]|]*
+ ))(?P=filename)
+ (
+ \|
+ (
+ (
+ (?=(?P<inner_link>
+ \[\[.*?\]\]
+ ))(?P=inner_link)
+ )?
+ (?=(?P<other_chars>
+ [^\[\]]*
+ ))(?P=other_chars)
+ |
+ (?=(?P<not_wikilink>
+ \[[^]]*\]
+ ))(?P=not_wikilink)
+ )*?
+ )??
+ \]\]
"""
NON_LATIN_DIGITS = {
To view, visit change 423243.
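
As a rough check of the new pattern, the sketch below substitutes namespace aliases into it as the comment above the constant describes and compiles it with re.VERBOSE, which the multi-line layout and the inline `#` comment require. The alias list and the sample wikitext are assumptions made for the example; pywikibot itself would take the aliases from the site object.

```python
import re

# New pattern copied from the diff above.
FILE_LINK_REGEX = r"""
    \[\[\s*
    (?:%s)  # namespace aliases
    \s*:
    (?=(?P<filename>
        [^]|]*
    ))(?P=filename)
    (
        \|
        (
            (
                (?=(?P<inner_link>
                    \[\[.*?\]\]
                ))(?P=inner_link)
            )?
            (?=(?P<other_chars>
                [^\[\]]*
            ))(?P=other_chars)
        |
            (?=(?P<not_wikilink>
                \[[^]]*\]
            ))(?P=not_wikilink)
        )*?
    )??
    \]\]
"""

# Hypothetical alias list, used here only for illustration.
aliases = ['File', 'Image']
file_link = re.compile(FILE_LINK_REGEX % '|'.join(aliases), re.VERBOSE)

# Sample wikitext with a nested wikilink inside the caption.
text = 'Intro [[File:Example.jpg|thumb|A [[link]] here]] outro'
match = file_link.search(text)
print(match.group(0))
print(match.group('filename'))
```

The nested `[[link]]` in the caption is consumed by the `inner_link` branch, so the match runs to the closing `]]` of the file link rather than stopping at the first `]]`.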