jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/423243 )
Change subject: textlib.py: Rewrite FILE_LINK_REGEX to use atomic groups
......................................................................
textlib.py: Rewrite FILE_LINK_REGEX to use atomic groups
Python's re engine does not support atomic grouping. Mimic the
behaviour using positive lookaheads, capturing groups and
backreferences.[1]
[1]: See http://www.rexegg.com/regex-tricks.html#pseudo-atomic-groups
Bug: T191113
Change-Id: I2eba916d0a171487d14396a3e785b3b253b827f1
---
M pywikibot/textlib.py
1 file changed, 24 insertions(+), 7 deletions(-)
Approvals:
Xqt: Looks good to me, approved
jenkins-bot: Verified
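For readers unfamiliar with the trick referenced in the commit message, here is a minimal sketch of a pseudo-atomic group. It is an illustration only and not code from the patch: the names plain, pseudo_atomic and filename_only are made up for this example, and the last pattern is a cut-down fragment that hard-codes the 'File' namespace rather than substituting it into FILE_LINK_REGEX.

```python
import re

# Plain greedy group: when the trailing \d fails, the engine backtracks
# into (\d+) and gives one digit back, so the match succeeds.
plain = re.compile(r'(\d+)\d')

# Pseudo-atomic group: the lookahead captures the longest run of digits
# and the backreference consumes exactly that text. Once the lookahead
# has succeeded, the engine does not backtrack into it, so the captured
# run cannot be shortened and the trailing \d has nothing left to match.
pseudo_atomic = re.compile(r'(?=(?P<digits>\d+))(?P=digits)\d')

print(plain.search('12345').group())
print(pseudo_atomic.search('12345'))

# Cut-down fragment in the style of the new filename part (assumption:
# the 'File' namespace is hard-coded here for brevity).
filename_only = re.compile(
    r'\[\[File:(?=(?P<filename>[^]|]*))(?P=filename)\]\]')
m = filename_only.match('[[File:Example.jpg]]')
print(m.group('filename'))
```

The first pattern matches the whole string, the second finds no match at all, and the third captures the file name. The practical benefit of the pseudo-atomic form is that a failed match gives up quickly instead of retrying every shorter capture.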
diff --git a/pywikibot/textlib.py b/pywikibot/textlib.py
index 7a40a55..2c2b26c 100644
--- a/pywikibot/textlib.py
+++ b/pywikibot/textlib.py
@@ -105,13 +105,30 @@
# The namespace names must be substituted into this regex.
# e.g. FILE_LINK_REGEX % 'File' or FILE_LINK_REGEX % '|'.join(site.namespaces)
FILE_LINK_REGEX = r"""
-\[\[\s*(?:%s)\s*:[^|]*?\s*
- (\|
- ( ( \[\[ .*? \]\] )? [^[]*?
- | \[ [^]]*? \]
- )*
- )?
-\]\]
+ \[\[\s*
+ (?:%s) # namespace aliases
+ \s*:
+ (?=(?P<filename>
+ [^]|]*
+ ))(?P=filename)
+ (
+ \|
+ (
+ (
+ (?=(?P<inner_link>
+ \[\[.*?\]\]
+ ))(?P=inner_link)
+ )?
+ (?=(?P<other_chars>
+ [^\[\]]*
+ ))(?P=other_chars)
+ |
+ (?=(?P<not_wikilink>
+ \[[^]]*\]
+ ))(?P=not_wikilink)
+ )*?
+ )??
+ \]\]
"""
NON_LATIN_DIGITS = {
--
To view, visit
https://gerrit.wikimedia.org/r/423243
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2eba916d0a171487d14396a3e785b3b253b827f1
Gerrit-Change-Number: 423243
Gerrit-PatchSet: 3
Gerrit-Owner: Dalba <dalba.wiki(a)gmail.com>
Gerrit-Reviewer: John Vandenberg <jayvdb(a)gmail.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: Zoranzoki21 <zorandori4444(a)gmail.com>
Gerrit-Reviewer: jenkins-bot <>