jenkins-bot merged this change.

Approvals:
  Xqt: Looks good to me, approved
  jenkins-bot: Verified

textlib.py: Rewrite FILE_LINK_REGEX to use atomic groups

Python's re engine does not support atomic grouping, so mimic the
behaviour using positive lookaheads and backreferences to named
capturing groups.[1]

[1]: See http://www.rexegg.com/regex-tricks.html#pseudo-atomic-groups

Bug: T191113
Change-Id: I2eba916d0a171487d14396a3e785b3b253b827f1
---
M pywikibot/textlib.py
1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/pywikibot/textlib.py b/pywikibot/textlib.py
index 7a40a55..2c2b26c 100644
--- a/pywikibot/textlib.py
+++ b/pywikibot/textlib.py
@@ -105,13 +105,30 @@
# The namespace names must be substituted into this regex.
# e.g. FILE_LINK_REGEX % 'File' or FILE_LINK_REGEX % '|'.join(site.namespaces)
FILE_LINK_REGEX = r"""
-\[\[\s*(?:%s)\s*:[^|]*?\s*
- (\|
- ( ( \[\[ .*? \]\] )? [^[]*?
- | \[ [^]]*? \]
- )*
- )?
-\]\]
+ \[\[\s*
+ (?:%s) # namespace aliases
+ \s*:
+ (?=(?P<filename>
+ [^]|]*
+ ))(?P=filename)
+ (
+ \|
+ (
+ (
+ (?=(?P<inner_link>
+ \[\[.*?\]\]
+ ))(?P=inner_link)
+ )?
+ (?=(?P<other_chars>
+ [^\[\]]*
+ ))(?P=other_chars)
+ |
+ (?=(?P<not_wikilink>
+ \[[^]]*\]
+ ))(?P=not_wikilink)
+ )*?
+ )??
+ \]\]
"""

NON_LATIN_DIGITS = {
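
For readers unfamiliar with the trick referenced in [1], the following is a minimal sketch of how a lookahead plus a backreference behaves like an atomic group, followed by the post-change pattern copied from the diff above. The helper name compile_file_link_regex, the sample wikitext, and the choice of re.VERBOSE | re.IGNORECASE are assumptions made for illustration only; they are not taken from pywikibot.

```python
import re

# Plain greedy quantifier: the engine may backtrack into `a+` and give
# back one "a" so that the trailing "ab" can still match.
PLAIN = re.compile(r'a+ab')

# Pseudo-atomic version: the lookahead captures the longest run of "a",
# and the backreference then consumes exactly that text. A lookahead is
# never re-entered on backtracking, so the run cannot be shortened.
PSEUDO_ATOMIC = re.compile(r'(?=(?P<run>a+))(?P=run)ab')

# Post-change pattern, copied from the diff above.
FILE_LINK_REGEX = r"""
    \[\[\s*
    (?:%s)  # namespace aliases
    \s*:
    (?=(?P<filename>
        [^]|]*
    ))(?P=filename)
    (
        \|
        (
            (
                (?=(?P<inner_link>
                    \[\[.*?\]\]
                ))(?P=inner_link)
            )?
            (?=(?P<other_chars>
                [^\[\]]*
            ))(?P=other_chars)
        |
            (?=(?P<not_wikilink>
                \[[^]]*\]
            ))(?P=not_wikilink)
        )*?
    )??
    \]\]
"""


def compile_file_link_regex(namespaces):
    """Hypothetical helper: substitute namespace names and compile.

    The flags are an assumption; re.VERBOSE is needed because the
    pattern contains layout whitespace and a comment.
    """
    alternation = '|'.join(re.escape(name) for name in namespaces)
    return re.compile(FILE_LINK_REGEX % alternation,
                      re.VERBOSE | re.IGNORECASE)


if __name__ == '__main__':
    print(PLAIN.search('aaab'))          # matches by backtracking
    print(PSEUDO_ATOMIC.search('aaab'))  # no match: the run is atomic

    pattern = compile_file_link_regex(['File', 'Image'])
    sample = 'x [[File:Example.jpg|thumb|A [[link]] here]] y'
    match = pattern.search(sample)
    print(match.group(0))
    print(match.group('filename'))
```

The named groups exist only to feed the backreferences. A side effect is that the file name is available as match.group('filename').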

Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2eba916d0a171487d14396a3e785b3b253b827f1
Gerrit-Change-Number: 423243
Gerrit-PatchSet: 3
Gerrit-Owner: Dalba <dalba.wiki@gmail.com>
Gerrit-Reviewer: John Vandenberg <jayvdb@gmail.com>
Gerrit-Reviewer: Xqt <info@gno.de>
Gerrit-Reviewer: Zoranzoki21 <zorandori4444@gmail.com>
Gerrit-Reviewer: jenkins-bot <>