Bugs item #3539444, was opened at 2012-07-02 06:25
Message generated for change (Comment added) made by eranroz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=353944…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: General
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: replace doesn't support optional groups
Initial Comment:
textlib.py (method replaceExcept) doesn't support optional capturing groups in
regular expressions.
I ran replace.py with the following regex: "RISHMI(T |IM)?" =>
"RISHMI\1"
When run on a page containing the text "SOMETHING RISHMI
SOMETHING",
it crashes with the following error:
textlib.py, line 178, in replaceExcept
match.group(groupID) + \
TypeError: coercing to Unicode: need string or buffer, NoneType found
Line 178 contains the statement:
replacement = replacement[:groupMatch.start()] + \
match.group(groupID) + \
replacement[groupMatch.end():]
textlib.py should check whether match.group(groupID) is None and, if so, insert an
empty string instead of match.group(groupID).
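
The check proposed above could be sketched roughly as follows. This is not the
actual textlib.py code: expand_template and GROUP_REF are hypothetical names
used only for illustration, and only numeric references such as \1 are handled.

```python
import re

# Hypothetical helper, NOT the real textlib.py implementation.
# Matches numeric group references such as \1 or \12 in a replacement template.
GROUP_REF = re.compile(r'\\(\d+)')


def expand_template(match, template):
    """Expand group references, treating unmatched optional groups as ''."""
    def substitute(ref):
        value = match.group(int(ref.group(1)))
        # An optional group that did not participate in the match is None.
        return value if value is not None else ''
    return GROUP_REF.sub(substitute, template)


match = re.search(r'RISHMI(T |IM)?', 'SOMETHING RISHMI SOMETHING')
result = expand_template(match, r'RISHMI\1')
```

With the page text from the report, match.group(1) is None and the expanded
replacement is simply "RISHMI" rather than a TypeError.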
----------------------------------------------------------------------
Comment By: Eranroz (eranroz)
Date: 2012-07-04 05:03
Message:
The regex here is just an example, and probably a bad one (since the
replacement leaves the text unchanged). Your suggestion for this specific
regex (using an inner optional group within a capturing group) would
probably fix it, but that is a workaround: replace.py should support
replacing optional capturing groups the same way re.findall
behaves.
Replacing None with an empty string is consistent with the
behaviour of re.findall (re.findall('a(b)?(c)','ac') => [('', 'c')]) and
with the regex engines of most languages (in JS:
'ac'.replace(/a(b)?(c)/,'a$1c')), though Python's re isn't consistent
here
(re.sub('a(b)?(c)','X\\1','ac') raises an error).
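
The re.findall behaviour mentioned above can be checked with a short snippet.
It is only a sketch of the comparison: the variable names are illustrative,
and Match.groups(default='') is shown as one standard way to get the same
empty-string treatment from a match object.

```python
import re

pattern = 'a(b)?(c)'

# findall reports an unmatched optional group as an empty string.
found = re.findall(pattern, 'ac')

# A match object reports the same group as None ...
m = re.search(pattern, 'ac')
first_group = m.group(1)

# ... unless a default is requested explicitly.
groups_with_default = m.groups(default='')
```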
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2012-07-03 22:20
Message:
The group must exist to reuse it. What should this regex do, in your
opinion? What about "RISHMI(T |IM|)" or "RISHM((?:T |IM)?)"? Errors should
never pass silently unless explicitly silenced (PEP 20). Maybe replacing
with empty strings could lead to unwanted side effects, but I haven't thought
about it.
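
A small check of the two alternative patterns, using the page text from the
initial report. This is only an illustrative sketch with made-up variable
names: in both patterns the capturing group always takes part in the match,
so it is an empty string rather than None when neither "T " nor "IM" follows.

```python
import re

text = 'SOMETHING RISHMI SOMETHING'

# Empty alternative inside the group: the group matches '' when nothing else fits.
m1 = re.search(r'RISHMI(T |IM|)', text)

# Optional non-capturing group wrapped in a capturing group.
m2 = re.search(r'RISHM((?:T |IM)?)', text)

# The back-reference is always defined, so the substitution cannot fail.
replaced = re.sub(r'RISHMI(T |IM|)', r'RISHMI\1', text)
```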
----------------------------------------------------------------------