Bugs item #3539444, was opened at 2012-07-02 06:25
Message generated for change (Comment added) made by eranroz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3539444...
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: General
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: replace doesn't support optional groups
Initial Comment: textlib.py (method replaceExcept) doesn't support optional capturing groups in regex.
I tried to run replace.py with the following regex: "RISHMI(T |IM)?" => "RISHMI\1". When running it on a page containing the text "SOMETHING RISHMI SOMETHING", it crashes with the following error:

  textlib.py, line 178, in replaceExcept
    match.group(groupID) + \
  TypeError: coercing to Unicode: need string or buffer, NoneType found
Line 178 contains the statement:

    replacement = replacement[:groupMatch.start()] + \
                  match.group(groupID) + \
                  replacement[groupMatch.end():]
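The failure can be reproduced with the standard re module alone. This is a minimal sketch, not code from textlib.py: the optional group does not participate in the match, so match.group(1) is None, and concatenating it with a string raises TypeError (the message text differs between the Python 2 used by the bot and Python 3).

```python
import re

# Pattern and page text taken from the report.
pattern = re.compile(r"RISHMI(T |IM)?")
text = "SOMETHING RISHMI SOMETHING"

match = pattern.search(text)
print(match.group(0))  # RISHMI
# The optional group did not take part in the match, so group(1) is None.
print(match.group(1))  # None

raised = False
try:
    # Same kind of string + group concatenation as line 178 of textlib.py.
    replacement = "RISHMI" + match.group(1)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```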
textlib.py should check whether match.group(groupID) is None and, if so, insert an empty string instead of match.group(groupID).
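A sketch of the proposed behaviour, assuming replacement strings use the \N and \g<name> reference syntax of Python's re module. The helper name expand_groups and the GROUP_REF pattern are hypothetical and are not taken from textlib.py. It expands references in a single left-to-right pass and maps a None group to an empty string.

```python
import re

# Hypothetical reference syntax, modelled on Python's re: \N or \g<name> / \g<N>.
GROUP_REF = re.compile(r"\\(?P<number>\d+)|\\g<(?P<name>[^>]+)>")


def expand_groups(replacement, match):
    """Expand group references in `replacement` using `match`.

    A group that did not participate in the match (group() returns None)
    is replaced by an empty string, as proposed in the report.
    """
    def substitute(ref):
        name = ref.group("name")
        if name is None:
            group_id = int(ref.group("number"))
        else:
            # \g<1> refers to a numbered group, \g<word> to a named one.
            group_id = int(name) if name.isdigit() else name
        value = match.group(group_id)
        return "" if value is None else value

    # One pass with a callable, so text copied from a group is never re-scanned.
    return GROUP_REF.sub(substitute, replacement)


m = re.search(r"RISHMI(T |IM)?", "SOMETHING RISHMI SOMETHING")
print(expand_groups(r"RISHMI\1", m))  # RISHMI
```

This is deliberately simplified: it does not handle escaped backslashes (\\) in the replacement string, which a complete implementation would need to.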
----------------------------------------------------------------------
Comment By: Eranroz (eranroz) Date: 2012-07-04 05:03
Message: This regex is just an example, and probably a bad one (this particular replacement changes nothing). Your suggestion for this specific regex (wrapping an inner optional group inside a capturing group) would probably fix it, but that is a workaround: replace.py should support optional capturing groups in the replacement the same way re.findall does.
Replacing None with an empty string is consistent with the behaviour of re.findall (re.findall('a(b)?(c)', 'ac') => [('', 'c')]) and with the regex engines of most other languages (in JS, 'ac'.replace(/a(b)?(c)/, 'a$1c') returns 'ac'), though Python's re module is inconsistent here (re.sub('a(b)?(c)', r'X\1', 'ac') raises an error).
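The inconsistency can be illustrated by comparing re.findall with re.sub on the same input. A portable way to get the findall-style result from re.sub is to pass a function as the replacement and map None to an empty string; the function name fill_group_1 is only illustrative. Note that Python 3.5 and later changed re.sub itself to substitute an empty string for unmatched groups, so the error mentioned above applies to older versions such as the Python 2 the bot ran on.

```python
import re

pattern = re.compile(r"a(b)?(c)")

# findall reports the unmatched optional group as an empty string.
print(pattern.findall("ac"))  # [('', 'c')]


def fill_group_1(match):
    # Treat a non-participating group as empty instead of None.
    return "X" + (match.group(1) or "")


# A callable replacement behaves the same on old and new Python versions.
print(pattern.sub(fill_group_1, "ac"))  # X
```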
----------------------------------------------------------------------
Comment By: xqt (xqt) Date: 2012-07-03 22:20
Message: The group must exist to be reused. What should this regex do, in your opinion? What about "RISHMI(T |IM|)" or "RISHMI((?:T |IM)?)"? Errors should never pass silently unless explicitly silenced (PEP 20). Replacing with empty strings could perhaps lead to unwanted side effects, but I haven't thought it through.
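The two alternative patterns suggested above work because the capturing group always participates in the match; it simply matches the empty string when neither alternative is present, so group(1) is '' rather than None. A quick check with the standard re module:

```python
import re

text = "SOMETHING RISHMI SOMETHING"

# Original pattern: the whole group is optional, so it may not participate.
print(re.search(r"RISHMI(T |IM)?", text).group(1))  # None

# Suggested workarounds: the group always participates and may match "".
for pattern in (r"RISHMI(T |IM|)", r"RISHMI((?:T |IM)?)"):
    print(repr(re.search(pattern, text).group(1)))  # prints '' for each pattern
```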
----------------------------------------------------------------------
pywikipedia-bugs@lists.wikimedia.org