https://bugzilla.wikimedia.org/show_bug.cgi?id=55184
Web browser: ---
Bug ID: 55184
Summary: replace doesn't support optional groups
Product: Pywikibot
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: General
Assignee: Pywikipedia-bugs@lists.wikimedia.org
Reporter: legoktm.wikipedia@gmail.com
Classification: Unclassified
Mobile Platform: ---
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1484/
Reported by: Anonymous user
Created on: 2012-07-02 13:25:23
Subject: replace doesn't support optional groups
Original description:
textlib.py (method replaceExcept) doesn't support optional capturing groups in regexes.
I tried to run replace.py with the following regex: "RISHMI(T |IM)?" => "RISHMI\1". When run on a page containing the text "SOMETHING RISHMI SOMETHING", it crashes with the following error:

    textlib.py, line 178, in replaceExcept
      match.group(groupID) + \
    TypeError: coercing to Unicode: need string or buffer, NoneType found

Line 178 contains the statement:

    replacement = replacement[:groupMatch.start()] + \
                  match.group(groupID) + \
                  replacement[groupMatch.end():]

textlib.py should check whether match.group(groupID) is None and, if so, insert an empty string in place of match.group(groupID).
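The check proposed above could look roughly like the following. This is only a minimal sketch, not the actual textlib.py code: the helper name expand_replacement and the GROUP_REF pattern are made up for illustration, and only numeric references such as \1 are handled.

```python
import re

# Hypothetical helper, not part of textlib.py: expands \1, \2, ... in a
# replacement template, treating unmatched optional groups as ''.
GROUP_REF = re.compile(r'\\(\d+)')


def expand_replacement(match, template):
    def substitute(ref):
        value = match.group(int(ref.group(1)))
        # An optional group that did not participate yields None.
        return value if value is not None else ''
    return GROUP_REF.sub(substitute, template)


pattern = re.compile(r'RISHMI(T |IM)?')
text = 'SOMETHING RISHMI SOMETHING'
m = pattern.search(text)
result = text[:m.start()] + expand_replacement(m, r'RISHMI\1') + text[m.end():]
```

With the text from the report, group 1 does not participate in the match, so the reference expands to an empty string and the page text comes back unchanged instead of raising TypeError.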
--- Comment #1 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com ---
The group must exist to be reused. What should this regex do, in your opinion? What about "RISHMI(T |IM|)" or "RISHM((?:T |IM)?)"? Errors should never pass silently unless explicitly silenced (PEP 20). Substituting empty strings might lead to unwanted side effects, but I haven't thought about it.
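For illustration, here is how the two rewritten patterns behave with the standard re module. The second pattern is spelled with the full "RISHMI" prefix here, which is an assumption about what the comment intended. In both rewrites the capturing group always takes part in the match, so it holds an empty string rather than None.

```python
import re

text = 'SOMETHING RISHMI SOMETHING'

# Variant 1: an empty alternative inside the group.
empty_alt = re.compile(r'RISHMI(T |IM|)')
# Variant 2: a mandatory group wrapping an optional non-capturing group
# (prefix "RISHMI" assumed, see the note above).
wrapped = re.compile(r'RISHMI((?:T |IM)?)')

group_empty_alt = empty_alt.search(text).group(1)
group_wrapped = wrapped.search(text).group(1)

# Because the group is never None, the \1 reference can be expanded.
replaced_empty_alt = empty_alt.sub(r'RISHMI\1', text)
replaced_wrapped = wrapped.sub(r'RISHMI\1', text)
```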
--- Comment #2 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com ---
The regex here is just an example, and probably a bad one (as this replacement does nothing). Your suggestion for this specific regex (using an inner optional group within a capturing group) would probably fix it, but that is a workaround: replace.py should support optional capturing groups in replacements the same way re.findall does.
Replacing None with an empty string is compatible with the behaviour of re.findall (re.findall('a(b)?(c)','ac') => [('', 'c')]) and with the regex engines of most languages (in JS: 'ac'.replace(/a(b)?(c)/,'a$1c')), though Python's re isn't consistent here: re.sub('a(b)?(c)','X\\1','ac') raises an error.
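The difference described above can be seen with the standard re module. This is a small sketch; the function name repl is made up for illustration. A callable replacement avoids template expansion entirely, so it works whether or not re.sub accepts references to unmatched groups (the Python documentation notes that from version 3.5 on, re.sub replaces unmatched groups with an empty string).

```python
import re

pattern = re.compile(r'a(b)?(c)')

# re.findall reports an unmatched optional group as an empty string.
found = pattern.findall('ac')

# The match object reports the same group as None.
groups = pattern.search('ac').groups()

# groups() accepts a default to use for groups that did not participate.
groups_with_default = pattern.search('ac').groups(default='')


def repl(m):
    # Equivalent of the template 'X\1', with None mapped to ''.
    return 'X' + (m.group(1) or '')


replaced = pattern.sub(repl, 'ac')
```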
Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://sourceforge.net/p/pywikipediabot/bugs/1484
Nemo federicoleva@tiscali.it changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |federicoleva@tiscali.it
         Resolution|---                         |INVALID
--- Comment #3 from Nemo federicoleva@tiscali.it ---
I had my own fights with this problem, and my conclusion was that there's nothing to do about it but rewrite your regexes; it's how Python works. Mostly, what's nasty is the idiotic error message.
pywikipedia-bugs@lists.wikimedia.org