https://bugzilla.wikimedia.org/show_bug.cgi?id=54562
Web browser: ---
Bug ID: 54562
Summary: Bugfix for optional caputring group
Product: Pywikibot
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: General
Assignee: Pywikipedia-bugs@lists.wikimedia.org
Reporter: legoktm.wikipedia@gmail.com
Classification: Unclassified
Mobile Platform: ---
Originally from: http://sourceforge.net/p/pywikipediabot/patches/555/
Reported by: eranroz
Created on: 2012-07-03 18:35:29
Subject: Bugfix for optional caputring group
Original description:
Patch for the replace function (replaceExcept) in pywikibot/textlib.py, adding support for empty/optional capturing groups. This is a bugfix for a crash that occurs when using replace.py with a regex containing an optional capturing group (e.g. AAA in the regex "bla(AAA)?bla").
--- Comment #1 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- support for empty capturing group
--- Comment #2 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- Is this patch for bug #3539444?
--- Comment #3 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- See my comment at the corresponding bug tracker. Maybe it would be OK to accept this patch; anyway, I've asked for a third opinion on this matter.
--- Comment #4 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- I don't understand this bug. What is the traceback before this patch is applied? And what should replaceExcept() do in your special case? Could you give me a full example? You may exclude this group with "bla(?:AAA)?bla"; would this help?
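To illustrate the workaround suggested in comment #4, here is a minimal sketch using only Python's standard `re` module (not pywikibot itself). It shows that the non-capturing form `(?:AAA)?` creates no group at all, while the capturing form `(AAA)?` creates a group whose value is `None` when it does not take part in the match. The sample text "blabla" is an assumption chosen for illustration.

```python
import re

# Sample text in which the optional "AAA" part is absent (illustrative only).
text = "blabla"

# Capturing version: group 1 exists but did not participate in the match.
capturing = re.compile(r"bla(AAA)?bla").match(text)

# Non-capturing version: the optional part creates no group.
non_capturing = re.compile(r"bla(?:AAA)?bla").match(text)

capturing_groups = capturing.groups()          # one group, holding None
non_capturing_groups = non_capturing.groups()  # empty tuple, nothing to reference

print(capturing_groups, non_capturing_groups)
```

The non-capturing form only helps when the replacement text does not need to refer back to the optional part; a replacement that uses `\1` still needs the capturing group.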
--- Comment #5 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- Yes, this is a bugfix for 3539444. In short: when running the replacement "ADMA (a)?poria" => "ADMA \1porya" on text containing "ADMA poria" (with no "a" before "poria"), it crashes with the following error:

doReplacements
    res = replace.ReplaceRobot.doReplacements(self,original_text)
  File "D:\myBot\python\pywikipedia-nightly\replace.py", line 390, in doReplacements
    allowoverlap=self.allowoverlap)
  File "D:\myBot\python\pywikipedia-nightly\pywikibot\textlib.py", line 179, in replaceExcept
    match.group(groupID) + \
TypeError: coercing to Unicode: need string or buffer, NoneType found
You may suggest rewriting the specific regex, and that would probably work, but it is just a workaround: a regex with an optional capturing group is valid and should work properly.

Longer story :) : In Hebrew Wikipedia there is a list of regexes that are used for replacements in (almost) all articles. It is here: http://he.wikipedia.org/wiki/%D7%95%D7%A7:%D7%A8%D7%94

The columns in the table there are: ID | old | new | exceptText

The list is used by a C# bot implementation, which isn't active, and by a JS userscript implementation, which is used for replacements on specific pages. I have ported it to work with replace.py, but it fails when it gets to a replacement with an optional capturing group. After applying my fix locally, I ran it for 250 test edits and it worked properly without crashes.
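The behaviour requested in comment #5 can be sketched as follows. This is not the submitted patch and not the actual code of replaceExcept; it is a standalone illustration, using only the standard `re` module, of expanding `\1`-style references by hand while treating an unmatched optional group as an empty string. The helper names `expand_template` and `replace_all` are hypothetical.

```python
import re

# Matches numbered back-references such as \1 or \2 in a replacement template.
GROUP_REF = re.compile(r"\\(\d+)")


def expand_template(match, template):
    """Expand numbered group references; unmatched groups become ''."""
    def substitute(ref):
        value = match.group(int(ref.group(1)))
        # match.group() returns None for an optional group that did not
        # participate in the match; concatenating None is what crashed.
        return value if value is not None else ""
    return GROUP_REF.sub(substitute, template)


# Pattern and replacement taken from the report in comment #5.
pattern = re.compile(r"ADMA (a)?poria")
template = r"ADMA \1porya"


def replace_all(text):
    """Apply the replacement to every match in text (hypothetical helper)."""
    return pattern.sub(lambda m: expand_template(m, template), text)


without_a = replace_all("ADMA poria")   # optional group unmatched
with_a = replace_all("ADMA aporia")     # optional group matched "a"
print(without_a)
print(with_a)
```

Treating the missing group as an empty string is one possible design choice; it matches the expectation stated in the report that "ADMA poria" should become "ADMA porya" rather than raise a TypeError.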
Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
See Also|                            |https://sourceforge.net/p/pywikipediabot/patches/555
Ricordisamoa ricordisamoa@live.it changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
CC      |                            |ricordisamoa@live.it
Summary |Bugfix for optional         |Support optional capturing
        |caputring group             |groups in replaceExcept
Amir Ladsgroup ladsgroup@gmail.com changed:
What    |Removed                     |Added
----------------------------------------------------------------------------
Priority|Unprioritized               |Normal
CC      |                            |ladsgroup@gmail.com
--- Comment #6 from Amir Ladsgroup ladsgroup@gmail.com --- The patch looks good to me
Ricordisamoa ricordisamoa@openmailbox.org changed:
What     |Removed                     |Added
----------------------------------------------------------------------------
Component|General                     |textlib.py
pywikipedia-bugs@lists.wikimedia.org