Bugs item #3158761 was opened at 2011-01-15 10:17
Message generated for change (Settings changed) made by xqt
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3158761...
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: None Group: None
Status: Closed Resolution: Duplicate
Priority: 5 Private: No Submitted By: Bináris (binbot)
Assigned to: xqt (xqt)
Summary: Template exception overworks in replace.py
Initial Comment: I correct spelling mistakes with replace.py and use the following exception: 'exceptions': { 'inside-tags': [ 'hyperlink', 'template', ], etc., as shown at http://meta.wikimedia.org/wiki/Pywikipediabot/replace.py/it
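For context, a minimal sketch of where such an exception sits inside a replace.py fix definition. It assumes the user-fixes.py layout (a `fixes` dictionary whose entries have 'regex', 'msg', 'replacements' and 'exceptions' keys); the fix name `hu-spelling`, the edit summary and the correction pairs are hypothetical and only echo the words mentioned in this report.

```python
# Hypothetical user-fixes.py entry. In user-fixes.py the `fixes` dictionary
# already exists; it is created here only so that the snippet runs on its own.
fixes = {}

fixes['hu-spelling'] = {  # fix name is an assumption
    'regex': True,
    # Edit summary per language; the wording is illustrative only.
    'msg': {'hu': u'Bot: helyesírási javítások'},
    'replacements': [
        # (misspelling, correction) pairs; the corrections are assumed.
        (r'\btelepitettek\b', u'telepítettek'),
        (r'\balapitási\b', u'alapítási'),
    ],
    'exceptions': {
        # Skip any match that lies inside a URL or inside a {{template}}.
        'inside-tags': [
            'hyperlink',
            'template',
        ],
    },
}
```

With an entry like this, the fix would be selected with `-fix:hu-spelling` on the replace.py command line.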
This exception excludes a lot of text that should be replaced! After a long investigation I suspect that the problem occurs when the template is complicated, e.g. when the article begins with an infobox. The bot probably thinks it is still inside the template after the template has already been closed.
Examples:
- In the last sentence of section http://hu.wikipedia.org/w/index.php?title=Nagyv%C3%A1rad&oldid=9085449#N... the word "telepitettek" was not found. The article begins with an infobox.
- In the middle of section http://hu.wikipedia.org/w/index.php?title=Opera_%28sz%C3%ADnm%C5%B1%29&o... the word "Szenitávnéji" was not found. The article has no infobox, but the text is preceded by several templates with parameters, one of them at the very beginning.
- In section http://hu.wikipedia.org/w/index.php?title=Tennessee&oldid=9028125#Megy.C... the word "alapitási" was not found. The article begins with an infobox.
However, the bot did make the replacement here: http://hu.wikipedia.org/w/index.php?title=Mozilla&diff=9106942&oldid... This text is also preceded by several templates that have parameters, but the one at the beginning of the article has none. Does this make the difference?
All of the instances above were found by the bot when I commented "template" out of the exceptions. It is not clear whether the bug is in replace.py or in pagegenerators.
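The symptom can be reproduced without replace.py or pagegenerators. The sketch below is a simplified, hypothetical stand-in for the exception handling (it is not textlib's actual replaceExcept code): every span matched by an exception regex is protected, and any match of the search pattern that overlaps a protected span is left untouched. It uses the 'template' regex that is quoted from textlib.py later in this thread, and a made-up page with an infobox, one line of prose and a later template.

```python
import re

# 'template' exception regex as quoted from textlib.py later in this thread.
TEMPLATE = re.compile(r'(?s){{(({{.*?}})|.)*}}')


def replace_except(text, old, new, exceptions):
    """Hypothetical, simplified stand-in for textlib's replaceExcept.

    Replaces matches of the regex `old` with `new`, but leaves alone any
    match that overlaps a span matched by one of the `exceptions` regexes.
    """
    protected = [m.span() for exc in exceptions for m in exc.finditer(text)]

    def repl(match):
        start, end = match.span()
        if any(start < p_end and p_start < end for p_start, p_end in protected):
            return match.group(0)  # inside an exception: keep the original text
        return match.expand(new)

    return re.sub(old, repl, text)


# Made-up page: infobox, one line of prose, then another template.
article = "{{Infobox|name=X}}\nIde telepitettek.\n{{Reflist}}"

# Without exceptions the misspelling is corrected...
print(replace_except(article, r'telepitettek', u'telepítettek', []))
# ...but with the 'template' exception the page comes back unchanged.
print(replace_except(article, r'telepitettek', u'telepítettek', [TEMPLATE]) == article)
```

Under these assumptions the prose line is never touched while the template exception is active, which is consistent with the behaviour described above.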
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2011-05-12 23:24
Message: Duplicate of bug #2819291.
----------------------------------------------------------------------
Comment By: Bináris (binbot)
Date: 2011-02-09 10:38
Message: At least there is a comment now; thank you for dealing with the problem. What I do know is that in its present form it definitely works incorrectly.
----------------------------------------------------------------------
Comment By: Merlijn S. van Deen (valhallasw)
Date: 2011-02-09 08:56
Message: Well... this is why we desperately need unit tests. A quick response: I'm afraid the suggested fix will break detection of nested templates. More precisely, a template like {{ blah | {{ yakk }} | more stuff }} will not be detected as one nested template, but as {{ blah | {{ yakk }}. I'm not 100% sure about this, but it should be tested before the suggested fix is applied.
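The test asked for here could look roughly like the sketch below, written with Python's unittest module. It compares the current (greedy) and the proposed (lazy) patterns on the example above and on a hypothetical template with a second level of nesting; the helper `spans` and the second test string are assumptions, not part of pywikipedia. Run under Python's re module, both patterns return the one-level example whole, while the lazy pattern truncates the two-level case, because the inner `{{.*?}}` alternative only understands one level of nesting.

```python
import re
import unittest

# Current (greedy) and proposed (lazy) 'template' patterns from this thread.
GREEDY = re.compile(r'(?s){{(({{.*?}})|.)*}}')
LAZY = re.compile(r'(?s){{(({{.*?}})|.)*?}}')


def spans(pattern, text):
    """Hypothetical helper: substrings that `pattern` treats as templates."""
    return [m.group(0) for m in pattern.finditer(text)]


class NestedTemplateTest(unittest.TestCase):
    def test_one_level_of_nesting(self):
        text = "{{ blah | {{ yakk }} | more stuff }}"
        # Both patterns match the whole template here.
        self.assertEqual(spans(GREEDY, text), [text])
        self.assertEqual(spans(LAZY, text), [text])

    def test_two_levels_of_nesting(self):
        text = "{{ a | {{ b | {{ c }} }} | d }}"
        self.assertEqual(spans(GREEDY, text), [text])
        # The inner alternative consumes only "{{ b | {{ c }}", so the lazy
        # loop closes at the '}}' that really belongs to the {{ b ... }} template.
        self.assertEqual(spans(LAZY, text), ["{{ a | {{ b | {{ c }} }}"])


if __name__ == "__main__":
    # exit=False keeps the interpreter running after the tests finish.
    unittest.main(argv=["nested_templates"], exit=False)
```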
----------------------------------------------------------------------
Comment By: Bináris (binbot)
Date: 2011-02-09 04:16
Message: Could someone please fix this bug? It is a one-character change. Thanks in advance.
----------------------------------------------------------------------
Comment By: Bináris (binbot)
Date: 2011-01-15 23:36
Message: Hurray, I have caught it! The fix is easy. In pywikibot/textlib.py, line 83, the match for the outer braces is greedy. Changing 'template': re.compile(r'(?s){{(({{.*?}})|.)*}}'), to 'template': re.compile(r'(?s){{(({{.*?}})|.)*?}}'), solved the problem for me.
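A quick way to see what the one-character change does is to list the spans each pattern treats as templates on a small page. The page text below is made up (an infobox, one line of prose, a later template). Because `(?s)` lets `.` match newlines, the greedy `*` runs to the end of the page and backtracks only as far as the last `}}`, so everything between the first and the last template ends up inside one "template".

```python
import re

# textlib.py 'template' exception as quoted above (greedy outer loop)...
GREEDY = re.compile(r'(?s){{(({{.*?}})|.)*}}')
# ...and with the proposed one-character fix: '*' becomes '*?'.
LAZY = re.compile(r'(?s){{(({{.*?}})|.)*?}}')

# Hypothetical page: an infobox, some prose, then another template.
article = "{{Infobox|name=X}}\nIde telepitettek.\n{{Reflist}}"

greedy_spans = [m.group(0) for m in GREEDY.finditer(article)]
lazy_spans = [m.group(0) for m in LAZY.finditer(article)]

# Greedy: a single match from the first '{{' to the last '}}' on the page,
# so the prose in between counts as being inside a template.
print(greedy_spans == [article])  # True
# Lazy: each template is matched on its own and the prose stays exposed.
print(lazy_spans)  # ['{{Infobox|name=X}}', '{{Reflist}}']
```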
----------------------------------------------------------------------
pywikipedia-bugs@lists.wikimedia.org