I'm having trouble with a replace.py command that I'm running on Appropedia.org. It's not a huge deal if it doesn't work, but I'd appreciate it if anyone has the patience to help me understand how to debug it, or why it fails.


I've narrowed it down to the \2 in the replace term, as the problem disappears when I remove it:

python replace.py -regex '(?si)\b(WordPress)\b(.*$)' '\1\2\n[[Category:Appropedia WordPress site]]' -excepttext:'(?si)\[\[\s*Category:\s*Appropedia WordPress site' -excepttext:'(?si)(\#redirect\s*\[\[)' -namespace:4 -namespace:12 -summary:'add [[Category:Appropedia WordPress site]] based on search and manual check.' -log:CategoryAdd -xml:currentdump.xml

Output is:

Reading XML dump...
Traceback (most recent call last):
  File "/home/cwg23/pwb/pagegenerators.py", line 1182, in __iter__
    for page in self.wrapped_gen:
  File "/home/cwg23/pwb/pagegenerators.py", line 1039, in NamespaceFilterPageGenerator
    for page in generator:
  File "/home/cwg23/pwb/pagegenerators.py", line 1084, in DuplicateFilterPageGenerator
    for page in generator:
  File "replace.py", line 217, in __iter__
    new_text = pywikibot.replaceExcept(new_text, old, new, self.excsInside, self.site)
  File "/home/cwg23/pwb/pywikibot/textlib.py", line 175, in replaceExcept
    match.group(groupID) + \
IndexError: no such group
no such group
0 pages were changed.
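
To rule out the pattern itself, the same pattern and replacement can be tried with plain Python `re`, outside the bot. This is only a minimal sketch: the sample page text is made up, and it shows how the standard library behaves, not what replace.py does internally.

```python
import re

# Same pattern and replacement as in the command above.
pattern = re.compile(r'(?si)\b(WordPress)\b(.*$)')
replacement = r'\1\2\n[[Category:Appropedia WordPress site]]'

# Hypothetical page text, invented for this check.
sample = "This site runs on WordPress.\nMore text here."

# Number of capturing groups the pattern defines.
print(pattern.groups)

# Standard-library substitution: expands \1 and \2, then appends the category.
result = pattern.sub(replacement, sample)
print(result)
```

If this runs cleanly, the pattern does define two groups, so the \2 reference is valid in itself, and the failure would have to come from how the bot expands it for particular pages.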


And then it gets interesting. To speed things up while debugging, I made a modified copy of the script, replace2.py, which loads only 2 pages at a time (by setting "maxquerysize = 2" in that file). Funny thing: I can run exactly the same command with "replace2.py" and it works, up until it gets to a particular page. Then I press n and get the error. (By the way, I've run versions of this bot in the past with only the match and replace text changed, with no problems, so it makes sense that the error only occurs under specific conditions.)
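
One guess, which I haven't confirmed against the pywikibot source: since the traceback ends at `match.group(groupID)`, the group references may be expanded by hand. If that expansion re-scans the string after each substitution, then page text captured by \2 that itself contains something like `\3` would be read as a group reference. The sketch below illustrates that idea only. `naive_expand` and the sample texts are hypothetical and are not pywikibot code.

```python
import re

# Hypothetical expander, NOT pywikibot's code: it substitutes the first
# numeric group reference it finds, then scans the whole string again.
GROUP_REF = re.compile(r'\\(\d+)')

def naive_expand(template, match):
    result = template
    while True:
        ref = GROUP_REF.search(result)
        if ref is None:
            return result
        # Raises IndexError("no such group") if the number is too high.
        group_text = match.group(int(ref.group(1)))
        result = result[:ref.start()] + group_text + result[ref.end():]

pattern = re.compile(r'(?si)\b(WordPress)\b(.*$)')
template = r'\1\2\n[[Category:Appropedia WordPress site]]'

# Made-up page texts: the second one contains a literal backslash-digit.
samples = [
    "Built with WordPress. Nothing odd.",
    r"Built with WordPress. See C:\3rdparty.",
]

for text in samples:
    m = pattern.search(text)
    try:
        naive_expand(template, m)
        print("ok:", text)
    except IndexError as err:
        print("failed:", text, "->", err)
```

If something like this is what happens, it would explain why only certain pages fail and why dropping \2 (which pulls the rest of the page into the replacement) makes the problem go away.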

The last page it shows me is "Appropedia:A Humourless Lot staging area". I assume the page where the problem occurs is one of the next 2 being loaded, but I don't know how to tell which pages they are. I can't see how the order of pages is determined, as it changed during my debugging and testing.
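
One way to tell which page triggers the error might be to run the replacement on each page separately and catch the exception, instead of letting it stop the whole run. A rough sketch of that idea follows. `find_failing_pages`, `demo_transform` and the page list are hypothetical stand-ins, not pywikibot functions, and I'd still need to work out how to feed it real (title, text) pairs and the real replacement call.

```python
def find_failing_pages(pages, transform):
    """Return (title, error) for every page whose text makes transform raise."""
    failures = []
    for title, text in pages:
        try:
            transform(text)
        except Exception as err:  # broad on purpose: this is only a debugging aid
            failures.append((title, err))
    return failures

def demo_transform(text):
    # Stand-in for the real replacement; the failure condition is invented.
    if "\\3" in text:
        raise IndexError("no such group")
    return text

# Made-up pages, for illustration only.
pages = [
    ("Page one", "plain text about WordPress"),
    ("Page two", "WordPress and a stray \\3 in the text"),
]

for title, err in find_failing_pages(pages, demo_transform):
    print(title, "->", err)
```

That would at least give me a page title to inspect by hand, regardless of the order in which the pages are loaded.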

Thanks for any ideas.


--
Chris Watkins

Appropedia.org - Sharing knowledge to build rich, sustainable lives.