I'm having trouble with this script, which I'm running on
Appropedia.org.
It's not a huge deal if it doesn't work, but I'd appreciate it if anyone has
the patience to help me understand how to debug this, or *why* it doesn't
work.
I've narrowed it down to the \2 in the replace term, as the problem
disappears when I remove it:
    python replace.py -regex '(?si)\b(WordPress)\b(.*$)' \
        '\1\2\n[[Category:Appropedia WordPress site]]' \
        -excepttext:'(?si)\[\[\s*Category:\s*Appropedia WordPress site' \
        -excepttext:'(?si)(\#redirect\s*\[\[)' -namespace:4 -namespace:12 \
        -summary:'add [[Category:Appropedia WordPress site]] based on search and manual check.' \
        -log:CategoryAdd -xml:currentdump.xml
Output is:
Reading XML dump...
Traceback (most recent call last):
  File "/home/cwg23/pwb/pagegenerators.py", line 1182, in __iter__
    for page in self.wrapped_gen:
  File "/home/cwg23/pwb/pagegenerators.py", line 1039, in NamespaceFilterPageGenerator
    for page in generator:
  File "/home/cwg23/pwb/pagegenerators.py", line 1084, in DuplicateFilterPageGenerator
    for page in generator:
  File "replace.py", line 217, in __iter__
    new_text = pywikibot.replaceExcept(new_text, old, new, self.excsInside, self.site)
  File "/home/cwg23/pwb/pywikibot/textlib.py", line 175, in replaceExcept
    match.group(groupID) + \
IndexError: no such group
no such group
0 pages were changed.
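The traceback ends at match.group(groupID), so here is a minimal sketch in plain Python of one way that call can raise "no such group" even though the pattern has both groups. It rests on an assumption I have not confirmed against textlib.py: that the group references in the replacement are expanded in a loop which re-scans text it has already substituted. The helper expand_groups and the sample strings are hypothetical and are not pywikibot code.

```python
import re

# Matches a numeric group reference such as \1 or \2.
GROUP_REF = re.compile(r'\\(\d+)')

def expand_groups(template, match):
    """Hypothetical helper: expand \\N references, re-scanning the whole
    string on every pass (the assumed behaviour, not confirmed)."""
    result = template
    while True:
        ref = GROUP_REF.search(result)
        if ref is None:
            return result
        # match.group() raises IndexError for a group number the pattern lacks.
        result = (result[:ref.start()]
                  + match.group(int(ref.group(1)))
                  + result[ref.end():])

pattern = re.compile(r'(?si)\b(WordPress)\b(.*$)')

# Page text without any backslash-digit sequence expands cleanly.
ok = pattern.search('WordPress is installed.')
print(expand_groups(r'\1\2', ok))

# Here group 2 itself contains "\7", which the next pass of the loop
# treats as a reference to a seventh group that does not exist.
bad = pattern.search(r'WordPress path C:\7zip')
try:
    expand_groups(r'\1\2', bad)
except IndexError as err:
    print('IndexError:', err)
```

If that assumption holds, it would fit both observations: the error goes away without \2 (no page text is pasted back into the replacement), and it only shows up on particular pages.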
And then it gets interesting... to speed things up while debugging, I made
a modified replace script called replace2.py which only loads 2 pages at a
time (by setting "maxquerysize = 2" in that file). Funny thing - I can run
exactly the same command but with "replace2.py" and it works... up until it
gets to a particular page. Then I press n and get the error. (Btw, I've run
versions of this bot in the past with only the match & replace text
changed, with no problems, so it makes sense that the error only occurs in
specific conditions.)
The last page that it gives me is Appropedia:A Humourless Lot staging
area<http://www.appropedia.org/Appropedia:A_Humourless_Lot_staging_area&…
I assume the page where the problem occurs is one of the next 2 being
loaded, and I don't know how to tell which pages they are. I can't see how
the order of pages is determined, as it changed during my debugging/testing.
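One way to find candidate pages without relying on the bot's page order is to scan the dump directly. The sketch below uses only the Python standard library and assumes the dump follows the usual MediaWiki export layout (page, title, revision, text elements). It also assumes, without confirmation, that the trigger is a backslash followed by a digit in the page text. The function name find_suspect_pages and the sample data are hypothetical.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumed trigger: a backslash followed by a digit somewhere in the page text.
SUSPECT = re.compile(r'\\\d')

def local(tag):
    """Strip the XML namespace, which differs between export versions."""
    return tag.rsplit('}', 1)[-1]

def find_suspect_pages(source):
    """Return titles of pages whose text contains the assumed trigger.

    source may be a filename or a binary file object.
    """
    titles = []
    for _event, elem in ET.iterparse(source, events=('end',)):
        if local(elem.tag) != 'page':
            continue
        title = ''
        text = ''
        for child in elem.iter():
            name = local(child.tag)
            if name == 'title':
                title = child.text or ''
            elif name == 'text':
                text = child.text or ''
        if SUSPECT.search(text):
            titles.append(title)
        elem.clear()  # free memory for pages already checked
    return titles

# Tiny made-up dump for illustration only.
sample = io.BytesIO(
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">'
    b'<page><title>Clean page</title>'
    b'<revision><text>WordPress notes</text></revision></page>'
    b'<page><title>Odd page</title>'
    b'<revision><text>WordPress in C:\\7zip</text></revision></page>'
    b'</mediawiki>'
)
print(find_suspect_pages(sample))
```

Against the real dump this would be called as find_suspect_pages('currentdump.xml'). It ignores the namespace and -excepttext filters, so it may list more pages than the bot would actually touch.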
Thanks for any ideas.
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.