Pietro Battiston wrote:
Logging an "exclude" file is, in my intention, mainly meant for XML dumps: I don't want to download all the pages of a dump again when I already know I won't change them.
That said, I must admit I was surprised to find the XML dump page generator in replace.py rather than in pagegenerators.py. Is there a reason for that? It could be a general feature too.
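A minimal sketch of the idea, assuming a plain-text exclude file with one page title per line. The file format and the helper names `load_exclude`, `filter_titles` and `log_excluded` are hypothetical and not part of replace.py:

```python
import os
from typing import Iterable, Iterator, Set


def load_exclude(path: str) -> Set[str]:
    """Read a plain-text exclude file (hypothetical format: one title per line).

    Blank lines are ignored; a missing file means nothing is excluded yet.
    """
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def filter_titles(titles: Iterable[str], excluded: Set[str]) -> Iterator[str]:
    """Yield only dump titles not in the exclude set, so those pages
    are never downloaded from the live wiki again."""
    for title in titles:
        if title not in excluded:
            yield title


def log_excluded(path: str, title: str) -> None:
    """Append a title to the exclude file once the bot is done with it."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(title + "\n")
```

Appending one line per page keeps the log usable even if the bot is interrupted halfway through a dump.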
More specifically, the "exclude" logging feature could be general, but in the replace.py case (and maybe in general too) I think it should evolve to provide separate logs for:
1) pages fixed, or automatically skipped because they were already fixed;
2) pages skipped manually because the replacement doesn't apply.
For example, if I run replace.py on a dump and then get a newer dump, I'll delete log 1) but, in most cases, keep log 2).
Putting this into wikipedia.py may be unnecessary for many bots, but I don't think it would be a bad idea. When I find the time, I'll write a better patch (XML or still plain text?).
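The two-log scheme above could look roughly like this. It is a sketch under assumptions: the file names, the `ReplaceLogs` class and its methods are hypothetical, and both logs are plain text with one title per line:

```python
import os
from typing import Set

# Hypothetical file names for the two logs.
FIXED_LOG = "replace_fixed.log"      # log 1): fixed, or already fixed
SKIPPED_LOG = "replace_skipped.log"  # log 2): skipped manually


class ReplaceLogs:
    """Keep the two categories of handled pages in separate files."""

    def __init__(self, directory: str) -> None:
        self.fixed_path = os.path.join(directory, FIXED_LOG)
        self.skipped_path = os.path.join(directory, SKIPPED_LOG)

    @staticmethod
    def _read(path: str) -> Set[str]:
        # A missing log simply means no pages in that category yet.
        if not os.path.exists(path):
            return set()
        with open(path, encoding="utf-8") as f:
            return {line.strip() for line in f if line.strip()}

    @staticmethod
    def _append(path: str, title: str) -> None:
        with open(path, "a", encoding="utf-8") as f:
            f.write(title + "\n")

    def mark_fixed(self, title: str) -> None:
        self._append(self.fixed_path, title)

    def mark_skipped(self, title: str) -> None:
        self._append(self.skipped_path, title)

    def excluded(self) -> Set[str]:
        """Pages to leave out of the current run: the union of both logs."""
        return self._read(self.fixed_path) | self._read(self.skipped_path)

    def new_dump(self) -> None:
        """On switching to a newer dump, delete log 1) (those pages may
        need fixing again) and keep log 2)."""
        if os.path.exists(self.fixed_path):
            os.remove(self.fixed_path)
```

Keeping the categories in separate files means "new dump" is just deleting one file, with no need to rewrite a shared log.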
Pietro Battiston
As for log 1), it should probably be replaced by a "start" parameter (or "xmlstart" or similar, if that conflicts with another parameter), meaning "start from this page in the XML dump", or by "-continue" as in interwiki.py (which requires logging). As for log 2), these pages should probably carry some enhanced "NoBot" template on the wiki (don't replace this and that).
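A "start" parameter needs no log at all, only the title to resume from. A sketch, assuming the dump is read as a sequence of page titles in dump order; `start_from` is a hypothetical helper, and the actual parameter name is left open as above:

```python
from itertools import dropwhile
from typing import Iterable, Iterator, Optional


def start_from(titles: Iterable[str], start: Optional[str]) -> Iterator[str]:
    """Skip dump entries until the title `start` is reached, then yield
    it and everything after it, in dump order.

    With start=None nothing is skipped; if `start` never occurs in the
    dump, nothing is yielded.
    """
    if start is None:
        return iter(titles)
    return dropwhile(lambda title: title != start, titles)
```

This only works because the dump order is fixed; a "-continue" option would additionally have to record the last processed title somewhere.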