Rotem Liss wrote:
Pietro Battiston wrote:
The "exclude" log file is, as I intend it, mainly aimed at XML dumps; I don't want to download again all the pages of a dump that I already know I won't change.
That said, I must admit I was surprised to find that the XML dump page generator is in replace.py rather than in pagegenerators.py. Is there a reason for that? This too could be a general feature.
More specifically, the "exclude" logging feature may be general, but in the replace.py case (but maybe in general too) I think it should evolve to provide separate logging for:
1) pages fixed, or automatically skipped because they were already fixed
2) pages skipped manually because the replacement doesn't apply
For example, if I use replace.py with a dump and then get a new dump, I'll delete log 1), but I'll want to keep, in most cases, log 2).
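To make the two-log idea concrete, here is a minimal sketch in plain Python. It is not the actual replace.py patch: the function names (load_titles, log_title, filter_excluded) and the one-title-per-line plaintext format are assumptions made for illustration only.

```python
import os


def load_titles(path):
    """Return the set of page titles stored one per line in `path`.

    A missing file is treated as an empty log (hypothetical convention).
    """
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def log_title(path, title):
    """Append one title to a log file; earlier entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(title + "\n")


def filter_excluded(titles, *log_paths):
    """Yield only the titles that appear in none of the given logs."""
    excluded = set()
    for path in log_paths:
        excluded |= load_titles(path)
    for title in titles:
        if title not in excluded:
            yield title
```

With one file per kind of log, deleting the "fixed" file when a new dump arrives resets only that log, while the "manually skipped" file keeps filtering pages out.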
Putting this in wikipedia.py may be redundant for many bots, but I don't think it would be a bad idea. When I find time, I'll write a better patch (XML or still plaintext?).
Pietro Battiston
About log 1, it should probably be a "start" (or "xmlstart" or so if it conflicts with another parameter) parameter instead - "start from this page in the XML dump" - or "-continue" like interwiki.py (which requires logging).
Maybe, but in that case I'd like replace.py to save the last page it edited somewhere and start from that one the next time, even if it stopped because it was killed! It could just overwrite the "exclude" log instead of appending to it, but... what would we have gained? Moreover, if I know some pages were already fixed, say by another bot working on a category, I can concatenate its log to mine.
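The concatenation argument can be sketched as follows, again assuming a plaintext log with one page title per line. The helper name merge_logs is hypothetical and not part of the framework.

```python
def merge_logs(out_path, *in_paths):
    """Write the de-duplicated union of several title logs to `out_path`.

    All inputs are read before the output is opened, so `out_path` may
    also be one of the inputs. Returns the merged set of titles.
    """
    titles = set()
    for path in in_paths:
        with open(path, encoding="utf-8") as f:
            titles.update(line.strip() for line in f if line.strip())
    with open(out_path, "w", encoding="utf-8") as f:
        for title in sorted(titles):
            f.write(title + "\n")
    return titles
```

A single "start from this page" marker could not be combined this way, since it says nothing about pages another bot fixed out of dump order.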
About log 2, these pages should probably contain some enhanced "NoBot" template (don't replace this and that) in the wiki.
Probably. Do such templates already exist somewhere?!
Pietro