A few days ago I posted three patches to SourceForge, but I was told today that the bug tracker isn't looked at much, so I subscribed and here I am.
Two of them are just improvements: 1) adding to replace.py the ability to remember pages it has already seen (I'd like some feedback on this patch: I'm testing it intensively, but the implementation itself could probably be better), and 2) making catlib.py consume less time and less of the servers' bandwidth.
The third (http://sourceforge.net/tracker/index.php?func=detail&aid=1843789&gro...) is instead quite critical: currently image.py doesn't work at all, and changing a single line makes it work again.
So, if there is anything that can speed up getting it applied, let me know.
Pietro Battiston
On Dec 8, 2007 3:29 PM, Pietro Battiston toobaz@email.it wrote:
So, if there is anything that can speed up getting it applied, let me know.
Pietro Battiston
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
You could apply for commit access. Applying patches usually takes a lot of time (although I know some people apply patches without testing them...). I'm not sure who you should ask for commit access. Probably Head (Wikipedian) or Merlijn (valhallasw).
Bryan
Pietro Battiston wrote:
A few days ago I posted three patches to SourceForge, but I was told today that the bug tracker isn't looked at much, so I subscribed and here I am.
Submitting to the bug tracker is the best way to get a bug fixed; it was designed for exactly that purpose. Unfortunately, fixing problems takes people and free time.
Two of them are just improvements: 1) adding to replace.py the ability to remember pages it has already seen (I'd like some feedback on this patch: I'm testing it intensively, but the implementation itself could probably be better)
This feature could be general, not tied only to replace.py.
Francesco Cosoleto wrote:
Pietro Battiston wrote:
1) adding to replace.py the ability to remember pages it has already seen (I'd like some feedback on this patch: I'm testing it intensively, but the implementation itself could probably be better)
This feature could be general, not tied only to replace.py.
Logging to an "exclude" file is, as I intend it, mainly meant for XML dumps: I don't want to download again all the pages of a dump that I already know I won't change.
That said, I must admit I was surprised to find that the XML dump page generator lives in replace.py rather than in pagegenerators.py. Is there a reason for that? This too could be a general feature.
More specifically, the "exclude" logging feature could be general, but in the replace.py case (and maybe in general too) I think it should evolve to provide separate logs for:
1) pages fixed, or automatically skipped because they were already fixed
2) pages skipped manually because the replacement doesn't apply
For example, if I use replace.py with a dump and then get a new dump, I'll delete log 1), but in most cases I'll want to keep log 2).
Putting this in wikipedia.py may be redundant for many bots, but I don't think it would be a bad idea. When I find time, I'll write a better patch (XML, or still plain text?).
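The two separate logs described above could be sketched roughly as follows. This is plain Python, independent of the actual patch and of the pywikipedia code: the `ExcludeLogs` class, its method names, and the one-title-per-line plain-text format are all assumptions made for illustration.

```python
import os


class ExcludeLogs:
    """Two plain-text logs, one page title per line (hypothetical format)."""

    def __init__(self, done_path, skipped_path):
        # "done" is log 1) (fixed or already fixed), "skipped" is log 2).
        self.paths = {"done": done_path, "skipped": skipped_path}
        self.seen = {kind: self._load(path) for kind, path in self.paths.items()}

    @staticmethod
    def _load(path):
        # A missing log simply means nothing has been recorded yet.
        if not os.path.exists(path):
            return set()
        with open(path, encoding="utf-8") as f:
            return {line.rstrip("\n") for line in f if line.strip()}

    def record(self, kind, title):
        # Append right away so the entry is on disk if the bot is killed later.
        if title in self.seen[kind]:
            return
        self.seen[kind].add(title)
        with open(self.paths[kind], "a", encoding="utf-8") as f:
            f.write(title + "\n")

    def should_skip(self, title):
        # A page is skipped if it appears in either log.
        return any(title in titles for titles in self.seen.values())

    def reset_done(self):
        # For a new dump: forget log 1) but keep log 2).
        self.seen["done"].clear()
        if os.path.exists(self.paths["done"]):
            os.remove(self.paths["done"])
```

With something like this, a bot would call `should_skip(title)` before downloading a page from the dump, `record("done", title)` after a fix, `record("skipped", title)` after a manual skip, and `reset_done()` when a new dump arrives.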
Pietro Battiston
Pietro Battiston wrote:
Logging to an "exclude" file is, as I intend it, mainly meant for XML dumps: I don't want to download again all the pages of a dump that I already know I won't change.
That said, I must admit I was surprised to find that the XML dump page generator lives in replace.py rather than in pagegenerators.py. Is there a reason for that? This too could be a general feature.
More specifically, the "exclude" logging feature could be general, but in the replace.py case (and maybe in general too) I think it should evolve to provide separate logs for:
1) pages fixed, or automatically skipped because they were already fixed
2) pages skipped manually because the replacement doesn't apply
For example, if I use replace.py with a dump and then get a new dump, I'll delete log 1), but in most cases I'll want to keep log 2).
Putting this in wikipedia.py may be redundant for many bots, but I don't think it would be a bad idea. When I find time, I'll write a better patch (XML, or still plain text?).
Pietro Battiston
About log 1, it should probably be a "start" (or "xmlstart" or so if it conflicts with another parameter) parameter instead - "start from this page in the XML dump" - or "-continue" like interwiki.py (which requires logging). About log 2, these pages should probably contain some enhanced "NoBot" template (don't replace this and that) in the wiki.
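As a rough illustration of the "start from this page" idea, here is a minimal sketch in plain Python. It assumes the dump can be read as an iterable of page titles; `start_from` is a hypothetical helper, not an existing replace.py or pagegenerators.py function.

```python
from itertools import dropwhile


def start_from(titles, start_title):
    """Yield titles from the first one equal to start_title onwards.

    If start_title never appears, nothing is yielded.
    """
    return dropwhile(lambda title: title != start_title, titles)
```

Note that this only resumes correctly if the pages come in the same order on every run, which holds for re-reading the same dump file but not necessarily for a new dump.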
Rotem Liss wrote:
About log 1, it should probably be a "start" (or "xmlstart" or so if it conflicts with another parameter) parameter instead - "start from this page in the XML dump" - or "-continue" like interwiki.py (which requires logging).
Maybe, but in that case I'd like replace.py to save the last page it edited somewhere and start from that one the next time, even if it stopped because it was killed! It could just overwrite the "exclude" log instead of appending to it, but... what would we have gained? Moreover, if I know some pages were already fixed by, say, another bot working on a category, I can concatenate its log to mine.
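To make the comparison concrete, here is a minimal sketch of both approaches in plain Python: a single "last page" checkpoint that is overwritten after every edit, and the merging of several append-only logs. All function names and the one-title-per-line file format are assumptions for illustration, not part of the existing scripts.

```python
import os


def save_checkpoint(path, title):
    """Record the last processed title, replacing the old file atomically."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(title + "\n")
        f.flush()
        os.fsync(f.fileno())
    # After os.replace, the checkpoint is either the old or the new file,
    # never a half-written one, even if the bot is killed in between.
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved title, or None if there is no checkpoint yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.readline().rstrip("\n") or None
    except FileNotFoundError:
        return None


def merge_logs(*paths):
    """Union of the titles listed in several one-title-per-line logs."""
    titles = set()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            titles.update(line.rstrip("\n") for line in f if line.strip())
    return titles
```

The checkpoint is enough to resume a single run, but only the append-only logs can be combined with another bot's log, which is the advantage argued for above.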
About log 2, these pages should probably contain some enhanced "NoBot" template (don't replace this and that) in the wiki.
Probably. Do such templates already exist somewhere?!
Pietro