https://bugzilla.wikimedia.org/show_bug.cgi?id=54574
--- Comment #3 from Kunal Mehta (Legoktm) <legoktm.wikipedia(a)gmail.com> ---
Thanks for the quick review. I will try to address the
various points and have included a new version of the patch.
a. I added a bit more text to the source and reformatted
part of the code, but I didn't want to change existing
code more than needed.
b. generator:
- checks if the filter file exists
- reads it
- runs the next generator and skips pages in memory
Previously, it first ran the next generator and then deleted
from its result the pages that were in the filter file.
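
As a rough illustration of the order of steps above (not the code from
the patch), the filtering could look like the following sketch. The names
load_filter and filtered are hypothetical, titles are plain strings
rather than Page objects, and the file is assumed to hold one title per
line.

```python
import os


def load_filter(path):
    """Return the set of titles in the filter file (empty if it is absent)."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def filtered(generator, path):
    """Yield titles from the wrapped generator that are not in the filter file.

    The filter file is read once, before the wrapped generator is consumed,
    so skipped titles are dropped one by one as they arrive.
    """
    skip = load_filter(path)
    for title in generator:
        if title not in skip:
            yield title
```

Because the set is loaded first, the wrapped generator never has to be
materialised as a full list before pages are removed.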
c. replace.py command line options
I added several command line options to define which
pages should be skipped the next time. One could edit
replace.py directly, but it seemed cleaner to provide
all options at command line level.
toobaz excluded pages where a replacement was manually
rejected ("N"). The option "-exclude" will keep this
functionality.
Personally, I find it more useful to filter pages that
were edited in a previous run. This prevents the bot from
repeating the same edit later, after someone has reverted
a previous edit. Option "-editonce" provides this.
"-treatonce" combines the two.
"-scanonce" prevents the bot from re-fetching the same page
in a 2nd run, even if the regex didn't match it in
the first run. (I fixed an omission for "skipped" in
the second patch.)
Without the different options, the additions to replace.py
would be much shorter.
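
To make the difference between the four options easier to see, here is
a small sketch of which page outcomes each one would write to the
filter file. The outcome labels ("rejected", "edited", "nomatch",
"skipped") and the function name should_record are made up for this
illustration, and treating "-scanonce" as covering every outcome is an
assumption based on the description above, not on the patch itself.

```python
# Hypothetical mapping from option to the page outcomes it records.
RECORDED_OUTCOMES = {
    "-exclude": {"rejected"},                # replacement answered with "N"
    "-editonce": {"edited"},                 # page was saved by the bot
    "-treatonce": {"rejected", "edited"},    # combination of the two above
    # Assumption: every fetched page counts, whatever happened to it.
    "-scanonce": {"rejected", "edited", "nomatch", "skipped"},
}


def should_record(option, outcome):
    """Return True if a page with this outcome goes into the filter file."""
    return outcome in RECORDED_OUTCOMES.get(option, set())
```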
d. I had to insert several "break" statements in replace.py so
that nothing but "N" reaches the stage confusingly labeled
"choice must be 'N'" in the code.
e. FilterFileAppend is based on the function from
solve_disambiguation. The advantage of writing each
page to the file is that it won't miss one if it's
interrupted or crashes. This mode from
solve_disambiguation remains unchanged.
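
The append-per-page idea can be sketched roughly as follows. This is
not the FilterFileAppend code itself; the function name
append_to_filter_file and the one-title-per-line layout are assumptions
for the example.

```python
def append_to_filter_file(path, title):
    """Append one title to the filter file and close the file right away.

    Opening in append mode and closing after every page means that titles
    already processed are on disk if the run is interrupted later.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(title + "\n")
```

The cost is one open/close per page, which should be negligible next to
fetching the page itself.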
f. The same goes for the file format. Up to now, I haven't
had any problems with it, and it worked OK with a
title "臺灣Taiwan&āàäà" I just tested. urlname was also
used by PrimaryIgnoreManager. For backward compatibility,
maybe it should be kept.
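
For reference, a URL-style encoding of such a title round-trips cleanly
in plain Python. The sketch below only approximates what urlname does
(spaces to underscores, then percent-encoding of the UTF-8 bytes); the
helper names encode_title and decode_title are hypothetical and the
exact set of characters left unescaped is an assumption.

```python
from urllib.parse import quote, unquote


def encode_title(title):
    """Encode a title as ASCII: underscores for spaces, then percent-escapes."""
    return quote(title.replace(" ", "_"), safe="")


def decode_title(encoded):
    """Reverse encode_title (underscores are mapped back to spaces)."""
    return unquote(encoded).replace("_", " ")


title = "臺灣Taiwan&āàäà"
encoded = encode_title(title)
restored = decode_title(encoded)
```

Here restored equals the original title, and encoded contains only
ASCII characters, so the file stays readable regardless of the console
or file-system encoding.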