I am currently running replace.py -catr:%D0%92%D0%B5%D0%BD%D0%B3%D1%80%D0%B8%D1%8F . @ -lang:ru -excepttext:"[[hu:" -save:magyarok.txt -always to collect Hungary-related articles from Russian Wikipedia. It has already been running for 10 hours, mostly because of pywikibot.output. When the overwhelming majority of characters are shown in yellow (substituted), this function slows the program down enormously: I can see it writing text to my screen character by character.
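The -catr value in the command above is the category name written as percent-encoded UTF-8. A minimal check with Python's standard library (not part of replace.py) shows which category it names:

```python
from urllib.parse import unquote

# The -catr value from the command above, exactly as typed.
encoded = "%D0%92%D0%B5%D0%BD%D0%B3%D1%80%D0%B8%D1%8F"

# unquote() decodes %XX escapes as UTF-8 by default.
category = unquote(encoded)
print(category)  # Венгрия, Russian for "Hungary"
```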
The slowdown gives me the idea of introducing a new switch: -silent.
From line 474 on, the lines

    # Show the title of the page we're working on.
    # Highlight the title in purple.
    pywikibot.output(u"\n\n>>> \03{lightpurple}%s\03{default} <<<" % page.title())
    pywikibot.showDiff(original_text, new_text)

should not be executed with this switch on. This is recommended only for cases such as the one mentioned above.
Because this can be dangerous, there could be a restriction that allows the switch to work only together with -always. Or, as an even more restrictive rule, it could work only with -save/-savenew (one could argue that any work on a live wiki should appear on the screen).
Which one is better?
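A minimal sketch of how the proposed switch might be wired up, assuming plain argument-list parsing. The helper names (parse_flags, check_silent, report_change) are hypothetical and not part of replace.py or pywikibot; the output and show_diff parameters stand in for pywikibot.output and pywikibot.showDiff so the sketch runs on its own. The require_save parameter selects between the two restriction rules above.

```python
def _is_save_arg(arg):
    """True for -save, -save:FILE, -savenew and -savenew:FILE."""
    return arg.split(":", 1)[0] in ("-save", "-savenew")


def parse_flags(args):
    """Collect the switches relevant to the hypothetical -silent option."""
    return {
        "silent": "-silent" in args,
        "always": "-always" in args,
        "save": any(_is_save_arg(a) for a in args),
    }


def check_silent(flags, require_save=False):
    """Return an error message if -silent lacks its prerequisites, else None.

    require_save=False: the looser rule (-silent needs -always).
    require_save=True:  the stricter rule (-always plus -save/-savenew).
    """
    if not flags["silent"]:
        return None
    if not flags["always"]:
        return "-silent can only be used together with -always"
    if require_save and not flags["save"]:
        return "-silent can only be used together with -save or -savenew"
    return None


def report_change(title, original_text, new_text, silent, output, show_diff):
    """Show the page title and the diff unless -silent is on."""
    if silent:
        return
    output("\n\n>>> %s <<<" % title)
    show_diff(original_text, new_text)
```

For example, the command from the first paragraph with -silent added would pass even the stricter check, since it already contains -always and -save:magyarok.txt.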
Erratum: pywikibot.showDiff is the main culprit slowing down the program. I am not sure whether pywikibot.output is equally guilty. I have now restarted the bot with showDiff commented out.
But I think it would run faster without showDiff even on Latin-script wikis, so it is worth switching off.
On Thu, Mar 3, 2011 at 10:03 AM, Bináris <wikiposta@gmail.com> wrote:
> Erratum: pywikibot.showDiff is the main culprit slowing down the program. I am not sure whether pywikibot.output is equally guilty. I have now restarted the bot with showDiff commented out.
In the past, pywikibot output was indeed rather slow when there was a lot to transliterate, but that problem was already resolved in my edit #6275 from January 2009, when transliteration was changed from a sequence of elifs to a dictionary.
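To illustrate why that change matters, here is a toy comparison, not the actual pywikibot code: the table is a hypothetical seven-letter subset, and the "?" fallback for unknown characters is an arbitrary choice. In the elif version each character is compared against the branches one by one, so the cost per character grows with the size of the table; the dictionary version does a single hash lookup per character.

```python
# Hypothetical, heavily reduced transliteration table (Cyrillic -> Latin).
TRANSLIT = {
    "В": "V", "е": "e", "н": "n", "г": "g",
    "р": "r", "и": "i", "я": "ya",
}


def translit_char_elif(ch):
    """Old style: test the character against each branch in turn."""
    if ch == "В":
        return "V"
    elif ch == "е":
        return "e"
    elif ch == "н":
        return "n"
    elif ch == "г":
        return "g"
    elif ch == "р":
        return "r"
    elif ch == "и":
        return "i"
    elif ch == "я":
        return "ya"
    return "?"


def translit_char_dict(ch):
    """New style: one dictionary lookup, independent of table size."""
    return TRANSLIT.get(ch, "?")


def transliterate(text, translit_char):
    """Apply a per-character transliteration function to a whole string."""
    return "".join(translit_char(ch) for ch in text)


print(transliterate("Венгрия", translit_char_dict))  # Vengriya
```

Both versions give the same result; only the lookup strategy differs, which is what makes the dictionary version scale to a full-size table.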
2011/3/3 Andre Engels <andreengels@gmail.com>
> In the past, pywikibot output was indeed rather slow when there was a lot to transliterate, but that problem was already resolved in my edit #6275 from January 2009, when transliteration was changed from a sequence of elifs to a dictionary.
You are right, I ran an experiment. The command mentioned above was killed with Ctrl+C after ten and a half hours, and by then it had collected approx. 950 titles. Then I commented out showDiff, but not the line pywikibot.output(page.title()). This time it collected 3200 titles in the first 24 minutes! So only showDiff needs to be switched off. The question is: should we restrict this option to -always, or to -always plus -save?
P.S. The script finished in just 43 minutes and collected 4246 articles. This is MUCH faster than runs with showDiff on in Latin-script wikis, so showDiff is the bottleneck regardless of script. I will never collect articles with -save and showDiff on again. This is a new world. :-)
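A quick back-of-the-envelope comparison of the figures reported in this thread. It is only rough: the 950 is approximate and the first run was killed rather than completed. It works out to about 1.5 titles per minute with showDiff on, versus about 133 (first 24 minutes) and 99 (complete run) with it off.

```python
# Figures reported in this thread: (titles collected, minutes elapsed).
runs = {
    "showDiff on, killed after 10.5 h": (950, 10.5 * 60),
    "showDiff off, first 24 min": (3200, 24),
    "showDiff off, complete run": (4246, 43),
}

# Throughput in titles per minute for each run.
rates = {name: titles / minutes for name, (titles, minutes) in runs.items()}
baseline = rates["showDiff on, killed after 10.5 h"]

for name, rate in rates.items():
    print("%-34s %7.1f titles/min  (%.0fx)" % (name, rate, rate / baseline))
```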
pywikipedia-l@lists.wikimedia.org