2018-09-16 22:03 GMT+02:00 Bináris <wikiposta(a)gmail.com>:
The bot scanned the latest huwiki dump for 14 hours(!). (Not the whole
dump, I used -xmlstart.) It went through 820 thousand pages and found 240+
matches (I displayed every 10th match).
Then the bot worked a further 30-40 minutes to check the actual pages on the
live wiki, this time with namespace filtering on. (I don't replace in this
phase, just save the list, so no human interaction is involved during this
time.)
Guess the result! 62 out of 240 remained. This means that the bigger part
of these 14 hours went into /dev/null.
Now I realize how much time I wasted in the past 10 years. :-(
I was not quite right. With the modified code it took 12 hours instead of
14, 630,000 pages were scanned instead of 820,000, and 83 matches were found
instead of 240+ (of which 62 are real). But this is still not the same.