On Sat, Oct 11, 2008 at 6:32 PM, Thomas Dalton <thomas.dalton@gmail.com> wrote:
2008/10/11 Nicolas Dumazet <nicdumz@gmail.com>:
So this increases the frequency of dumps for small wikis, great.
But this means that the time between two dumps of the big wikis is
_at_least_ the sum of the times needed to dump each one of the big
wikis... more than 10-12 weeks, not counting any failures? I don't
think that you really want to do this.
Exactly. The only way you can speed up the smaller dumps is to slow
down the bigger ones (or throw more money at the problem), and no one
has given any reason why we should prioritise smaller dumps.
Processing a huge wiki takes the bot owners and other users longer, so not
having a fresh dump as often would not be felt until the jobs running on the
previous dump are complete.
On the other hand, jobs on smaller or medium wikis complete far faster, so
those bot owners would be idle for much more time than a bot owner working
on a larger wiki.
Downloading the whole of Wikipedia article by article becomes, after a
certain size, too slow to be a comfortable option: it costs and wastes
bandwidth, and, more importantly, the valuable time of the editor overseeing
the bot as it downloads and analyses articles, doing nothing until it finds
its target (as opposed to finding all the targets fast from a dump and then
working on just them).
Bence Damokos
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l