On Sat, Oct 11, 2008 at 6:32 PM, Thomas Dalton <thomas.dalton@gmail.com> wrote:
2008/10/11 Nicolas Dumazet nicdumz@gmail.com:
So this increases the frequency of dumps for small wikis, great.
But this means that the time between two dumps of the big wikis is _at_least_ the sum of the times needed to dump each one of the big wikis... more than 10 or 12 weeks, not counting any failure? I don't think that you really want to do this.
Exactly. The only way you can speed up the smaller dumps is to slow down the bigger ones (or throw more money at the problem), and no-one has given any reason why we should prioritise smaller dumps.
Processing a huge wiki takes the bot owners longer, so a less frequent dump would not be felt until the jobs running on the previous dump are complete. On the other hand, jobs on small or medium wikis complete far faster, so their bot owners would sit idle for much more time than a bot owner working on a larger wiki. Downloading the whole of Wikipedia article by article becomes too slow to be a comfortable option after a certain size: it costs and wastes bandwidth, and, more importantly, the valuable time of the editor overseeing the bot as it downloads and analyses articles only to do nothing until it finds a target (as opposed to finding all the targets fast from a dump and then working on just them).
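[A minimal sketch, not part of the original message, of the "scan the dump first, then work only on the matches" workflow Bence describes. The dump filename and the is_target() predicate are hypothetical; a real bot would apply its own criteria and then edit only the matching pages via the API.]

    # Sketch: find target pages by streaming a local pages-articles dump,
    # instead of downloading every article one by one over the wire.
    import bz2
    import xml.etree.ElementTree as ET

    DUMP = "simplewiki-latest-pages-articles.xml.bz2"  # hypothetical local dump
    NS = "{http://www.mediawiki.org/xml/export-0.3/}"  # namespace varies by dump version

    def is_target(title, text):
        # Placeholder predicate: whatever the bot is actually looking for.
        return "{{cleanup}}" in text

    def find_targets(path):
        """Stream the compressed dump and yield titles of matching pages."""
        with bz2.open(path, "rb") as f:
            for _, elem in ET.iterparse(f):
                if elem.tag == NS + "page":
                    title = elem.findtext(NS + "title", "")
                    text = elem.findtext(NS + "revision/" + NS + "text", "") or ""
                    if is_target(title, text):
                        yield title
                    elem.clear()  # keep memory bounded on large dumps

    if __name__ == "__main__":
        targets = list(find_targets(DUMP))
        print(len(targets), "pages to work on")
        # The bot then fetches and edits only these pages via the API.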
Bence Damokos