My longest bot job on enwiki lasts one week, far less than the 10 or 12 weeks mentioned. Processing a few thousand pages on a small wiki takes only a few hours. I don't know of any bot job that runs for longer than the time between two dumps of the wiki in question.
The more time you put between two dumps, the more changes accumulate, and the longer the bot jobs usually take. It also means that having a dump, say, every week for small wikis does not add much for bot jobs: if the job consists of fixing a single type of mistake, chances are that during one week only tens of these mistakes would have been introduced, and the bot job is likely to run very quickly.
For bot jobs, I really don't see any advantage in reducing the time between dumps for small wikis. There is not a lot of activity, meaning not a lot to do.
Other applications might require fresher dumps of small wikis, but I don't know of any bot task that needs a faster dump rate.
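To make the scaling argument concrete, here is a rough back-of-envelope sketch; every number in it is a made-up assumption, just to show why a shorter dump interval on a small wiki buys very little bot work:

    # Back-of-envelope estimate: bot work per dump for a "fix one type
    # of mistake" job. All numbers below are hypothetical.
    edits_per_week = 500        # total edits on a small wiki
    error_rate = 0.02           # fraction of edits introducing the mistake
    seconds_per_fix = 10        # one throttled API read + save

    for interval_weeks in (1, 4, 12):
        mistakes = edits_per_week * interval_weeks * error_rate
        hours = mistakes * seconds_per_fix / 3600
        print("%2d-week interval: ~%3.0f fixes, ~%.2f hours of bot work"
              % (interval_weeks, mistakes, hours))

Even at the longest interval the job finishes in well under an hour, which is the point above: for small wikis the bottleneck is activity, not dump freshness.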
2008/10/12 Bence Damokos bdamokos@gmail.com:
On Sat, Oct 11, 2008 at 6:32 PM, Thomas Dalton thomas.dalton@gmail.comwrote:
2008/10/11 Nicolas Dumazet nicdumz@gmail.com:
So this increases the frequency of dumps for small wikis, great.
But this means that the time between two dumps of the big wikis is _at_least_ the sum of the times needed to dump each one of the big wikis... more than 10 or 12 weeks, not counting any failures? I don't think that you really want to do this.
Exactly. The only way you can speed up the smaller dumps is to slow down the bigger ones (or throw more money at the problem), and no one has given any reason why we should prioritise smaller dumps.
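For what it's worth, that arithmetic is easy to check with made-up figures; the per-wiki durations below are assumptions, not measurements:

    # Hypothetical serial schedule: one dump runs at a time, round-robin.
    # The interval between two dumps of any given wiki is then the whole
    # cycle length, i.e. the sum of all per-wiki dump durations.
    dump_weeks = {"enwiki": 6, "dewiki": 2, "frwiki": 2, "jawiki": 1, "plwiki": 1}
    cycle = sum(dump_weeks.values())
    print("interval between successive dumps of each wiki: ~%d weeks" % cycle)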
Processing a huge wiki takes the bot owners longer, so not having a fresh dump as often would not be felt until the jobs running on the previous one are complete. On the other hand, jobs on small or medium wikis complete far faster, so their bot owners would sit idle for much more time than a bot owner working on a larger wiki. Downloading the whole Wikipedia article by article becomes too slow to be a comfortable option beyond a certain size (it costs and wastes bandwidth, and, more importantly, the valuable time of the editor overseeing the bot as it downloads and analyses articles only to do nothing until it finds its target [as opposed to finding all the targets fast from a dump, and then working on just them]).
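As an illustration of the two approaches being contrasted here, a minimal Python sketch; the dump filename, schema version, and function names are assumptions, and error handling is omitted:

    # Approach 1: one local streaming pass over a pages-articles dump
    # finds every target without touching the network.
    import bz2
    import xml.etree.ElementTree as ET

    NS = "{http://www.mediawiki.org/xml/export-0.3/}"  # varies with dump version

    def scan_dump(path):
        """Yield (title, text) for every page in a .xml.bz2 dump."""
        with bz2.open(path, "rb") as f:
            for _, elem in ET.iterparse(f):
                if elem.tag == NS + "page":
                    title = elem.findtext(NS + "title")
                    text = elem.findtext(NS + "revision/" + NS + "text") or ""
                    yield title, text
                    elem.clear()  # keep memory usage flat

    # Approach 2: fetching the same text over the web API costs one HTTP
    # round trip per page, which is what wastes bandwidth and editor time.
    import json, urllib.parse, urllib.request

    def fetch_page(title):
        params = urllib.parse.urlencode({
            "action": "query", "prop": "revisions", "rvprop": "content",
            "titles": title, "format": "json",
        })
        url = "https://en.wikipedia.org/w/api.php?" + params
        with urllib.request.urlopen(url) as r:
            return json.load(r)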
Bence Damokos
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l