Look at it this way: you can't get enwiki dumps more than once every six weeks. Each one TAKES SIX WEEKS. (modulo lots of stuff, I'm simplifying a bit ;-)
The example I have used before is going into my bank: in the main Queensway office, there will be 50-100 people on the queue. When there are 8-10 tellers, it will go well; except that some transactions (depositing some cash) take a minute or so, and some take many, many minutes. If there are 8 tellers, and 8 people in front of you with 20-30 minute transactions, you are toast. (They handle this by having fast lines for deposits and such ;-)
In general, one queue feeding multiple servers/threads works very nicely if the tasks are about the same size.
But what we have here is projects that take less than a minute, in the same queue with projects that take weeks. That is about 5 orders of magnitude: in the time it takes to do the enwiki dump, the same thread could do ONE HUNDRED THOUSAND small projects.
Imagine walking into your bank with a 30 second transaction, and being told it couldn't be completed for 6 weeks because there were 3 officers available, and 5 people who needed complicated loan approvals on the queue in front of you.
That's the way the dumps are set up right now.
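To put rough numbers on the bank example, here is a minimal sketch in Python. It is not how the dump scripts actually work; the function name, the job counts, and the assumption that every job is already waiting at time 0 are all made up for illustration. It compares one shared FIFO queue against a split where one thread is reserved as a "fast lane" for small jobs.

```python
import heapq

def fifo_wait_times(durations, workers):
    """Wait (in minutes) before each job starts, for one FIFO queue feeding
    `workers` identical threads. Hypothetical model: all jobs arrive at time 0."""
    free_at = [0.0] * workers      # min-heap: when each thread next becomes free
    heapq.heapify(free_at)
    waits = []
    for d in durations:
        start = heapq.heappop(free_at)   # earliest available thread takes the job
        waits.append(start)
        heapq.heappush(free_at, start + d)
    return waits

SIX_WEEKS_MIN = 6 * 7 * 24 * 60    # 60480 minutes
big = [SIX_WEEKS_MIN] * 5          # five "loan approval" sized jobs
small = [0.5]                      # one 30-second job, last in line

# One shared queue, three threads: the small job sits behind the big ones.
small_wait_shared = fifo_wait_times(big + small, workers=3)[-1]

# Fast lane: two threads for big jobs, one reserved for small jobs.
big_waits_fast = fifo_wait_times(big, workers=2)
small_wait_fast = fifo_wait_times(small, workers=1)[-1]

# How many 30-second jobs fit into one six-week job.
small_jobs_per_big = SIX_WEEKS_MIN / 0.5

print(small_wait_shared / (7 * 24 * 60), "weeks wait in the shared queue")
print(small_wait_fast, "minutes wait in the fast lane")
print(max(big_waits_fast) / (7 * 24 * 60), "weeks wait for the last big job with a fast lane")
print(small_jobs_per_big, "small jobs per big job")
```

In this toy model the small job waits six weeks in the shared queue and not at all in the fast lane, while the last big job waits twelve weeks instead of six. That is the trade-off Thomas is asking about: the fast lane is paid for by the big jobs.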
On Sat, Oct 11, 2008 at 2:49 AM, Thomas Dalton thomas.dalton@gmail.com wrote:
I'm trying to work out if it is actually desirable to separate the larger projects onto one thread. The only way you can have a smaller project dumped more often is to have the larger ones dumped less often, but do we really want less frequent enwiki dumps? By separating them and sharing them fairly between the threads you can get more regular dumps, but the significant number is surely the amount of time between one dump of your favourite project and the next, which will only change if you share the projects unfairly. Why do we want small projects to be dumped more frequently than large projects?
I guess the answer, really, is to get more servers doing dumps - I'm sure that will come in time.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l