You also have to take demand into consideration; how many people are
waiting for dumps of enwiki, dewiki, etc. vs. how many are waiting for
the smaller wikis? (Not a rhetorical question, I'd be interested in the
answer.) To use the bank analogy, if everyone is waiting for a loan, you
don't move your loan officers to the teller windows just because they
can process small transactions faster. Note also that several dozen of
the smallest wikis have fewer than 5000 articles. If someone has a bot
or sysop account, they can get the current revision of every article
with a single API query. While a dump would be more efficient and
probably slightly faster, getting the current revision for every article
on a large wiki basically requires a dump.
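(For the curious: such a request can be sketched against the MediaWiki API's `allpages` generator. This is only an illustrative sketch; "xx.wikipedia.org" is a placeholder, and the actual page limit behind `gaplimit=max` depends on the wiki's configuration and the account's rights, so a sub-5000-article wiki may still need more than one request in practice.)

```python
from urllib.parse import urlencode

# Hypothetical small-wiki endpoint; replace with the real project domain.
API = "https://xx.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "generator": "allpages",   # walk every page...
    "gapnamespace": 0,         # ...in the article namespace only
    "gaplimit": "max",         # bot/sysop accounts get the highest limit
    "prop": "revisions",
    "rvprop": "content",       # fetch the current revision text
    "format": "xml",
}
url = API + "?" + urlencode(params)
print(url)
```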
Robert Ullmann wrote:
Look at it this way: you can't get enwiki dumps more
than once every six weeks.
Each one TAKES SIX WEEKS. (Modulo lots of stuff; I'm simplifying a bit ;-)
The example I have used before is going into my bank: in the main Queensway
office, there will be 50-100 people on the queue. When there are 8-10
tellers, it will go well; except that some transactions (depositing some
cash) take a minute or so, and some take many, many minutes. If there are 8
tellers, and 8 people in front of you with 20-30 minute transactions, you
are toast. (They handle this by having fast lines for deposits and such ;-)
In general, one queue feeding multiple servers/threads works very nicely if
the tasks are about the same size.
But what we have here is projects that take less than a minute, in the same
queue with projects that take weeks. That is 5 orders of magnitude: in the
time it takes to do the enwiki dump, the same thread could do ONE HUNDRED
THOUSAND small projects.
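(Quick sanity check on that figure, assuming an illustrative 30-second small dump against the six-week enwiki dump:)

```python
WEEK = 7 * 24 * 3600      # seconds in a week
enwiki = 6 * WEEK         # one full enwiki dump, per the numbers above
small = 30                # an illustrative sub-minute small-wiki dump
print(enwiki // small)    # → 120960, i.e. roughly one hundred thousand
```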
Imagine walking into your bank with a 30 second transaction, and being told
it couldn't be completed for 6 weeks because there were 3 officers
available, and 5 people who needed complicated loan approvals on the queue
in front of you.
That's the way the dumps are set up right now.
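That bank scenario can be sketched as a toy simulation of a single FIFO queue feeding a pool of workers (`finish_times` is a made-up helper; the 3-worker/5-big-job numbers come from the example above):

```python
import heapq

def finish_times(durations, workers):
    """Hand tasks from one FIFO queue to whichever worker frees up
    first; return each task's completion time (queue opens at t=0)."""
    free = [0.0] * workers          # when each worker is next available
    heapq.heapify(free)
    done = []
    for d in durations:
        start = heapq.heappop(free)  # earliest-free worker takes the task
        heapq.heappush(free, start + d)
        done.append(start + d)
    return done

WEEK = 7 * 24 * 3600
# Five 6-week "loan approvals" queued ahead of one 30-second transaction,
# with 3 workers available.
queue = [6 * WEEK] * 5 + [30]
times = finish_times(queue, workers=3)
print(times[-1] / WEEK)  # the 30-second job completes ~6 weeks after queueing
```

With 3 workers, the fourth and fifth big jobs can't start until week 6, so the tiny job behind them also waits the full 6 weeks, exactly the starvation described above.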
On Sat, Oct 11, 2008 at 2:49 AM, Thomas Dalton <thomas.dalton(a)gmail.com> wrote:
> I'm trying to work out if it is actually desirable to separate the
> larger projects onto one thread. The only way you can have a smaller
> project dumped more often is to have the larger ones dumped less
> often, but do we really want less frequent enwiki dumps? By
> separating them and sharing them fairly between the threads you can
> get more regular dumps, but the significant number is surely the
> amount of time between one dump of your favourite project and the
> next, which will only change if you share the projects unfairly. Why
> do we want small projects to be dumped more frequently than large
> projects?
>
> I guess the answer, really, is to get more servers doing dumps - I'm
> sure that will come in time.
>
--
Alex (wikipedia:en:User:Mr.Z-man)