But the server space saved by compression would be compensated for by the
stability and flexibility provided by this method. It would allow whichever
server is controlling the dump process to designate and delegate parallel
processes for the same dump, so block 1 could be on server 1 and block 2 on
server 3. That would give the flexibility to use as many servers as are
available for this task more efficiently. If block 200 of en.wp breaks for
some reason, you don't have to rebuild the previous 199 blocks; you can just
delegate a server to rebuild that single block. That would make the dump
process a little more crash-friendly (even though I know we don't want to
admit crashes happen :) ). This also enables the dump time in future dumps
to be cut drastically. I'd recommend either 10M revisions or 10% of the
database, whichever is larger, for new dumps, to screen out a majority of
the deletions. What are your thoughts on this process, Brion (and the rest
of the tech team)?
Betacommand
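The delegation scheme described above could be sketched roughly like this (a minimal toy sketch; the server names, `assign_blocks`, and `redo_failed_block` are hypothetical illustrations, not part of any existing dump infrastructure):

```python
from itertools import cycle

def assign_blocks(blocks, servers):
    """Round-robin the dump blocks across whatever servers are available."""
    pool = cycle(servers)
    return {block: next(pool) for block in blocks}

def redo_failed_block(assignments, failed_block, spare_server):
    """If one block breaks, re-delegate only that block; the rest stay built."""
    assignments = dict(assignments)
    assignments[failed_block] = spare_server
    return assignments

# en.wp would have ~273 blocks of 1M revision IDs each
assignments = assign_blocks(range(273), ["server1", "server2", "server3"])

# block 200 broke: rebuild just that one block on a free server
assignments = redo_failed_block(assignments, 200, "server4")
```

The point is that the unit of failure and the unit of parallelism are the same: a block is rebuilt or delegated independently of the other 272.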
On Wed, Feb 25, 2009 at 9:00 AM, Thomas Dalton <thomas.dalton(a)gmail.com> wrote:
2009/2/25 Robert Ullmann <rlullmann(a)gmail.com>:
I suggest the history be partitioned into
"blocks" by *revision ID*
Like this: revision IDs (0)-999,999 go in "block 0", 1M to 2M-1 in
"block 1", and so on. The English Wiktionary at the moment would have
7 blocks; the English Wikipedia would have 273.
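The mapping from revision ID to block in this scheme is just integer division by the block size (1M here):

```python
BLOCK_SIZE = 1_000_000

def block_of(rev_id):
    # revision IDs 0..999,999 -> block 0; 1,000,000..1,999,999 -> block 1; etc.
    return rev_id // BLOCK_SIZE

def block_count(max_rev_id):
    # number of blocks needed to cover all revisions created so far
    return max_rev_id // BLOCK_SIZE + 1
```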
One problem with that is that you won't get such good compression
ratios. Most of the revisions of a single article are very similar to
the revisions before and after it, so they compress down very small.
If you break up the articles between different blocks you don't get
that advantage (at least, not to the same extent).
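The effect can be demonstrated directly: near-identical revisions compress far better when stored in the same stream than when compressed separately. This is a toy illustration with zlib (the actual dumps use different compressors, but the back-reference effect is the same):

```python
import random
import zlib

# two "revisions" of the same article: the second is a small edit of the first
random.seed(42)
rev1 = "".join(random.choice("abcdefghij ") for _ in range(20_000))
rev2 = rev1[:10_000] + "EDITED" + rev1[10_000:]

# compress the revisions in one stream vs. in separate streams
together = len(zlib.compress((rev1 + rev2).encode()))
separate = len(zlib.compress(rev1.encode())) + len(zlib.compress(rev2.encode()))

# in the combined stream the compressor encodes the second revision
# mostly as back-references into the first, so it costs almost nothing
print(together, separate)
```

Splitting an article's revisions across block boundaries breaks exactly these back-references, which is the compression penalty being traded away here.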
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l