[Foundation-l] dumps

Anthony wikimail at inbox.org
Tue Feb 24 18:18:57 UTC 2009


On Tue, Feb 24, 2009 at 12:56 PM, Brian <Brian.Mingus at colorado.edu> wrote:

> It's not at all clear why the English Wikipedia dump or other large
> dumps need to be compressed. It is far more absurd to spend hundreds
> of days compressing a file than it is to spend tens of days
> downloading one.


There's no reason it needs to take hundreds of days to compress even a
petabyte of data.  Bzip2 compresses its input in independent blocks, so the
work can be done in parallel, producing a file which can be decompressed
using standard decompression software.
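
To illustrate the parallel approach, here is a minimal Python sketch.  It is
not how the dump scripts work; the names parallel_bzip2, CHUNK_SIZE and
workers are made up for the example.  It assumes the whole input fits in
memory and that CPython's bz2 module releases the GIL while compressing, so
threads can actually run side by side.  Each chunk becomes its own bzip2
stream, and the streams are simply concatenated.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size for this sketch; real tools choose their own.
CHUNK_SIZE = 900 * 1000


def parallel_bzip2(data: bytes, chunk_size: int = CHUNK_SIZE,
                   workers: int = 4) -> bytes:
    """Compress data as a concatenation of independent bzip2 streams."""
    # Split the input into fixed-size chunks; keep one empty chunk so that
    # empty input still yields a valid bzip2 stream.
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)] or [b""]
    # Compress every chunk separately; pool.map preserves chunk order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        streams = list(pool.map(bz2.compress, chunks))
    # Concatenated bzip2 streams form a file that ordinary bzip2
    # decompressors read as a single file.
    return b"".join(streams)


if __name__ == "__main__":
    original = b"some highly repetitive dump text\n" * 100000
    packed = parallel_bzip2(original, chunk_size=200000)
    # bz2.decompress handles multi-stream input (Python 3.3 and later).
    assert bz2.decompress(packed) == original
    print(len(original), "->", len(packed))
```

The cost is a slightly larger file than a single-stream run, since each
chunk is compressed without knowledge of the others.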

In any case, there are cost factors to be considered.  Depending on the
number of people downloading the file, compression might save a significant
amount of money.

There are also other, faster ways to make a file smaller besides compression
(like delta encoding), which are probably being looked into.
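
For anyone unfamiliar with delta encoding, the idea is to store only what
changed between one revision and the next.  A rough Python sketch follows,
using difflib from the standard library; make_delta and apply_delta are
hypothetical names, and this is only an illustration of the technique, not
a proposal for the actual dump format.

```python
from difflib import SequenceMatcher


def make_delta(old: str, new: str) -> list:
    """Describe new as copies from old plus literal additions."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged text: store only the offsets into the old revision.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Changed or new text: store it literally.
            ops.append(("add", new[j1:j2]))
        # "delete" needs no entry: the deleted text is just never copied.
    return ops


def apply_delta(old: str, ops: list) -> str:
    """Rebuild the new revision from the old revision and the delta."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


if __name__ == "__main__":
    rev1 = "Wikipedia is a free encyclopedia that anyone can edit."
    rev2 = "Wikipedia is a free online encyclopedia that anyone can edit."
    delta = make_delta(rev1, rev2)
    assert apply_delta(rev1, delta) == rev2
    print(delta)
```

Since consecutive revisions of an article tend to differ only slightly, the
delta is usually much smaller than the full text of the new revision.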

