On Tue, Feb 24, 2009 at 12:56 PM, Brian <Brian.Mingus(a)colorado.edu> wrote:
> It's not at all clear why the English Wikipedia dump or other large
> dumps need to be compressed. It is far more absurd to spend hundreds
> of days compressing a file than it is to spend tens of days
> downloading one.
There's no reason it needs to take hundreds of days to compress even a
petabyte of data. Bzip2 compression can be done in parallel, producing a
file which can still be decompressed with standard decompression software.
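For illustration, here's a rough Python sketch of the idea (the file names
and chunk size are made up; tools like pbzip2 do this natively): split the
input into chunks, compress each chunk as its own bzip2 stream in parallel,
and concatenate the streams. Plain "bzip2 -d" accepts multi-stream files,
so the result decompresses with the standard tool.

    # Sketch of pbzip2-style parallel compression. Each chunk becomes an
    # independent bzip2 stream; concatenated streams are still a valid
    # bzip2 file. File names below are placeholders.
    import bz2
    from multiprocessing import Pool

    CHUNK_SIZE = 16 * 1024 * 1024  # 16 MiB per independent stream

    def read_chunks(path):
        """Yield fixed-size chunks of the input file."""
        with open(path, "rb") as f:
            while True:
                chunk = f.read(CHUNK_SIZE)
                if not chunk:
                    return
                yield chunk

    def compress_parallel(src, dst, workers=8):
        """Compress src to dst as a concatenation of bzip2 streams."""
        with Pool(workers) as pool, open(dst, "wb") as out:
            # imap preserves chunk order, so the output is identical to
            # what a sequential multi-stream compressor would produce.
            for compressed in pool.imap(bz2.compress, read_chunks(src)):
                out.write(compressed)

    if __name__ == "__main__":
        compress_parallel("enwiki-dump.xml", "enwiki-dump.xml.bz2")

With enough workers, wall-clock time scales down roughly linearly, which is
why "hundreds of days" doesn't follow even for very large dumps.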
In any case, there are cost factors to be considered. Depending on the
number of people downloading the file, compression might save a significant
amount of money: to pick round numbers, if a 1 TB dump compresses 4:1 and
bandwidth costs $0.10/GB, each download saves roughly $75.
There are also other, faster ways to make a file smaller besides
general-purpose compression (like delta encoding), which are probably being
looked into; a rough sketch of the delta idea follows.
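As a toy illustration only (difflib and the sample strings are stand-ins;
a real dump pipeline would more likely use a binary delta tool such as
xdelta): instead of shipping each full dump, ship just the differences
against the previous one.

    # Minimal delta-encoding sketch: emit only the lines that changed
    # between two versions of a text. difflib is a stand-in for a real
    # binary delta format.
    import difflib

    def make_delta(old_text, new_text):
        """Produce a unified diff that turns old_text into new_text."""
        return "".join(difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile="old-dump", tofile="new-dump",
        ))

    old = "first line\nsecond line\nthird line\n"
    new = "first line\nsecond line, edited\nthird line\n"
    print(make_delta(old, new))  # only the changed line (plus context)

Since successive dumps of the same wiki overlap heavily, a delta can be
orders of magnitude smaller than even a well-compressed full dump.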