[Foundation-l] dumps

Brian Brian.Mingus at colorado.edu
Tue Feb 24 18:24:47 UTC 2009


It's my understanding that the WMF's bandwidth is very cheap.

If you want to consider costs, I think it's appropriate to consider
the costs not only to the WMF but also to the user. Different
compression algorithms trade off encoding and decoding cost
differently, but if it takes a cluster to compress a file, there's a
good chance you're going to want one to decompress it. It may in fact
be much more user friendly to simply offer an enormous text file for
download, because then users don't have to unpack it.
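
To make the user-side cost concrete, here is a minimal Python sketch
of what "unpacking" means in practice (the dump file name is
hypothetical): the dump is streamed through a decompressor line by
line, so the reader pays the decode cost in CPU time every time they
read it, whereas a plain text file can simply be read.

    import bz2

    # Hypothetical dump file name; any pages-articles .xml.bz2 works the same.
    DUMP = "enwiki-pages-articles.xml.bz2"

    # bz2.open decompresses on the fly, so the reader pays the decode cost
    # in CPU time instead of extracting a huge file to disk first.
    with bz2.open(DUMP, "rt", encoding="utf-8") as dump:
        for n, line in enumerate(dump, start=1):
            if "<title>" in line:
                print(line.strip())
            if n >= 100000:  # only a sketch; stop after a small slice
                break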

Our mission is to spread knowledge. Compressing that knowledge has
been getting in the way of spreading it for years now. It's high time
we gave up!


On Tue, Feb 24, 2009 at 11:18 AM, Anthony <wikimail at inbox.org> wrote:
> On Tue, Feb 24, 2009 at 12:56 PM, Brian <Brian.Mingus at colorado.edu> wrote:
>
>> It's not at all clear why the English Wikipedia dump or other large
>> dumps need to be compressed. It is far more absurd to spend hundreds
>> of days compressing a file than it is to spend tens of days
>> downloading one.
>
>
> There's no reason it needs to take hundreds of days to compress even a
> petabyte of data.  Bzip2 compression can be done in parallel, producing a
> file which can be decompressed using standard decompression software.
>
> In any case, there are cost factors to be considered.  Depending on the
> number of people downloading the file, compression might save a significant
> amount of money.
>
> There are also other, faster ways to make a file smaller besides compression
> (like delta encoding), which are probably being looked into.
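
On the parallel bzip2 point above, here is a rough Python sketch of
the idea (the file names are hypothetical, and this is not how the
dump process actually works): compress fixed-size blocks on separate
cores and concatenate the resulting bzip2 streams. Plain bunzip2, and
Python's bz2 module, read such multi-stream files like any other .bz2,
which is essentially how tools like pbzip2 work.

    import bz2
    from multiprocessing import Pool

    BLOCK = 8 * 1024 * 1024  # 8 MiB per block; the size is an arbitrary choice

    def compress_block(block):
        # Each block becomes an independent, complete bzip2 stream.
        return bz2.compress(block)

    def read_blocks(path):
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCK)
                if not block:
                    return
                yield block

    if __name__ == "__main__":
        # Hypothetical file names, for illustration only.
        src, dst = "pages-articles.xml", "pages-articles.xml.bz2"
        with Pool() as pool, open(dst, "wb") as out:
            # imap preserves input order, so the concatenated output
            # corresponds to the original file from start to end.
            for chunk in pool.imap(compress_block, read_blocks(src)):
                out.write(chunk)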
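
And a toy sketch of the delta encoding idea mentioned above, again
only to illustrate the concept: keep the newest revision of a page in
full and store older revisions as deltas against it. Python's difflib
makes the round trip easy to show, though a real pipeline would use a
much more compact binary delta format.

    import difflib

    def make_delta(newest, older):
        # ndiff output records how the two revisions differ;
        # difflib.restore can rebuild either side from it.
        return list(difflib.ndiff(newest.splitlines(keepends=True),
                                  older.splitlines(keepends=True)))

    def apply_delta(delta):
        # restore(delta, 2) rebuilds the second sequence: the older revision.
        return "".join(difflib.restore(delta, 2))

    # Two hypothetical revisions of the same article.
    newest = "Paris is the capital of France.\nPopulation: about 2.1 million.\n"
    older = "Paris is the capital of France.\nPopulation: 2 million.\n"

    delta = make_delta(newest, older)
    assert apply_delta(delta) == older  # the older revision is recoverable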


