[Foundation-l] Wikistats is back

Robert Rohde rarohde at gmail.com
Thu Dec 25 01:46:27 UTC 2008


On Wed, Dec 24, 2008 at 4:09 PM, Brian <Brian.Mingus at colorado.edu> wrote:
> Interesting. I realize that the dump is extremely large, but if 7zip is
> really the bottleneck then to me the solutions are straightforward:
>
> 1. Offer an uncompressed version of the dump for download. Bandwidth is
> cheap and downloads can be resumed, unlike this dump process.
> 2. The WMF offers a service whereby they mail the uncompressed dump to you on
> a hard drive. You pay for the drive and a service charge.

I would estimate a complete, uncompressed enwiki dump in the present
format at ~3 TB in size.  ruwiki, which has about 5% as many revisions
as enwiki, has a 187 GB uncompressed dump.
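The extrapolation can be sketched as a back-of-the-envelope calculation. It assumes, as a simplification not stated above, that uncompressed dump size grows linearly with revision count; the function name is illustrative only.

```python
# Back-of-the-envelope extrapolation from ruwiki to enwiki.
# Assumption: uncompressed dump size scales linearly with revision count.
RUWIKI_DUMP_GB = 187              # uncompressed ruwiki dump size quoted above
RUWIKI_REVISION_FRACTION = 0.05   # ruwiki has ~5% as many revisions as enwiki

def estimate_enwiki_dump_tb(ru_gb: float, fraction: float) -> float:
    """Scale the ruwiki size up to enwiki and convert GB to TB (1 TB = 1000 GB)."""
    return ru_gb / fraction / 1000

print(f"{estimate_enwiki_dump_tb(RUWIKI_DUMP_GB, RUWIKI_REVISION_FRACTION):.2f} TB")
```

Naive linear scaling gives roughly 3.7 TB, the same few-terabyte range as the ~3 TB estimate; the real figure depends on how average revision size differs between the two wikis.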

At 3 TB, virtually any mechanism of distributing an uncompressed dump
would be very problematic.

7zip currently achieves greater than 99% size reduction.
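To put the compression figure in concrete terms: a size reduction of more than 99% applied to a ~3 TB dump leaves less than 30 GB. A minimal sketch of that arithmetic, using the estimates above as inputs (the function name is hypothetical):

```python
def max_compressed_gb(uncompressed_tb: float, reduction: float) -> float:
    """Upper bound on compressed size in GB, given a fractional size reduction."""
    return uncompressed_tb * 1000 * (1 - reduction)

# ~3 TB uncompressed with at least 99% reduction
print(round(max_compressed_gb(3.0, 0.99), 1))
```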

-Robert Rohde