[Foundation-l] Wikistats is back

Brian Brian.Mingus at colorado.edu
Thu Dec 25 00:09:08 UTC 2008


Interesting. I realize that the dump is extremely large, but if 7zip is
really the bottleneck then to me the solutions are straightforward:

1. Offer an uncompressed version of the dump for download. Bandwidth is
cheap, and an interrupted download can be resumed, unlike this dump
process (see the sketch after this list).
2. The WMF offers a service whereby they mail the uncompressed dump to you
on a hard drive. You pay for the drive and a service charge.
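
To make point 1 concrete, here is a minimal sketch of resuming an
interrupted download with an HTTP Range request. It is Python and assumes
the third-party "requests" library; the URL, the filename, and the
resume_download helper are hypothetical, for illustration only:

    import os
    import requests  # third-party HTTP client, assumed installed

    # Hypothetical URL and filename, for illustration only.
    URL = "https://dumps.wikimedia.org/enwiki/latest/pages-meta-history.xml"
    DEST = "pages-meta-history.xml"

    def resume_download(url, dest, chunk_size=1 << 20):
        # Ask the server to start where the local copy left off.
        offset = os.path.getsize(dest) if os.path.exists(dest) else 0
        resp = requests.get(url, headers={"Range": "bytes=%d-" % offset},
                            stream=True, timeout=60)
        if resp.status_code == 200:
            offset = 0  # server ignored the Range header; restart
        resp.raise_for_status()
        with open(dest, "ab" if offset else "wb") as f:
            for chunk in resp.iter_content(chunk_size):
                f.write(chunk)

    resume_download(URL, DEST)

A server that honors Range answers 206 Partial Content; when it answers
200 instead, the sketch falls back to rewriting the file from scratch.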

Cheers,


On Wed, Dec 24, 2008 at 5:03 PM, Erik Zachte <erikzachte at infodisiac.com> wrote:

> Hi Brian, Brion once explained to me that the post-processing of the dump
> is the main bottleneck.
>
> Compressing articles with tens of thousands of revisions is a major
> resource drain.
> Right now every dump is even compressed twice, into bzip2 (for wider
> platform compatibility) and 7zip (for 20 times smaller downloads). This
> may no longer be needed, as 7zip has presumably gained better support on
> major platforms over the years.
> Apart from that, the job could gain from parallelization and better error
> recovery.
>
> Erik Zachte
>
> ________________________________________
>
> I am still quite shocked at the amount of time the English Wikipedia
> takes to dump, especially since we seem to have close links to folks who
> work at MySQL. To me it seems that one of two things must be the case:
>
> 1. Wikipedia has outgrown MySQL, in the sense that while we can put data
> in, we cannot get it all back out.
> 2. Despite aggressive hardware purchases over the years, the correct
> hardware has still not been purchased.
>
> I wonder which of these is the case. Presumably #2?
>
> Cheers,
> Brian
>
>
>
>
> _______________________________________________
> foundation-l mailing list
> foundation-l at lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>



-- 
(Not sent from my iPhone)

