[Foundation-l] Wikistats is back

Brian Brian.Mingus at colorado.edu
Thu Dec 25 00:12:34 UTC 2008


Also, I wonder if these folks have been consulted for their expertise in
compressing wikipedia data: http://prize.hutter1.net/
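
A rough sketch of why the choice of compressor matters so much for
pages with many revisions, as Erik describes below. Everything in it is
my own assumption for illustration, not anything taken from the actual
dump scripts: the synthetic "article", the function names make_history
and compressed_sizes, and the use of Python's lzma module as a stand-in
for 7zip (both belong to the LZMA family). bzip2 works on blocks of at
most 900 kB, so it cannot see that revision N is almost identical to
revision N-1 once a revision is larger than a block; LZMA's much larger
dictionary can.

```python
import bz2
import lzma
import random


def make_history(revisions=4, words_per_revision=200_000, seed=0):
    """Build a synthetic full-history page: each revision is the previous
    text plus one short edit. Purely illustrative data, not real wikitext."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    # Hypothetical vocabulary of 2000 random "words".
    vocab = [
        "".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
        for _ in range(2000)
    ]
    # One revision is well over bzip2's 900 kB maximum block size.
    text = " ".join(rng.choice(vocab) for _ in range(words_per_revision))
    history = []
    for i in range(revisions):
        text += f" edit number {i} adds a short sentence."
        history.append(text)
    return "\n".join(history).encode("utf-8")


def compressed_sizes(data):
    """Return the raw size and the size after bzip2 and LZMA compression."""
    return {
        "raw": len(data),
        "bzip2": len(bz2.compress(data, 9)),  # level 9 = 900 kB blocks
        "lzma": len(lzma.compress(data)),     # default preset, .xz container
    }


if __name__ == "__main__":
    for name, size in compressed_sizes(make_history()).items():
        print(f"{name:>6}: {size:>10} bytes")
```

On data shaped like this the LZMA output should come out several times
smaller than the bzip2 output, since only the first revision has to be
stored in full. Real dumps will behave differently in detail.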

On Wed, Dec 24, 2008 at 5:09 PM, Brian <Brian.Mingus at colorado.edu> wrote:

> Interesting. I realize that the dump is extremely large, but if 7zip is
> really the bottleneck then to me the solutions are straightforward:
>
> 1. Offer an uncompressed version of the dump for download. Bandwidth is
> cheap and downloads can be resumed, unlike this dump process.
> 2. The WMF offers a service whereby they mail the uncompressed dump to you
> on a hard drive. You pay for the drive and a service charge.
>
> Cheers,
>
>
>
> On Wed, Dec 24, 2008 at 5:03 PM, Erik Zachte <erikzachte at infodisiac.com> wrote:
>
>> Hi Brian, Brion once explained to me that the post processing of the dump is
>> the main bottleneck.
>>
>> Compressing articles with tens of thousands of revisions is a major resource
>> drain.
>> Right now every dump is even compressed twice, into bzip2 (for wider
>> platform compatibility) and 7zip format (for 20 times smaller downloads).
>> This may no longer be needed as 7zip presumably gained better support on
>> major platforms over the years.
>> Apart from that the job could gain from parallelization and better error
>> recovery.
>>
>> Erik Zachte
>>
>> ________________________________________
>>
>> I am still quite shocked at the amount of time the english wikipedia takes
>> to dump, especially since we seem to have close links to folks who work at
>> mysql. To me it seems that one of two things must be the case:
>>
>> 1. Wikipedia has outgrown mysql, in the sense that, while we can put data
>> in, we cannot get it all back out.
>> 2. Despite aggressive hardware purchases over the years, the correct
>> hardware has still not been purchased.
>>
>> I wonder which of these is the case. Presumably #2 ?
>>
>> Cheers,
>> Brian
>>
>>
>>
>>
>>
>
>
>
> --
> (Not sent from my iPhone)
>



-- 
(Not sent from my iPhone)

