I'd like to add that the md5 of the *uncompressed* file is cd4eee6d3d745ce716db2931c160ee35 . That's what I got from both the uncompressed 7z and the uncompressed bz2. They matched, whew. Uncompressing and md5ing the bz2 took well over a week. Uncompressing and md5ing the 7z took less than a day.
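For anyone wanting to reproduce the uncompressed md5 without writing the multi-terabyte XML to disk, here is a minimal sketch. It streams the decompressed data in chunks straight into `hashlib.md5`. It assumes Python 3 and handles .bz2 directly via the standard `bz2` module. Python's `lzma` module does not read .7z containers, so for the 7z file the sketch instead reads from stdin and assumes you pipe in `7z e -so` (7-Zip's extract-to-stdout). The script name `md5_uncompressed.py` and the 1 MiB chunk size are illustrative choices.

```python
import bz2
import hashlib
import sys
from typing import BinaryIO

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary, illustrative choice


def md5_of_stream(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    """Hex MD5 of everything readable from a binary stream, read chunk by chunk."""
    digest = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_uncompressed_bz2(path: str) -> str:
    """MD5 of the decompressed contents of a .bz2 file; nothing is written to disk."""
    with bz2.open(path, "rb") as f:
        return md5_of_stream(f)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   python md5_uncompressed.py enwiki-20100130-pages-meta-history.xml.bz2
    #   7z e -so enwiki-20100130-pages-meta-history.xml.7z | python md5_uncompressed.py -
    if sys.argv[1] == "-":
        print(md5_of_stream(sys.stdin.buffer))
    else:
        print(md5_of_uncompressed_bz2(sys.argv[1]))
```

Both paths should print the same digest when the archives hold the same XML, which is the cross-check described above. Most of the wall-clock difference comes from decompression speed, not from the hashing itself.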
You can find all the md5sums at
http://download.wikipedia.org/enwiki/20100130/enwiki-20100130-md5sums.txt
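A minimal sketch for checking a downloaded file against that listing follows. It assumes the listing uses the usual `md5sum` output format (`<hex digest>  <filename>`, one per line, with an optional `*` before the name for binary mode) and that the entries are digests of the files as published, i.e. still compressed. The helper names are hypothetical.

```python
import hashlib
import os


def parse_md5sums(text: str) -> dict[str, str]:
    """Map filename -> expected hex digest from md5sum-style lines."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(None, 1)
        # md5sum prefixes the name with '*' when it hashed in binary mode.
        expected[name.lstrip("*")] = digest.lower()
    return expected


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file on disk, read in chunks so huge dumps fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, md5sums_text: str) -> bool:
    """True if the file's MD5 matches the listing entry for its basename."""
    expected = parse_md5sums(md5sums_text)
    name = os.path.basename(path)
    if name not in expected:
        raise KeyError(f"{name} is not listed in the md5sums file")
    return md5_of_file(path) == expected[name]
```

On a system with GNU coreutils, `md5sum -c` on the downloaded listing does the same job for every file present in the current directory.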
--tomasz
Anthony wrote:
Got an md5sum?
On Mon, Mar 29, 2010 at 5:46 PM, Tomasz Finc <tfinc@wikimedia.org> wrote:
I love lzma compression.
enwiki-20100130-pages-meta-history.xml.bz2 280.3 GB
enwiki-20100130-pages-meta-history.xml.7z 31.9 GB
Download at http://tinyurl.com/yeelbse
Enjoy!
--tomasz
Tomasz Finc wrote:
> Tomasz Finc wrote:
>> New full history en wiki snapshot is hot off the presses!
>>
>> It's currently being checksummed, which will take a while for 280GB+ of
>> compressed data, but for those brave souls willing to test, please grab it
>> from
>>
>>
>> http://download.wikipedia.org/enwiki/20100130/enwiki-20100130-pages-meta-history.xml.bz2
>>
>>
>> and give us feedback about its quality. This run took just over a month
>> and gained a huge speedup after Tim's work on re-compressing ES. If we
>> see no hiccups with this data snapshot, I'll start mirroring it to other
>> locations (Internet Archive, Amazon Public Data Sets, etc.).
>>
>> For those not familiar, the last successful run that we've seen of this
>> data goes all the way back to 2008-10-03. That's over 1.5 years of
>> people waiting to get access to these data bits.
>>
>> I'm excited to say that we seem to have it :)
>>
>> --tomasz
>
> We now have an md5sum for enwiki-20100130-pages-meta-history.xml.bz2.
>
> "65677bc275442c7579857cc26b355ded"
>
> Please verify against it before filing issues.
>
> --tomasz
>
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
_______________________________________________
Xmldatadumps-admin-l mailing list
Xmldatadumps-admin-l@lists.wikimedia.org