--- On Tue, 16/3/10, Kevin Webb <kpwebb(a)gmail.com> wrote:
From: Kevin Webb <kpwebb(a)gmail.com>
Subject: Re: [Xmldatadumps-admin-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D
To: "Tomasz Finc" <tfinc(a)wikimedia.org>
CC: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>, xmldatadumps-admin-l(a)lists.wikimedia.org, Xmldatadumps-l(a)lists.wikimedia.org
Date: Tuesday, 16 March 2010 21:10
I just managed to finish decompression. That took about 54 hours on an
EC2 2.5x unit CPU. The final data size is 5469GB.
As the process just finished, I haven't been able to check the
integrity of the XML. However, the bzip stream itself appears to be
good.
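
One way to check the integrity of the XML without holding 5469GB in memory is to decompress and parse it as a stream. The following is only a minimal sketch, assuming Python 3 and its standard bz2 and xml.sax modules; the function name check_bz2_xml and the element counter are hypothetical names chosen for illustration, not part of any dump tooling.

```python
import bz2
import xml.sax


class _ElementCounter(xml.sax.ContentHandler):
    """Counts opening tags; keeps no document content in memory."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def check_bz2_xml(path):
    """Stream-decompress `path` and check that it is well-formed XML.

    Returns (True, element_count) on success, or (False, error_message)
    if decompression or parsing fails. (Hypothetical helper.)
    """
    handler = _ElementCounter()
    try:
        # bz2.open decompresses incrementally as the parser reads.
        with bz2.open(path, "rb") as stream:
            xml.sax.parse(stream, handler)
    except (xml.sax.SAXParseException, OSError, EOFError) as exc:
        return False, str(exc)
    return True, handler.count
```

This only tests well-formedness (balanced tags, valid encoding), not whether the dump matches the MediaWiki export schema.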
As was mentioned previously, it would be great if you could compress
future archives using pbzip2 to allow for parallel decompression. As I
understand it, pbzip2 files are backward compatible with all existing
bzip2 utilities.
Yes, they are :-).
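
To illustrate why this compatibility holds: as far as I understand it, pbzip2 compresses blocks of the input independently and writes them out as concatenated bzip2 streams, which standard decoders read back-to-back. The sketch below imitates that layout with Python 3's bz2 module (multi-stream support was added in Python 3.3); compress_in_blocks is a hypothetical helper and the block size is arbitrary, so this is not pbzip2's actual implementation.

```python
import bz2


def compress_in_blocks(data, block_size):
    """Compress each block as its own bzip2 stream and concatenate them.

    Hypothetical helper that mimics a multi-stream .bz2 layout.
    """
    streams = []
    for start in range(0, len(data), block_size):
        streams.append(bz2.compress(data[start:start + block_size]))
    return b"".join(streams)


payload = b"<page>example revision text</page>\n" * 1000
multi_stream = compress_in_blocks(payload, block_size=4096)

# A standard decoder reads all concatenated streams in sequence.
assert bz2.decompress(multi_stream) == payload
```

Because each stream is independent, a parallel decompressor can hand separate streams to separate cores, which is where the speed-up comes from.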
Regards,
F.
Thanks again for all your work on this!
Kevin
On Tue, Mar 16, 2010 at 4:05 PM, Tomasz Finc <tfinc(a)wikimedia.org> wrote:
Tomasz Finc wrote:
> New full history en wiki snapshot is hot off the presses!
>
> It's currently being checksummed which will take a while for 280GB+ of
> compressed data but for those brave souls willing to test please grab it
> [...] took just over a month
> and gained a huge speed up after Tim's work on re-compressing ES. If we
> see no hiccups with this data snapshot, I'll start mirroring it to other
> locations (Internet Archive, Amazon public data sets, etc).
>
> For those not familiar, the last successful run that we've seen of this
> data goes all the way back to 2008-10-03. That's over 1.5 years of people
> waiting to get access to these data bits.
>
> I'm excited to say that we seem to have it :)
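
For anyone testing a download, the checksumming mentioned above can be reproduced locally by hashing the file in fixed-size chunks, so that a 280GB+ archive never has to fit in memory. This is a minimal sketch that assumes an MD5 checksum is what gets published (an assumption here; substitute another hashlib algorithm if not); file_md5 and the chunk size are hypothetical choices.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in fixed-size chunks.

    Hypothetical helper; MD5 is an assumed algorithm.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the returned string with the published value confirms that the compressed file arrived intact before spending days on decompression.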
So now that we've had it for a couple of days .. can I get a status
report from someone about its quality?
Even if you had no issues please let us know so that we can start
mirroring.
--tomasz
_______________________________________________
Xmldatadumps-admin-l mailing list
Xmldatadumps-admin-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-admin-l