[Xmldatadumps-l] [Xmldatadumps-admin-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D

Tomasz Finc tfinc at wikimedia.org
Tue Mar 16 20:05:15 UTC 2010


Tomasz Finc wrote:
> New full history en wiki snapshot is hot off the presses!
> 
> It's currently being checksummed, which will take a while for 280GB+ of 
> compressed data, but for those brave souls willing to test, please grab it 
> from
> 
> http://download.wikipedia.org/enwiki/20100130/enwiki-20100130-pages-meta-history.xml.bz2
> 
> and give us feedback about its quality. This run took just over a month 
> and gained a huge speedup after Tim's work on re-compressing ES. If we 
> see no hiccups with this data snapshot, I'll start mirroring it to other 
> locations (internet archive, amazon public data sets, etc).
> 
> For those not familiar, the last successful run that we've seen of this 
> data goes all the way back to 2008-10-03. That's over 1.5 years of 
> people waiting to get access to these data bits.
> 
> I'm excited to say that we seem to have it :)

So now that we've had it for a couple of days, can I get a status 
report from someone about its quality?

Even if you had no issues, please let us know so that we can start mirroring.

--tomasz


