Xmldatadumps-l April 2010

xmldatadumps-l@lists.wikimedia.org

11 participants
7 discussions

Current Status
by Conrad Irwin 04 May '10

I notice the dumps currently seem frozen. Is this the best place to ask for information, or is it publicly available somewhere else? (If so, sorry for pestering.)

Conrad

subscribe
by jcms 29 Apr '10

-- This message has reached you through the e-mail service that Infomed provides to support the missions of the National Health System. The person sending this e-mail undertakes to use the service for those purposes and to comply with the established regulations. Infomed: http://www.sld.cu/

Dumps are stopped
by Andreas Meier 28 Apr '10

Hello,

the dump processes seem to have stopped again.

Best regards,
Andreas

Current Status
by Erik Zachte 27 Apr '10

Tomasz:
> Then I further split it for ops and general tech.
> Let me know how well you think that has worked.

I would favor combining the two. They are both very low traffic, and I noticed other users were also confused in the past. But if the split is handier for ops, it's no big deal.

Erik Zachte

Changing lengths of full dump
by Erik Zachte 16 Apr '10

Jamie:
> I thought the file size would grow fairly linearly with the page count,
> but for the last 10% or so of the pages the file size hardly grew at all.

Pages in the dump are in order of page id, and thus more or less in order of creation date. Pages at the end of the dump are small: they are more often stubs, with few revisions.

Erik Zachte
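
Erik's explanation (later page ids carry fewer revisions) can be checked directly against a dump by counting revisions per page in dump order. Below is a minimal sketch, assuming Python's standard `bz2` and `xml.etree.ElementTree` modules and the usual MediaWiki export layout, where each `<page>` holds its own `<id>` followed by one `<revision>` element per revision. The function name `revisions_per_page` is hypothetical, not part of any Wikimedia tool.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def _local(tag: str) -> str:
    # Strip the "{namespace}" prefix: the export schema version in the
    # namespace URI differs between dumps, so match on local names only.
    return tag.rsplit("}", 1)[-1]


def revisions_per_page(path: str) -> Iterator[Tuple[int, int]]:
    """Yield (page_id, revision_count) for each <page>, in dump order.

    Streams the bz2 file, so the dump is never fully decompressed
    in memory (hypothetical helper, written for illustration).
    """
    with bz2.open(path, "rb") as fh:
        context = ET.iterparse(fh, events=("start", "end"))
        _, root = next(context)  # start event of the <mediawiki> root
        revisions = 0
        for event, elem in context:
            if event != "end":
                continue
            name = _local(elem.tag)
            if name == "revision":
                revisions += 1
                # Drop revision text right away: one full-history page
                # can contain thousands of large revisions.
                elem.clear()
            elif name == "page":
                # The page's own <id> is a direct child of <page>; the
                # revision and contributor <id>s are nested deeper.
                page_id = next(int(c.text) for c in elem if _local(c.tag) == "id")
                yield page_id, revisions
                revisions = 0
                root.clear()  # release the finished page from the tree
```

Averaging the counts over the first and last 10% of pages yielded would show whether the tail of the file really is dominated by low-revision stubs.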

Changing lengths of full dump
by Neil Harris 16 Apr '10

According to http://download.wikimedia.org/enwiki/20100130/ , the pages-meta-history.xml.bz2 file for that dump is 280.3 Gbytes in size. In the http://download.wikimedia.org/enwiki/20100312/ dump, the corresponding file is only 178.7 Gbytes. Is this the result of better compression, or has something gone wrong?

Kind regards,
Neil

Re: [Xmldatadumps-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D
by Tomasz Finc 08 Apr '10

Tomasz Finc wrote:
> New full history en wiki snapshot is hot off the presses!
>
> It's currently being checksummed, which will take a while for 280GB+ of
> compressed data, but for those brave souls willing to test, please grab it
> from
>
> http://download.wikipedia.org/enwiki/20100130/enwiki-20100130-pages-meta-hi…
>
> and give us feedback about its quality. This run took just over a month
> and gained a huge speed-up after Tim's work on re-compressing ES. If we
> see no hiccups with this data snapshot, I'll start mirroring it to other
> locations (internet archive, amazon public data sets, etc).
>
> For those not familiar, the last successful run that we've seen of this
> data goes all the way back to 2008-10-03. That's over 1.5 years of
> people waiting to get access to these data bits.
>
> I'm excited to say that we seem to have it :)
>
> --tomasz

We now have an md5sum for enwiki-20100130-pages-meta-history.xml.bz2:

"65677bc275442c7579857cc26b355ded"

Please verify against it before filing issues.

--tomasz
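
Verifying a 280GB+ download against the posted md5sum means hashing the file in chunks rather than reading it into memory. A minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `verify_dump`, the 1 MiB chunk size, and the local filename in the usage line are illustrative assumptions, not part of any Wikimedia tooling.

```python
import hashlib

# Digest posted on the list for enwiki-20100130-pages-meta-history.xml.bz2.
EXPECTED_MD5 = "65677bc275442c7579857cc26b355ded"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 of a file, read in chunks (hypothetical helper).

    Chunked reading keeps memory use constant regardless of file size.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a b"" sentinel keeps calling fh.read until EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dump(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a local file against an expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumes the dump was saved under its original name in the current directory.
    print(verify_dump("enwiki-20100130-pages-meta-history.xml.bz2"))
```

On most Unix systems, `md5sum enwiki-20100130-pages-meta-history.xml.bz2` should print the same digest; the script is just a portable equivalent.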