On 13/04/10 16:03, Felipe Ortega wrote:
--- On Tue, 13/4/10, Neil
Harris<usenet(a)tonal.clara.co.uk> wrote:
From: Neil Harris<usenet(a)tonal.clara.co.uk>
Subject: [Xmldatadumps-l] Changing lengths of full dump
To: Xmldatadumps-l(a)lists.wikimedia.org
Date: Tuesday, 13 April 2010 16:01
According to http://download.wikimedia.org/enwiki/20100130/ ,
the pages-meta-history.xml.bz2 file for that dump is 280.3
Gbytes in size.
In the http://download.wikimedia.org/enwiki/20100312/ dump,
the corresponding file is only 178.7 Gbytes.
Is this the result of better compression, or has something
gone wrong?
Hi Neil.
Some mails were just exchanged on this mailing list about this. Indeed, there was a
problem in the generation of the last dump.
Best,
F.
Thanks for letting me know.
Since dumps appear to be made incrementally on top of other dumps, there
seems to be a real risk of errors being compounded on top of errors.
Does anyone here know if there have been any attempts to validate the
current enwiki full dump against the database? For example, by selecting
N revisions from the dump at random, and verifying that they exist in
the DB, and vice versa for N revisions selected from the DB at random.
Kind regards,
Neil