On 13/04/10 16:03, Felipe Ortega wrote:
--- On Tue, 13/4/10, Neil Harris <usenet@tonal.clara.co.uk> wrote:
From: Neil Harris <usenet@tonal.clara.co.uk>
Subject: [Xmldatadumps-l] Changing lengths of full dump
To: Xmldatadumps-l@lists.wikimedia.org
Date: Tuesday, 13 April 2010, 16:01

According to http://download.wikimedia.org/enwiki/20100130/ , the pages-meta-history.xml.bz2 file for that dump is 280.3 Gbytes in size.
In the http://download.wikimedia.org/enwiki/20100312/ dump, the corresponding file is only 178.7 Gbytes.
Is this the result of better compression, or has something gone wrong?
Hi Neil.
Some emails were just exchanged on this mailing list about this. Indeed, there was a problem in the generation of the last dump.
Best, F.
Thanks for letting me know.
Since dumps appear to be made incrementally on top of other dumps, there seems to be a real risk of errors being compounded on top of errors. Does anyone here know if there have been any attempts to validate the current enwiki full dump against the database? For example, by selecting N revisions from the dump at random, and verifying that they exist in the DB, and vice versa for N revisions selected from the DB at random.
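For illustration only, here is a minimal sketch of the dump-to-DB half of such a check. It assumes a local copy of the bz2 dump and uses the public MediaWiki API as a stand-in for direct database access; the file path, sample size, user-agent string and XML namespace below are placeholders, not anything agreed in this thread, and the reverse check (random DB revisions looked up in the dump) would need a second pass or an index over the dump.

#!/usr/bin/env python3
"""Spot-check a pages-meta-history dump against the live wiki.

Sketch only: reservoir-samples N revision ids while streaming the
bz2-compressed XML dump, then asks the MediaWiki API whether each
sampled id still exists.
"""
import bz2
import json
import random
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

DUMP_PATH = "enwiki-20100312-pages-meta-history.xml.bz2"  # placeholder path
API_URL = "https://en.wikipedia.org/w/api.php"
SAMPLE_SIZE = 1000
MW_NS = "{http://www.mediawiki.org/xml/export-0.4/}"  # namespace varies by dump version


def sample_revision_ids(path, n):
    """Reservoir-sample n <revision><id> values while streaming the dump."""
    sample, seen = [], 0
    with bz2.open(path, "rb") as stream:
        for _, elem in ET.iterparse(stream):
            if elem.tag == MW_NS + "revision":
                rev_id = int(elem.find(MW_NS + "id").text)
                seen += 1
                if len(sample) < n:
                    sample.append(rev_id)
                elif random.randrange(seen) < n:
                    sample[random.randrange(n)] = rev_id
                elem.clear()  # keep memory roughly bounded
    return sample


def missing_from_wiki(rev_ids):
    """Return the ids the API reports as nonexistent (listed under 'badrevids')."""
    missing = []
    for i in range(0, len(rev_ids), 50):  # the API accepts up to 50 revids per request
        batch = rev_ids[i:i + 50]
        query = urllib.parse.urlencode({
            "action": "query",
            "prop": "revisions",
            "revids": "|".join(str(r) for r in batch),
            "format": "json",
        })
        req = urllib.request.Request(
            API_URL + "?" + query,
            headers={"User-Agent": "dump-spot-check/0.1 (example contact)"},  # placeholder UA
        )
        with urllib.request.urlopen(req) as resp:
            data = json.loads(resp.read().decode("utf-8"))
        missing.extend(int(r) for r in data.get("query", {}).get("badrevids", {}))
    return missing


if __name__ == "__main__":
    ids = sample_revision_ids(DUMP_PATH, SAMPLE_SIZE)
    print("sampled", len(ids), "revision ids from the dump")
    print("not found on the live wiki:", missing_from_wiki(ids))

Note that revisions deleted or oversighted since the dump was taken will legitimately show up as missing, so a nonzero count is a prompt for manual inspection rather than proof of a broken dump.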
Kind regards,
Neil