Yes, it's a known problem; in the meantime you should be able to download
the individual pieces instead. Code is being tested to detect truncated
files and flag them. Meanwhile I have some other testing to do, to see
whether we're hitting some resource constraint from running this many jobs
at once that causes the bzip2 processes to die off or be killed off.
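The truncation-detection code mentioned above isn't shown in this thread, but a minimal sketch of the idea in Python is below. It simply decompresses the file to the end and treats a missing bzip2 end-of-stream marker (or corrupt data) as truncation; the function name and chunk size are illustrative, not part of the actual dump tooling.

```python
import bz2

def is_intact_bz2(path, chunk_size=1 << 20):
    """Return True if the .bz2 file decompresses cleanly to its
    end-of-stream marker, False if it is truncated or corrupt."""
    try:
        with bz2.open(path, "rb") as f:
            # Read and discard; we only care whether decompression
            # reaches the end-of-stream marker without error.
            while f.read(chunk_size):
                pass
        return True
    except EOFError:
        # Stream ended before the bzip2 end-of-stream marker: truncated.
        return False
    except OSError:
        # Invalid or corrupt compressed data.
        return False
```

This is roughly equivalent to running `bzip2 -t file.bz2` and checking the exit status.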
Ariel
On Tue, 05-07-2011, at 14:09 -0700, Eric Sun wrote:
The latest enwiki pages dump, enwiki-latest-pages-articles.xml.bz2 in
http://dumps.wikimedia.org/enwiki/latest/, is only 5.8 GB.
Previous versions, e.g.
http://dumps.wikimedia.org/enwiki/20110526/ and
http://dumps.wikimedia.org/enwiki/20110405/,
have been consistently around 6.7-6.8 GB.
I saw this after noticing that many pages are missing from the newest
dump, e.g.
http://en.wikipedia.org/wiki/Liar_Liar and
http://en.wikipedia.org/wiki/Juan_que_re%C3%ADa.
Is this a known problem? Can anything be done to prevent this in the
future?
Thanks,
Eric
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l