Hi,
I was polling the http://download.wikimedia.org/enwiki/20100622/ page during the pages-meta-history.xml.bz2 database dump. Below is some timestamped output from that page, showing the errors that caused the dump to fail. Regarding the .bz2 dump format, Tomasz earlier suggested removing it and using .7z instead. I thought it might be good to keep the .bz2 format because several programs use it (e.g. wikitaxi, bzreader). The 7z format is probably the way to go in the future, but I don't know whether switching would fix the database dump errors.
cheers, Jamie
-----------------------------------------------
20100719 2:22:14am PST
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2) 2010-07-19 09:22:11: enwiki 889057 pages (0.613/sec), 110108000 revs (75.931/sec), 83.6% prefetched, ETA 2010-08-28 05:12:01 [max 371385750]
* These dumps can be *very* large, uncompressing up to 20 times the archive download size. Suitable for archival and statistical use, most mirror sites won't want or need this. * pages-meta-history.xml.bz2 119.7 GB (written)
-----------------------------------------------
20100719 3:07:16am PST
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2) 2010-07-19 10:07:15: enwiki 894194 pages (0.615/sec), 110399000 revs (75.990/sec), 83.6% prefetched, ETA 2010-08-28 04:08:46 [max 371385750]
* These dumps can be *very* large, uncompressing up to 20 times the archive download size. Suitable for archival and statistical use, most mirror sites won't want or need this. * pages-meta-history.xml.bz2 119.9 GB (written)
-----------------------------------------------
20100719 3:22:17am PST
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2) Error 2 of allowed 5 retrieving revision text for text id 10595737! Pausing 5 seconds before retry...
* These dumps can be *very* large, uncompressing up to 20 times the archive download size. Suitable for archival and statistical use, most mirror sites won't want or need this. * pages-meta-history.xml.bz2 119.9 GB (written)
-----------------------------------------------
20100719 3:37:18am PST
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2) Error 3 of allowed 5 retrieving revision text for text id 13930238! Pausing 5 seconds before retry...
* These dumps can be *very* large, uncompressing up to 20 times the archive download size. Suitable for archival and statistical use, most mirror sites won't want or need this. * pages-meta-history.xml.bz2 119.9 GB (written)
-----------------------------------------------
20100719 3:52:19am PST
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2) Error 4 of allowed 5 retrieving revision text for text id 355313550! Pausing 5 seconds before retry...
* These dumps can be *very* large, uncompressing up to 20 times the archive download size. Suitable for archival and statistical use, most mirror sites won't want or need this. * pages-meta-history.xml.bz2 119.9 GB (written)
-----------------------------------------------
20100719 4:07:20am PST
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2) Error 3 of allowed 5 retrieving revision text for text id 346806445! Pausing 5 seconds before retry...
* These dumps can be *very* large, uncompressing up to 20 times the archive download size. Suitable for archival and statistical use, most mirror sites won't want or need this. * pages-meta-history.xml.bz2 119.9 GB (written)
-----------------------------------------------
20100719 4:22:21am PST
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2) Error 4 of allowed 5 retrieving revision text for text id 351921561! Pausing 5 seconds before retry...
* These dumps can be *very* large, uncompressing up to 20 times the archive download size. Suitable for archival and statistical use, most mirror sites won't want or need this. * pages-meta-history.xml.bz2 119.9 GB (written)
-----------------------------------------------
20100719 4:37:21am PST
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2) Error 5 of allowed 5 retrieving revision text for text id 358280940! Pausing 5 seconds before retry...
* These dumps can be *very* large, uncompressing up to 20 times the archive download size. Suitable for archival and statistical use, most mirror sites won't want or need this. * pages-meta-history.xml.bz2 119.9 GB (written)
-----------------------------------------------
20100719 4:52:24am PST
# 2010-07-19 11:37:24 failed All pages with complete page edit history (.bz2) #6 {main}
* These dumps can be *very* large, uncompressing up to 20 times the archive download size. Suitable for archival and statistical use, most mirror sites won't want or need this. * pages-meta-history.xml.bz2
-----------------------------------------------
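As a side note, the ETA in the progress lines above looks consistent with a simple linear extrapolation: remaining revisions divided by the average revs/sec. Here is a minimal Python sketch of that arithmetic. It assumes that "[max 371385750]" is the total number of revisions to process, which is my reading of the output and not something the status page states; estimate_eta is a made-up helper name, not part of the dump scripts.

```python
from datetime import datetime, timedelta

def estimate_eta(now, revs_done, revs_per_sec, max_revs):
    """Linear extrapolation: time left = remaining revisions / average rate.

    Hypothetical helper; assumes max_revs is the total revision count.
    """
    remaining = max_revs - revs_done
    return now + timedelta(seconds=remaining / revs_per_sec)

# Values copied from the first progress line above (2010-07-19 09:22:11).
eta = estimate_eta(
    now=datetime(2010, 7, 19, 9, 22, 11),
    revs_done=110108000,
    revs_per_sec=75.931,
    max_revs=371385750,
)
print(eta)  # compare with the reported "ETA 2010-08-28 05:12:01"
```

Because the rate on the page is rounded to three decimals, the result can only be expected to match the reported ETA to within some seconds.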
Pages referenced in the above errors:
-----------------------------------------------
http://en.wikipedia.org/w/index.php?oldid=10595737
Brothers in Arms: Road to Hill 30 "This is an old revision of this page, as edited by Colonel Cow (talk | contribs) at 01:02, 17 February 2005."
-----------------------------------------------
http://en.wikipedia.org/w/index.php?oldid=13930238
Brothers in Arms: Road to Hill 30 "This is an old revision of this page, as edited by 213.212.58.66 (talk) at 12:34, 19 May 2005."
-----------------------------------------------
http://en.wikipedia.org/w/index.php?oldid=355313550
User:Peter I. Vardy/sandbox "This is an old revision of this page, as edited by Peter I. Vardy (talk | contribs) at 10:53, 11 April 2010."
-----------------------------------------------
http://en.wikipedia.org/w/index.php?oldid=346806445
Talk:Amy Shearn "This is an old revision of this page, as edited by Yobot (talk | contribs) at 02:49, 28 February 2010."
-----------------------------------------------
http://en.wikipedia.org/w/index.php?oldid=351921561
User:Ohms Law Bot/Cleanup/Roy D. Bridges, Jr. "This is an old revision of this page, as edited by Ohms Law Bot (talk | contribs) at 06:26, 25 March 2010."
-----------------------------------------------
http://en.wikipedia.org/w/index.php?oldid=358280940
The Tower Treasure "This is an old revision of this page, as edited by 69.144.24.63 (talk) at 21:36, 25 April 2010."
-----------------------------------------------
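In case it is useful to anyone else watching the status page, here is a rough Python sketch of how the list above can be produced from the error lines. It assumes, as the list above does, that the "text id" in the error message can be used as an oldid in the URL; I have not confirmed that the two always correspond. The names ERROR_RE, parse_error and oldid_url are made up for this example.

```python
import re

# Matches the retry message as it appears on the status page.
ERROR_RE = re.compile(
    r"Error (\d+) of allowed (\d+) retrieving revision text for text id (\d+)!"
)

def parse_error(line):
    """Return (attempt, allowed, text_id), or None if the line has no error."""
    m = ERROR_RE.search(line)
    if m is None:
        return None
    attempt, allowed, text_id = (int(g) for g in m.groups())
    return attempt, allowed, text_id

def oldid_url(text_id):
    """Build the old-revision URL (assumes text id == oldid, see note above)."""
    return f"http://en.wikipedia.org/w/index.php?oldid={text_id}"

sample = (
    "# 2010-07-02 14:33:44 in-progress All pages with complete page edit "
    "history (.bz2) Error 2 of allowed 5 retrieving revision text for "
    "text id 10595737! Pausing 5 seconds before retry..."
)
parsed = parse_error(sample)
if parsed is not None:
    print(oldid_url(parsed[2]))
```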
----- Original Message -----
From: Dmitry Chichkov dchichkov@gmail.com
Date: Tuesday, July 20, 2010 3:31 pm
Subject: [Xmldatadumps-l] enwiki dump progress on 20100622 - failed again
To: xmldatadumps-l@lists.wikimedia.org
Subj: http://download.wikimedia.org/enwiki/20100622/
Is there anything that can be done to alleviate that problem?
By the way, what's the point of producing a .bz2 version of the pages-meta-history.xml dump? Is it easier on the system to produce .bz2 first and .7z after that? From a user's perspective, .7z is all I need; there is simply no point in working with .bz2 if .7z is available.
-- Regards, Dmitry