20100719 4:37:21am PST
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2)
Error 5 of allowed 5 retrieving revision text for text id 358280940!
Pausing 5 seconds before retry...
Well, my comment here would be that the 'allowed errors = 5' limit and the 5-second retry delay both seem rather small. From that it looks like 25 seconds of database unavailability would be enough to fail the backup, and the backup literally takes a month...
I'd suggest setting the allowed error count to something like 0.01% of the number of revisions. Also, an incomplete dump (e.g. with missing revision texts) is much better than nothing, so it would make sense to allow higher error rates, or even to make aborting the dump a manual decision.
To put that 0.01% error rate into perspective: according to my estimates, the error rate in the last "complete" database dump [enwiki-20100130 31.9GB/280GB] was at least ~0.4% (missing revision texts due to backup process failures).
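A rough sketch of the arithmetic behind these numbers, not the dump script's actual logic. The function names are hypothetical, and the revision count is an assumption borrowed from the "[max 371385750]" figure in the progress output quoted below, which may be an upper bound rather than the true number of revisions.

```python
def outage_tolerated_seconds(allowed_errors: int, retry_delay_s: int) -> int:
    """Longest outage survived if every retry hits the same failure."""
    return allowed_errors * retry_delay_s


def error_budget(total_revisions: int, per_million: int) -> int:
    """Allowed error count as a fraction of the revision count.

    Integer arithmetic avoids floating-point rounding surprises;
    0.01% is 100 per million, 0.4% is 4000 per million.
    """
    return total_revisions * per_million // 1_000_000


# Assumed stand-in for the number of revisions (see the lead-in).
ASSUMED_REVISIONS = 371_385_750

current_tolerance = outage_tolerated_seconds(allowed_errors=5, retry_delay_s=5)
proposed_budget = error_budget(ASSUMED_REVISIONS, per_million=100)    # 0.01%
observed_missing = error_budget(ASSUMED_REVISIONS, per_million=4000)  # ~0.4%

print(f"current settings tolerate about {current_tolerance} s of outage")
print(f"0.01% budget: {proposed_budget} errors")
print(f"0.4% of revisions: {observed_missing} missing texts")
```

Under that assumed count, a 0.01% budget would allow tens of thousands of failed fetches, still far fewer than the number of revision texts a ~0.4% loss would represent.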
-- Regards, Dmitry
On Wed, Jul 21, 2010 at 4:03 AM, Jamie Morken jmorken@shaw.ca wrote:
Hi,
I was polling the http://download.wikimedia.org/enwiki/20100622/ page during the pages-meta-history.xml.bz2 database dump, and here is some timestamped output from that page showing the errors that caused the dump to fail. Regarding the .bz2 dump format, Tomasz earlier suggested dropping it in favor of .7z. I thought it might be good to keep .bz2, since several programs use it (e.g. wikitaxi, bzreader). The 7z format is probably the way to go in the future, but I don't know whether switching would fix the database dump errors.
cheers, Jamie
20100719 2:22:14am
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2) 2010-07-19 09:22:11: enwiki 889057 pages (0.613/sec), 110108000 revs (75.931/sec), 83.6% prefetched, ETA 2010-08-28 05:12:01 [max 371385750]
* These dumps can be *very* large, uncompressing up to 20 times the archive download size. Suitable for archival and statistical use, most mirror sites won't want or need this.
* pages-meta-history.xml.bz2 119.7 GB (written)
20100719 3:07:16am PST
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2) 2010-07-19 10:07:15: enwiki 894194 pages (0.615/sec), 110399000 revs (75.990/sec), 83.6% prefetched, ETA 2010-08-28 04:08:46 [max 371385750]
* pages-meta-history.xml.bz2 119.9 GB (written)
20100719 3:22:17am PST
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2) Error 2 of allowed 5 retrieving revision text for text id 10595737! Pausing 5 seconds before retry...
* pages-meta-history.xml.bz2 119.9 GB (written)
20100719 3:37:18am PST
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2) Error 3 of allowed 5 retrieving revision text for text id 13930238! Pausing 5 seconds before retry...
* pages-meta-history.xml.bz2 119.9 GB (written)
20100719 3:52:19am PST
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2) Error 4 of allowed 5 retrieving revision text for text id 355313550! Pausing 5 seconds before retry...
* pages-meta-history.xml.bz2 119.9 GB (written)
20100719 4:07:20am PST
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2) Error 3 of allowed 5 retrieving revision text for text id 346806445! Pausing 5 seconds before retry...
* pages-meta-history.xml.bz2 119.9 GB (written)
20100719 4:22:21am PST
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2) Error 4 of allowed 5 retrieving revision text for text id 351921561! Pausing 5 seconds before retry...
* pages-meta-history.xml.bz2 119.9 GB (written)
20100719 4:37:21am PST
# 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2) Error 5 of allowed 5 retrieving revision text for text id 358280940! Pausing 5 seconds before retry...
* pages-meta-history.xml.bz2 119.9 GB (written)
20100719 4:52:24am PST
# 2010-07-19 11:37:24 failed All pages with complete page edit history (.bz2) #6 {main}
* pages-meta-history.xml.bz2
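As a side check on the first progress line in the log above, the reported ETA looks consistent with simply dividing the remaining revisions (the "max" figure minus the revisions done) by the current rate. The sketch below parses that line and redoes the estimate. The regular expression and the function name are my own assumptions about the line format, not anything taken from the dump scripts.

```python
import re
from datetime import datetime, timedelta

# First progress line from the polling output, copied verbatim.
LINE = (
    "2010-07-19 09:22:11: enwiki 889057 pages (0.613/sec), "
    "110108000 revs (75.931/sec), 83.6% prefetched, "
    "ETA 2010-08-28 05:12:01 [max 371385750]"
)

# Assumed line layout; the field names are hypothetical.
PATTERN = re.compile(
    r"(?P<now>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d): \S+ \d+ pages \([\d.]+/sec\), "
    r"(?P<revs>\d+) revs \((?P<rate>[\d.]+)/sec\), .*?"
    r"ETA (?P<eta>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) \[max (?P<max>\d+)\]"
)
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def estimate_eta(line: str) -> tuple[datetime, datetime]:
    """Return (recomputed ETA, ETA reported in the line)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like a dump progress line")
    now = datetime.strptime(match["now"], TIME_FORMAT)
    remaining = int(match["max"]) - int(match["revs"])
    seconds_left = remaining / float(match["rate"])
    estimated = now + timedelta(seconds=seconds_left)
    reported = datetime.strptime(match["eta"], TIME_FORMAT)
    return estimated, reported


estimated, reported = estimate_eta(LINE)
print("recomputed ETA:", estimated.replace(microsecond=0))
print("reported ETA:  ", reported)
```

The recomputed value lands within about a minute of the reported one (the rate in the log is rounded to three decimals), which suggests that at the time of the failure roughly 40 more days of uninterrupted running were still needed.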
pages referenced in the above errors:
http://en.wikipedia.org/w/index.php?oldid=10595737
Brothers in Arms: Road to Hill 30 "This is an old revision of this page, as edited by Colonel Cow (talk | contribs) at 01:02, 17 February 2005."
http://en.wikipedia.org/w/index.php?oldid=13930238
Brothers in Arms: Road to Hill 30 "This is an old revision of this page, as edited by 213.212.58.66 (talk) at 12:34, 19 May 2005."
http://en.wikipedia.org/w/index.php?oldid=355313550
User:Peter I. Vardy/sandbox "This is an old revision of this page, as edited by Peter I. Vardy (talk | contribs) at 10:53, 11 April 2010."
http://en.wikipedia.org/w/index.php?oldid=346806445
Talk:Amy Shearn "This is an old revision of this page, as edited by Yobot (talk | contribs) at 02:49, 28 February 2010."
http://en.wikipedia.org/w/index.php?oldid=351921561
User:Ohms Law Bot/Cleanup/Roy D. Bridges, Jr. "This is an old revision of this page, as edited by Ohms Law Bot (talk | contribs) at 06:26, 25 March 2010."
http://en.wikipedia.org/w/index.php?oldid=358280940
The Tower Treasure "This is an old revision of this page, as edited by 69.144.24.63 (talk) at 21:36, 25 April 2010."
----- Original Message -----
From: Dmitry Chichkov dchichkov@gmail.com
Date: Tuesday, July 20, 2010 3:31 pm
Subject: [Xmldatadumps-l] enwiki dump progress on 20100622 - failed again
To: xmldatadumps-l@lists.wikimedia.org
Subj: http://download.wikimedia.org/enwiki/20100622/
Is there anything that can be done to alleviate that problem?
By the way, what's the point of producing a .bz2 version of the pages-meta-history.xml dump? Is it easier on the system to produce .bz2 first and .7z after that? From the user's perspective, I can tell you that .7z is all I need; there is simply no point in working with .bz2 if .7z is available.
-- Regards, Dmitry