These errors don't cause the backups to fail; a separate (much larger)
count of revisions that could not be retrieved causes that.
We do want to bail on attempts to retrieve a revision after a few tries,
since some revisions are irrecoverable.
Ariel
On Wed, 21-07-2010 at 15:39 -0700, Dmitry Chichkov wrote:
> 20100719 4:37:21am PST
> # 2010-07-02 14:33:44 in-progress All pages with complete page edit history (.bz2)
> Error 5 of allowed 5 retrieving revision text for text id 358280940! Pausing 5 seconds before retry...
Well, my comment here would be that the number of allowed errors (5)
and the retry delay (5 seconds) seem to be rather small. From that it
looks like 25 seconds of database unavailability would cause the backup
to fail, even though the backup literally takes a month...
I'd suggest setting the allowed error rate to something like 0.01% of
the number of revisions. Also, an incomplete dump (e.g. with missing
revision texts) is much, much better than nothing, so it would only
make sense to allow higher error rates or even make the interruption
procedure manual.
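A minimal sketch of the policy suggested above, not the actual dump code.
The function name, its parameters, and the `fetch` callable are all
hypothetical. Each revision is retried a few times, recorded as missing
when the tries run out, and the whole run is aborted only once the number
of missing revisions exceeds a fraction of the total.

```python
import time


class ErrorBudgetExceeded(RuntimeError):
    """Raised when too many revisions could not be retrieved."""


def dump_revisions(text_ids, fetch, max_tries=5, retry_delay=5.0,
                   allowed_failure_rate=0.0001, sleep=time.sleep):
    """Fetch every revision, tolerating a bounded fraction of failures.

    `fetch` is a hypothetical callable (text_id -> text) that raises on
    error. Returns (texts, missing); `missing` lists the ids given up on.
    """
    text_ids = list(text_ids)
    # Budget of irrecoverable revisions: 0.01% of the total by default.
    budget = int(len(text_ids) * allowed_failure_rate)
    texts, missing = {}, []
    for text_id in text_ids:
        for attempt in range(1, max_tries + 1):
            try:
                texts[text_id] = fetch(text_id)
                break
            except Exception:
                if attempt < max_tries:
                    sleep(retry_delay)  # pause before the next try
        else:
            # Every try failed: give up on this revision, keep dumping.
            missing.append(text_id)
            if len(missing) > budget:
                raise ErrorBudgetExceeded(
                    f"{len(missing)} missing revisions exceed budget {budget}")
    return texts, missing
```

With this shape, a short database outage costs a handful of missing
revision texts rather than the whole run, and the abort threshold scales
with the size of the dump.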
To put that 0.01% error rate into perspective, according to my
estimates the error rate in the last "complete" database dump
[enwiki-20100130 31.9GB/280GB] was at least ~0.4% (missing revision
texts due to backup process failures).
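For a rough sense of scale, the two percentages can be turned into
absolute counts. This assumes the "[max 371385750]" figure from the
quoted progress output is a usable stand-in for the total number of
revisions; the 0.4% estimate refers to an earlier dump, so applying it to
the same total is only an order-of-magnitude illustration.

```python
# Assumption: "[max 371385750]" from the dump progress line is used here
# as an approximation of the total number of revisions.
TOTAL_REVISIONS = 371_385_750

# Integer arithmetic keeps the results exact (rounded down).
proposed_budget = TOTAL_REVISIONS * 1 // 10_000   # 0.01% of the total
estimated_missing = TOTAL_REVISIONS * 4 // 1_000  # ~0.4% of the total

print(proposed_budget)    # 37138
print(estimated_missing)  # 1485543
```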
-- Regards, Dmitry
On Wed, Jul 21, 2010 at 4:03 AM, Jamie Morken <jmorken(a)shaw.ca> wrote:
Hi,
I was polling the
http://download.wikimedia.org/enwiki/20100622/ page during the
pages-meta-history.xml.bz2 database dump and here is some
timestamped output from that page showing some errors that
caused the dump to fail. Regarding the .bz2 dump format,
Tomasz earlier suggested removing it and using .7z. I thought
it might be good to keep the .bz2 format due to there being
several programs that use it (e.g. wikitaxi, bzreader). 7z
format is probably the way to go though for the future, but I
don't know if this would fix the database dump errors.
cheers,
Jamie
-----------------------------------------------
20100719 2:22:14am
# 2010-07-02 14:33:44 in-progress All pages with complete
page edit history (.bz2)
2010-07-19 09:22:11: enwiki 889057 pages (0.613/sec),
110108000 revs (75.931/sec), 83.6% prefetched, ETA 2010-08-28
05:12:01 [max 371385750]
* These dumps can be *very* large, uncompressing up to 20
times the archive download size. Suitable for archival and
statistical use, most mirror sites won't want or need this.
* pages-meta-history.xml.bz2 119.7 GB (written)
-----------------------------------------------
20100719 3:07:16am PST
# 2010-07-02 14:33:44 in-progress All pages with complete
page edit history (.bz2)
2010-07-19 10:07:15: enwiki 894194 pages (0.615/sec),
110399000 revs (75.990/sec), 83.6% prefetched, ETA 2010-08-28
04:08:46 [max 371385750]
* These dumps can be *very* large, uncompressing up to 20
times the archive download size. Suitable for archival and
statistical use, most mirror sites won't want or need this.
* pages-meta-history.xml.bz2 119.9 GB (written)
-----------------------------------------------
20100719 3:22:17am PST
# 2010-07-02 14:33:44 in-progress All pages with complete
page edit history (.bz2)
Error 2 of allowed 5 retrieving revision text for text id
10595737! Pausing 5 seconds before retry...
* These dumps can be *very* large, uncompressing up to 20
times the archive download size. Suitable for archival and
statistical use, most mirror sites won't want or need this.
* pages-meta-history.xml.bz2 119.9 GB (written)
-----------------------------------------------
20100719 3:37:18am PST
# 2010-07-02 14:33:44 in-progress All pages with complete
page edit history (.bz2)
Error 3 of allowed 5 retrieving revision text for text id
13930238! Pausing 5 seconds before retry...
* These dumps can be *very* large, uncompressing up to 20
times the archive download size. Suitable for archival and
statistical use, most mirror sites won't want or need this.
* pages-meta-history.xml.bz2 119.9 GB (written)
-----------------------------------------------
20100719 3:52:19am PST
# 2010-07-02 14:33:44 in-progress All pages with complete
page edit history (.bz2)
Error 4 of allowed 5 retrieving revision text for text id
355313550! Pausing 5 seconds before retry...
* These dumps can be *very* large, uncompressing up to 20
times the archive download size. Suitable for archival and
statistical use, most mirror sites won't want or need this.
* pages-meta-history.xml.bz2 119.9 GB (written)
-----------------------------------------------
20100719 4:07:20am PST
# 2010-07-02 14:33:44 in-progress All pages with complete
page edit history (.bz2)
Error 3 of allowed 5 retrieving revision text for text id
346806445! Pausing 5 seconds before retry...
* These dumps can be *very* large, uncompressing up to 20
times the archive download size. Suitable for archival and
statistical use, most mirror sites won't want or need this.
* pages-meta-history.xml.bz2 119.9 GB (written)
-----------------------------------------------
20100719 4:22:21am PST
# 2010-07-02 14:33:44 in-progress All pages with complete
page edit history (.bz2)
Error 4 of allowed 5 retrieving revision text for text id
351921561! Pausing 5 seconds before retry...
* These dumps can be *very* large, uncompressing up to 20
times the archive download size. Suitable for archival and
statistical use, most mirror sites won't want or need this.
* pages-meta-history.xml.bz2 119.9 GB (written)
-----------------------------------------------
20100719 4:37:21am PST
# 2010-07-02 14:33:44 in-progress All pages with complete
page edit history (.bz2)
Error 5 of allowed 5 retrieving revision text for text id
358280940! Pausing 5 seconds before retry...
* These dumps can be *very* large, uncompressing up to 20
times the archive download size. Suitable for archival and
statistical use, most mirror sites won't want or need this.
* pages-meta-history.xml.bz2 119.9 GB (written)
-----------------------------------------------
20100719 4:52:24am PST
# 2010-07-19 11:37:24 failed All pages with complete page edit
history (.bz2)
#6 {main}
* These dumps can be *very* large, uncompressing up to 20
times the archive download size. Suitable for archival and
statistical use, most mirror sites won't want or need this.
* pages-meta-history.xml.bz2
-----------------------------------------------
pages referenced in the above errors:
-----------------------------------------------
http://en.wikipedia.org/w/index.php?oldid=10595737
Brothers in Arms: Road to Hill 30
"This is an old revision of this page, as edited by Colonel
Cow (talk | contribs) at 01:02, 17 February 2005."
-----------------------------------------------
http://en.wikipedia.org/w/index.php?oldid=13930238
Brothers in Arms: Road to Hill 30
"This is an old revision of this page, as edited by
213.212.58.66 (talk) at 12:34, 19 May 2005."
-----------------------------------------------
http://en.wikipedia.org/w/index.php?oldid=355313550
User:Peter I. Vardy/sandbox
"This is an old revision of this page, as edited by Peter I.
Vardy (talk | contribs) at 10:53, 11 April 2010."
-----------------------------------------------
http://en.wikipedia.org/w/index.php?oldid=346806445
Talk:Amy Shearn
"This is an old revision of this page, as edited by Yobot
(talk | contribs) at 02:49, 28 February 2010."
-----------------------------------------------
http://en.wikipedia.org/w/index.php?oldid=351921561
User:Ohms Law Bot/Cleanup/Roy D. Bridges, Jr.
"This is an old revision of this page, as edited by Ohms Law
Bot (talk | contribs) at 06:26, 25 March 2010."
-----------------------------------------------
http://en.wikipedia.org/w/index.php?oldid=358280940
The Tower Treasure
"This is an old revision of this page, as edited by
69.144.24.63 (talk) at 21:36, 25 April 2010."
-----------------------------------------------
----- Original Message -----
From: Dmitry Chichkov <dchichkov(a)gmail.com>
Date: Tuesday, July 20, 2010 3:31 pm
Subject: [Xmldatadumps-l] enwiki dump progress on 20100622 -
failed again
To: xmldatadumps-l(a)lists.wikimedia.org
Subj:
http://download.wikimedia.org/enwiki/20100622/
Is there anything that can be done to alleviate that problem?
By the way, what's the point of producing a .bz2 version of the
pages-meta-history.xml dump? Is it easier on the system to produce
.bz2 first and .7z after that? From the user's perspective I can tell
that .7z is all I need; there is simply no point in working with .bz2
(if .7z is available).
-- Regards, Dmitry