> We do want to bail on attempts to retrieve a revision after a few tries
> since some revisions are irrecoverable.

What separates a recoverable revision from an irrecoverable one?  Is it just random, or are some revisions always irrecoverable?  Also, do you guys have a description or diagram of your database dump system?  I think it would be good to share this info; maybe there is a way to make the database dump ALWAYS work! :)
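
To make the question concrete, here is a rough Python sketch of the kind of policy discussed in the quoted messages below: give up on a single revision after a few tries (as Ariel describes), but only fail the whole dump once the number of missing revisions exceeds a small fraction of the total (as Dmitry suggests). This is not how the actual dump scripts work, which I haven't seen. The names fetch_with_retries, dump_revisions and DumpAborted, the use of IOError for a failed fetch, and the default values are all my own assumptions.

```python
import time


class DumpAborted(Exception):
    """Raised when more revisions are missing than the error budget allows."""


def fetch_with_retries(fetch, text_id, max_tries=5, delay=5.0, sleep=time.sleep):
    """Try one revision a few times; return its text, or None if it never loads."""
    for attempt in range(1, max_tries + 1):
        try:
            return fetch(text_id)
        except IOError:
            # Assumption: a failed retrieval surfaces as IOError.
            if attempt < max_tries:
                sleep(delay)  # pause before the next try
    return None  # treat this revision as irrecoverable and move on


def dump_revisions(text_ids, fetch, total_revisions, max_failure_rate=0.0001,
                   max_tries=5, delay=5.0, sleep=time.sleep):
    """Fetch every revision; abort only if missing revisions exceed the budget."""
    # Hypothetical global budget: 0.01% of all revisions by default.
    budget = int(total_revisions * max_failure_rate)
    recovered, missing = [], []
    for text_id in text_ids:
        text = fetch_with_retries(fetch, text_id, max_tries, delay, sleep)
        if text is None:
            missing.append(text_id)  # record the gap instead of failing outright
            if len(missing) > budget:
                raise DumpAborted("%d missing revisions exceeds budget of %d"
                                  % (len(missing), budget))
        else:
            recovered.append((text_id, text))
    return recovered, missing
```

With total_revisions=371385750 (the "max" figure in the progress output below) and the 0.01% rate, the budget would be roughly 37,000 missing revisions before the dump is aborted, and the list of missing text ids could be published alongside the dump.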

cheers,
Jamie



----- Original Message -----
From: "Ariel T. Glenn" <ariel@wikimedia.org>
Date: Wednesday, July 21, 2010 3:45 pm
Subject: Re: [Xmldatadumps-l] enwiki dump progress on 20100622 - failed again
To: Dmitry Chichkov <dchichkov@gmail.com>
Cc: Jamie Morken <jmorken@shaw.ca>, xmldatadumps-l@lists.wikimedia.org

> These don't cause failure of the backups; a separate (much larger)
> number of failed retrieved revisions causes that. 
>
> We do want to bail on attempts to retrieve a revision after a few tries
> since some revisions are irrecoverable.
>
> Ariel
>
> On Wed, 21-07-2010 at 15:39 -0700, Dmitry Chichkov wrote:
> >
> > >> 20100719 4:37:21am PST
> > >> # 2010-07-02 14:33:44  in-progress All pages with complete page edit history (.bz2)
> > >> Error 5 of allowed 5 retrieving revision text for text id 358280940! Pausing 5 seconds before retry...
> >
> > Well, my comment here would be that the number of 'allowed errors = 5'
> > and the 'retry delay 5 seconds' seem to be rather small. From that it
> > looks like a 25-second database unavailability would cause backup
> > failure. Considering that backup literally takes a month...
> >
> > I'd suggest setting the error rate to something like 0.01% of the
> > number of revisions. Also an incomplete dump (e.g. with missing
> > revision texts) is much, much better than nothing, so it would only
> > make sense to allow higher error rates or even make the interruption
> > procedure manual.
> >
> > To put that 0.01% error rate into perspective, according to my
> > estimates the error rate in the last "complete" database dump
> > [enwiki-20100130 31.9GB/280GB] was at least ~0.4% (missing revision
> > texts due to backup process failures).
> >
> > -- Regards, Dmitry
> >
> >
> >
> >
> >
> > On Wed, Jul 21, 2010 at 4:03 AM, Jamie Morken <jmorken@shaw.ca> wrote:
> >
> >         Hi,
> >
> >         I was polling the http://download.wikimedia.org/enwiki/20100622/
> >         page during the pages-meta-history.xml.bz2 database dump and here
> >         is some timestamped output from that page showing some errors that
> >         caused the dump to fail.  Regarding the .bz2 dump format, Tomasz
> >         earlier suggested removing it and using .7z.  I thought it might be
> >         good to keep the .bz2 format due to there being several programs
> >         that use it (e.g. wikitaxi, bzreader).  7z format is probably the
> >         way to go though for the future, but I don't know if this would fix
> >         the database dump errors.
> >
> >         cheers,
> >         Jamie
> >
> >         -----------------------------------------------
> >
> >         20100719 2:22:14am
> >
> >         # 2010-07-02 14:33:44  in-progress All pages with complete page edit history (.bz2)
> >         2010-07-19 09:22:11: enwiki 889057 pages (0.613/sec), 110108000 revs (75.931/sec), 83.6% prefetched, ETA 2010-08-28 05:12:01 [max 371385750]
> >
> >             * These dumps can be *very* large, uncompressing up to 20 times the
> >               archive download size. Suitable for archival and statistical use,
> >               most mirror sites won't want or need this.
> >             * pages-meta-history.xml.bz2 119.7 GB (written)
> >
> >         -----------------------------------------------
> >
> >         20100719 3:07:16am PST
> >
> >         # 2010-07-02 14:33:44  in-progress All pages with complete page edit history (.bz2)
> >         2010-07-19 10:07:15: enwiki 894194 pages (0.615/sec), 110399000 revs (75.990/sec), 83.6% prefetched, ETA 2010-08-28 04:08:46 [max 371385750]
> >
> >             * These dumps can be *very* large, uncompressing up to 20 times the
> >               archive download size. Suitable for archival and statistical use,
> >               most mirror sites won't want or need this.
> >             * pages-meta-history.xml.bz2 119.9 GB (written)
> >
> >         -----------------------------------------------
> >
> >         20100719 3:22:17am PST
> >
> >         # 2010-07-02 14:33:44  in-progress All pages with complete page edit history (.bz2)
> >         Error 2 of allowed 5 retrieving revision text for text id 10595737! Pausing 5 seconds before retry...
> >
> >             * These dumps can be *very* large, uncompressing up to 20 times the
> >               archive download size. Suitable for archival and statistical use,
> >               most mirror sites won't want or need this.
> >             * pages-meta-history.xml.bz2 119.9 GB (written)
> >
> >         -----------------------------------------------
> >
> >         20100719 3:37:18am PST
> >
> >         # 2010-07-02 14:33:44  in-progress All pages with complete page edit history (.bz2)
> >         Error 3 of allowed 5 retrieving revision text for text id 13930238! Pausing 5 seconds before retry...
> >
> >             * These dumps can be *very* large, uncompressing up to 20 times the
> >               archive download size. Suitable for archival and statistical use,
> >               most mirror sites won't want or need this.
> >             * pages-meta-history.xml.bz2 119.9 GB (written)
> >
> >         -----------------------------------------------
> >
> >         20100719 3:52:19am PST
> >
> >         # 2010-07-02 14:33:44  in-progress All pages with complete page edit history (.bz2)
> >         Error 4 of allowed 5 retrieving revision text for text id 355313550! Pausing 5 seconds before retry...
> >
> >             * These dumps can be *very* large, uncompressing up to 20 times the
> >               archive download size. Suitable for archival and statistical use,
> >               most mirror sites won't want or need this.
> >             * pages-meta-history.xml.bz2 119.9 GB (written)
> >
> >         -----------------------------------------------
> >
> >         20100719 4:07:20am PST
> >
> >         # 2010-07-02 14:33:44  in-progress All pages with complete page edit history (.bz2)
> >         Error 3 of allowed 5 retrieving revision text for text id 346806445! Pausing 5 seconds before retry...
> >
> >             * These dumps can be *very* large, uncompressing up to 20 times the
> >               archive download size. Suitable for archival and statistical use,
> >               most mirror sites won't want or need this.
> >             * pages-meta-history.xml.bz2 119.9 GB (written)
> >
> >         -----------------------------------------------
> >
> >         20100719 4:22:21am PST
> >
> >         # 2010-07-02 14:33:44  in-progress All pages with complete page edit history (.bz2)
> >         Error 4 of allowed 5 retrieving revision text for text id 351921561! Pausing 5 seconds before retry...
> >
> >             * These dumps can be *very* large, uncompressing up to 20 times the
> >               archive download size. Suitable for archival and statistical use,
> >               most mirror sites won't want or need this.
> >             * pages-meta-history.xml.bz2 119.9 GB (written)
> >
> >         -----------------------------------------------
> >
> >         20100719 4:37:21am PST
> >
> >         # 2010-07-02 14:33:44  in-progress All pages with complete page edit history (.bz2)
> >         Error 5 of allowed 5 retrieving revision text for text id 358280940! Pausing 5 seconds before retry...
> >
> >             * These dumps can be *very* large, uncompressing up to 20 times the
> >               archive download size. Suitable for archival and statistical use,
> >               most mirror sites won't want or need this.
> >             * pages-meta-history.xml.bz2 119.9 GB (written)
> >
> >         -----------------------------------------------
> >
> >         20100719 4:52:24am PST
> >
> >         # 2010-07-19 11:37:24 failed All pages with complete page edit history (.bz2)
> >         #6 {main}
> >
> >             * These dumps can be *very* large, uncompressing up to 20 times the
> >               archive download size. Suitable for archival and statistical use,
> >               most mirror sites won't want or need this.
> >             * pages-meta-history.xml.bz2
> >
> >         -----------------------------------------------
> >
> >         pages referenced in the above errors:
> >
> >         -----------------------------------------------
> >
> >         http://en.wikipedia.org/w/index.php?oldid=10595737
> >         Brothers in Arms: Road to Hill 30
> >         "This is an old revision of this page, as edited by Colonel Cow
> >         (talk | contribs) at 01:02, 17 February 2005."
> >
> >         -----------------------------------------------
> >
> >         http://en.wikipedia.org/w/index.php?oldid=13930238
> >         Brothers in Arms: Road to Hill 30
> >         "This is an old revision of this page, as edited by
> >         213.212.58.66 (talk) at 12:34, 19 May 2005."
> >
> >         -----------------------------------------------
> >
> >         http://en.wikipedia.org/w/index.php?oldid=355313550
> >         User:Peter I. Vardy/sandbox
> >         This is an old revision of this page, as edited by Peter I. Vardy
> >         (talk | contribs) at 10:53, 11 April 2010.
> >
> >         -----------------------------------------------
> >
> >         http://en.wikipedia.org/w/index.php?oldid=346806445
> >         Talk:Amy Shearn
> >         "This is an old revision of this page, as edited by Yobot
> >         (talk | contribs) at 02:49, 28 February 2010."
> >
> >         -----------------------------------------------
> >
> >         http://en.wikipedia.org/w/index.php?oldid=351921561
> >         User:Ohms Law Bot/Cleanup/Roy D. Bridges, Jr.
> >         "This is an old revision of this page, as edited by Ohms Law Bot
> >         (talk | contribs) at 06:26, 25 March 2010."
> >
> >         -----------------------------------------------
> >
> >         http://en.wikipedia.org/w/index.php?oldid=358280940
> >         The Tower Treasure
> >         "This is an old revision of this page, as edited by
> >         69.144.24.63 (talk) at 21:36, 25 April 2010."
> >
> >         -----------------------------------------------
> >
> >        
> >         ----- Original Message -----
> >         From: Dmitry Chichkov <dchichkov@gmail.com>
> >         Date: Tuesday, July 20, 2010 3:31 pm
> >         Subject: [Xmldatadumps-l] enwiki dump progress on 20100622 - failed again
> >         To: xmldatadumps-l@lists.wikimedia.org
> >
> >         > Subj: http://download.wikimedia.org/enwiki/20100622/
> >         >
> >         > Is there anything that can be done to alleviate that problem?
> >         >
> >         > By the way, what's the point of producing .bz2 version of the
> >         > pages-meta-history.xml dump? Is it easier on the system to
> >         > produce .bz2 first and .7z after that? From the user's
> >         > perspective I can tell that .7z is all I need, there is simply
> >         > no point in working with .bz2 (if .7z is available).
> >         >
> >         > -- Regards, Dmitry
> >         >
> >
> > _______________________________________________
> > Xmldatadumps-l mailing list
> > Xmldatadumps-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>
>
>