We halted them because bad data can creep in during times when
the codebase is badly broken. I don't want to have to walk through and
determine later which 30 or 50 wiki dumps those are and toss them, so I
have them on hold until things are sorted out or until we have a date for
deployment that is a number of days off.
A dump with errors isn't better than no dump, because bad data can be
carried forward into subsequent dumps, even with the revision length
check in the code.
The only certain check involves doing an md5sum of the revision text,
something that can only be accomplished right now by retrieving the text
from the database, thus making prefetch from the previous dump file a
pointless exercise.
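
To illustrate why the length check alone is not a certain test, here is
a minimal Python sketch. It is not the actual dump code; the function
names and the sample strings are made up for illustration. It shows a
prefetched revision whose text has the right length but the wrong
content, which only an md5 comparison against the database text catches.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the hex md5 digest of the UTF-8 encoded text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def length_check_passes(prefetched_text: str, expected_len: int) -> bool:
    """Hypothetical length check: compare byte length to the expected length."""
    return len(prefetched_text.encode("utf-8")) == expected_len


def md5_check_passes(prefetched_text: str, db_text: str) -> bool:
    """Hypothetical md5 check: needs the text from the database to compare."""
    return md5_hex(prefetched_text) == md5_hex(db_text)


# Made-up sample data: same length, one character differs.
db_text = "The quick brown fox"
prefetched = "The quick brown fax"

length_ok = length_check_passes(prefetched, len(db_text.encode("utf-8")))
md5_ok = md5_check_passes(prefetched, db_text)

print("length check passes:", length_ok)
print("md5 check passes:", md5_ok)
```

Note that md5_check_passes() needs db_text, i.e. the text already
fetched from the database, at which point there is nothing left for
prefetch to save.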
After a brief meeting just now about deployment, it appears we are going
to make another stab at testing tomorrow at this time. (Check
http://techblog.wikimedia.org/ in a couple of hours for the details.)
After that we should have several days of a break; if that pans out,
I'll happily crank dumps back up for that interval.
Ariel
On Wed, 09-02-2011 at 13:44 -0800, Jamie Morken wrote:
Hi Ariel,
I don't really understand why the dumps need to be halted, as I thought
the mediawiki code and database dump code were basically two separate
entities already*. I guess the 1.17 branch code changes the structure
of the database, causing potential errors in the database dump? I also
don't understand the "precautionary" logic of halting the dumps, as a
dump with errors is better than no dump in the case where there is a
limited supply of recent dumps due to the RAID server failure as well.
If it's only a couple-day halt as you mentioned, that's probably
irrelevant, but it sounds like it may be a longer period of limited
testing from your last wikitech email, which makes me wonder if it is
even worth halting the dumps in the first place. Also, wouldn't
potential dump errors be detected better if the dumps continue to be
produced and are checked for errors, rather than halted?
cheers,
Jamie
*
http://svn.wikimedia.org/viewvc/mediawiki/branches/REL1_17/
http://svn.wikimedia.org/viewvc/mediawiki/branches/ariel/xmldumps-backup/
----- Original Message -----
From: "Ariel T. Glenn" <ariel(a)wikimedia.org>
Date: Saturday, February 5, 2011 10:56 pm
Subject: [Xmldatadumps-l] upcoming 1.17 deployment and the xml dumps
To: xmldatadumps-l(a)lists.wikimedia.org, wikitech-l(a)lists.wikimedia.org
A little bit before the scheduled deployment of the 1.17 branch on our
production servers, I will be halting production of XML dumps.
Deployment is set for Tuesday Feb 8 at 07:00 UTC, so a few hours
before that I'll start shutting down processes.
This is a precautionary measure; after the deployment and any hasty
fixes that may be needed, I will be doing some testing to ensure that
dumps are not impacted, before we restart them. Barring some bizarre
problem, we should be back up and running within a day or two.

Ariel