---------- Forwarded message ----------
From: Zachary Harris zacharyharris@hotmail.com
Date: Mon, Oct 8, 2012 at 5:49 PM
Subject: [Wikitech-l] "Latest" md5sums of dumps
To: wikitech-l@lists.wikimedia.org
It is a bit confusing that http://dumps.wikimedia.org/enwiki/20121001/enwiki-20121001-md5sums.txt has newer hash values that aren't in http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-md5sums.txt. I'm guessing that the "latest" md5sums file gets updated only at the end of a cycle, when all dumps for the month have been completed? (In other words, the idea is that the "latest" directory contains backups that have been *completed*, and the md5sums file isn't complete until *all of the other backups* it is going to reference are complete.)
I was expecting the "latest" md5sums file to have "rolling" content matching the rolling content of the "latest" directory itself. If that is possible without too much administrative hassle, it could be a nice feature. Otherwise, it would seem that "latest-md5sums" is almost always outdated relative to the other files in "latest".
-Zach
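For anyone wanting to see the mismatch concretely, here is a minimal sketch, assuming both files use the usual md5sum layout of a hex digest followed by a file name. The function names `parse_md5sums` and `hashes_missing_from` are made up for illustration; the text of the two md5sums files has to be fetched separately.

```python
def parse_md5sums(text):
    """Map file name -> md5 hex digest from md5sum-style text."""
    entries = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, name = parts
        # md5sum marks binary mode with a leading '*' on the file name
        entries[name.strip().lstrip("*")] = digest.lower()
    return entries


def hashes_missing_from(dated_text, latest_text):
    """Names in the dated listing whose digest is absent from the latest listing."""
    dated = parse_md5sums(dated_text)
    latest_digests = set(parse_md5sums(latest_text).values())
    return sorted(name for name, digest in dated.items()
                  if digest not in latest_digests)
```

Passing the contents of the 20121001 md5sums file as `dated_text` and the contents of the "latest" md5sums file as `latest_text` would list the files whose hashes have not yet reached "latest".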
Hi,
Yep, it looks like a mixture of old and new versions of the dumps. Ariel: Shouldn't the new files be copied over to the latest only after the whole dump is completed? This mixture of September and October dumps is quite weird and confusing. :(
On Tue, Oct 9, 2012 at 1:55 AM, Jeremy Baron jeremy@tuxmachine.com wrote:
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
On 10/08/2012 11:36 PM, Hydriz Wikipedia wrote:
> Shouldn't the new files be copied over to the latest only after the whole dump is completed?
I was actually hoping to go the other direction: keeping the September and October dumps (to use the present example) mixed together in "latest", but having the latest md5sums file track the mixture. The above suggestion would almost certainly be an easier way to maintain consistency between the dumps and the md5sums file. My request would maintain a "hot", up-to-date "latest" directory with consistent md5sums, although admittedly it may not be worth the trouble if doing so is not straightforward.
-Zach
The "latest" files are just links to files in a dump directory; this includes the md5 file. That's why you get a mismatch. I think there is a report about this in bugzilla (if not, please add one). This is relatively low priority on the list of things to fix, since there is a workaround: wait a few days for the run to complete.
Ariel
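Since the "latest" entries are links, one way to tell whether a run is still in progress is to check whether the links point into more than one dated directory. The following is only a sketch, assuming a local mirror that preserves the symlinks and assuming each target path contains an 8-digit run directory such as 20121001; the helper names `read_links` and `group_by_run` are hypothetical.

```python
import os
import re
from collections import defaultdict

# Assumption: a dump run directory is named by its date, e.g. 20121001.
RUN_DIR = re.compile(r"\d{8}")


def read_links(latest_dir):
    """Return {link name: link target} for every symlink in latest_dir."""
    links = {}
    for name in sorted(os.listdir(latest_dir)):
        path = os.path.join(latest_dir, name)
        if os.path.islink(path):
            links[name] = os.readlink(path)
    return links


def group_by_run(links):
    """Group link names by the dated directory found in each target path."""
    runs = defaultdict(list)
    for name, target in sorted(links.items()):
        parts = target.replace("\\", "/").split("/")
        dated = [p for p in parts if RUN_DIR.fullmatch(p)]
        # Use the last dated component; fall back to "unknown" if none is found.
        runs[dated[-1] if dated else "unknown"].append(name)
    return dict(runs)
```

If `group_by_run(read_links(...))` returns more than one key for a mirrored "latest" directory, the links span several runs and the md5sums link may lag behind the data files, so waiting for the run to complete is the safe choice.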
On Tue, 09-10-2012, at 00:01 -0400, Zachary Harris wrote: