Hi Kent,
On Fri, Jan 04, 2013 at 01:46:43PM -0500, wp mirror wrote:
Dear Ariel,
Allow me to chime in, although I'm not Ariel.
Ariel, please bite my head off if I say anything wrong.
Correction: What I meant to say, is that I rely on `enwiki-latest-md5sums.txt' referencing complete files:
(shell)$ rsync --copy-links dumps.wikimedia.your.org::wikimedia-dumps/enwiki/latest/enwiki-latest-md5sums.txt .
(shell)$ cat enwiki-latest-md5sums.txt | grep imagelinks
8b5524d36a795b020a6423127c269610 enwiki-20121201-imagelinks.sql.gz
Is this assumption valid?
The assumption that *-*-md5sums.txt only references completed files looks sound to me.
The temporary md5sums files get updated in Runner.runUpdateItemFileInfo, which is only called if a Run's item has status "done" (compare dumpruninfo.txt). So every file referenced in the temporary md5sums file should be a completed file.
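
On the mirror side, relying on that assumption could look roughly like the sketch below: treat a file as usable only if it is listed in the md5sums file and the local copy matches the listed digest. This is a minimal sketch, not part of the dump scripts. The helper names (parse_md5sums, md5_of_file, is_listed_and_intact) are made up for illustration, and I assume each line of the md5sums file has the form "<digest> <file name>".

```python
import hashlib


def parse_md5sums(text):
    """Map file name -> expected MD5 hex digest.

    Assumes each non-empty line is '<digest> <file name>', as in the
    grep output quoted above. Other lines are skipped.
    """
    expected = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, name = parts
            expected[name] = digest.lower()
    return expected


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def is_listed_and_intact(path, name, md5sums_text):
    """True only if `name` is listed and the local file matches its digest."""
    expected = parse_md5sums(md5sums_text).get(name)
    return expected is not None and md5_of_file(path) == expected
```

A file that is not (yet) listed is simply reported as not usable, which matches the idea that only completed files show up in the md5sums file.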
Best regards, Christian
P.S.: However, if that was your intention, I would not rely on *-latest-md5sums.txt listing every file needed for a full, complete dump as long as the dump's corresponding dumpruninfo.txt contains jobs that are not marked as "done".
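
To decide whether a dump is complete, one could check dumpruninfo.txt for jobs that are not "done" before trusting the md5sums list as exhaustive. The following is only a sketch under assumptions: I assume each line of dumpruninfo.txt looks like "name:<job>; status:<status>; updated:<timestamp>", and the function names and the job names in the usage example are hypothetical.

```python
def parse_dumpruninfo(text):
    """Map job name -> status.

    Assumes lines of the form
    'name:<job>; status:<status>; updated:<timestamp>'.
    Lines without both a name and a status are skipped.
    """
    jobs = {}
    for line in text.splitlines():
        fields = {}
        for part in line.split(";"):
            # Split on the first ':' only, so timestamps stay intact.
            key, sep, value = part.strip().partition(":")
            if sep:
                fields[key] = value.strip()
        if "name" in fields and "status" in fields:
            jobs[fields["name"]] = fields["status"]
    return jobs


def unfinished_jobs(text):
    """Return the sorted names of all jobs whose status is not 'done'."""
    jobs = parse_dumpruninfo(text)
    return sorted(name for name, status in jobs.items() if status != "done")


if __name__ == "__main__":
    # Hypothetical sample content, not taken from a real dump run.
    sample = (
        "name:imagelinkstable; status:done; updated:2012-12-01 10:00:00\n"
        "name:metahistorybz2dump; status:in-progress; updated:2012-12-05 11:00:00\n"
    )
    print(unfinished_jobs(sample))
```

If the returned list is non-empty, the md5sums file may still be missing entries for the unfinished jobs.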