The "latest" directory is not useful for this purpose (e.g. http://download.wikipedia.org/enwiki/latest/ points to files from approx. Aug-17, which looks to be the latest dump where everything was reported as succeeded).
Right, for consistency.
Yes, but how often does somebody intentionally download and use every single file from a dump? Most people need either one or two of the dump files; the rest are simply irrelevant to them.
The latest directory is using a lowest-common-denominator approach (latest run where everything succeeded). This file would essentially be a highest-common-denominator approach (latest successful version of each individual file). Maybe both have their place.
However, I've realised it would be useful to include for each data type the date on which the dump run was started, e.g.:

---------------------------------------
<dump type="site_stats.sql.gz">
  <desc>A few statistics such as the page count.</desc>
  <url>http://download.wikipedia.org/enwiki/20060925/enwiki-20060925-site_stats.sql...</url>
+ <dump_run>20060925</dump_run>
  <size_in_bytes>451</size_in_bytes>
  <timestamp>2006-09-24T16:29:01Z</timestamp>
  <md5sum>e4defa79c36823c67ed4d937f8f7013c</md5sum>
</dump>
---------------------------------------
... that way, anyone who needs multiple files can hold off downloading them until all the "dump_run" fields match, which makes it easier to avoid mixing files from different dump runs. (It's true that this value can currently be pulled from the directory in the <url> field, but with a separate field the URL can point just about anywhere, for example to different hostnames for different dumps, or to a changed directory structure.)
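To make the "hold off until the dump_run fields match" idea concrete, here is a rough Python sketch of what a client could do. It is only an illustration under assumptions: the wrapping <dumps> root element, the second dump entry (its type and its 20060920 run date) and the function names dump_runs and runs_match are all made up for the example, since the status file format is still just a proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical status file. The <dumps> root element and the second
# entry (type and run date) are invented purely for illustration.
SAMPLE = """<dumps>
  <dump type="site_stats.sql.gz">
    <dump_run>20060925</dump_run>
  </dump>
  <dump type="pages-articles.xml.bz2">
    <dump_run>20060920</dump_run>
  </dump>
</dumps>"""


def dump_runs(xml_text, wanted_types):
    """Map each wanted dump type found in the file to its dump_run value."""
    root = ET.fromstring(xml_text)
    wanted = set(wanted_types)
    runs = {}
    for dump in root.iter("dump"):
        dump_type = dump.get("type")
        if dump_type in wanted:
            # findtext returns None if the <dump_run> element is absent.
            runs[dump_type] = dump.findtext("dump_run")
    return runs


def runs_match(xml_text, wanted_types):
    """True only if every wanted type is listed and all share one dump_run."""
    runs = dump_runs(xml_text, wanted_types)
    if set(runs) != set(wanted_types):
        return False  # at least one wanted file is not listed at all
    values = set(runs.values())
    return None not in values and len(values) == 1
```

A client that needs both files would poll the status file and only start downloading once runs_match(...) returns True for the types it cares about; with the sample above it would keep waiting, because the two entries come from different runs.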
Anyway, it's just a suggestion, and if you don't like it, well, there's not much I can do about it ;-)
All the best, Nick.