Hi,
there seems to be a problem either with the dumps or with the way we interpret them.
Currently, daily dumps come with a maxrevid.txt file that is supposed to give the largest revision id in the dump. For example, the daily dump of 1 August 2013 has max id 62860640 [1], which I believe is correct.
The wda scripts (which also create the statistics and digested dumps we publish) use this number to decide whether a daily is still relevant or already contained in the latest full dump. For this, we need the maximal revision id of the full dump. We get it by reading the file site_stats.sql.gz, finding the line that starts with INSERT INTO `site_stats`, and taking the third number in the inserted tuple. For the dump of 27 July 2013, for example, this number is 63069374 [2].
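To make the lookup concrete, here is a minimal Python sketch of the site_stats step as described above. It is not the actual wda code; the function names parse_site_stats_line and site_stats_revision are hypothetical, and it assumes the INSERT statement sits on a single line with the first tuple right after VALUES. It deliberately just returns the third field, so it inherits the open question of whether that field really is the max revision id.

```python
import gzip
import re

# Prefix of the line we look for in site_stats.sql.gz.
INSERT_PREFIX = "INSERT INTO `site_stats`"
# Captures the contents of the first parenthesised tuple after VALUES.
TUPLE_RE = re.compile(r"VALUES\s*\(([^)]*)\)")


def parse_site_stats_line(line):
    """Return the third number of the first VALUES tuple, or None
    if this is not the site_stats INSERT line (hypothetical helper)."""
    if not line.startswith(INSERT_PREFIX):
        return None
    match = TUPLE_RE.search(line)
    if match is None:
        return None
    fields = [v.strip() for v in match.group(1).split(",")]
    return int(fields[2])  # third number in the insert tuple


def site_stats_revision(path):
    """Scan a gzipped site_stats dump and return the value the scripts
    currently treat as the full dump's maximal revision id."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            value = parse_site_stats_line(line)
            if value is not None:
                return value
    raise ValueError("no INSERT INTO `site_stats` line found in " + path)
```

Splitting the line parser from the file reader keeps the parsing testable without a real dump file.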
There is a problem here: the maximal revision id in the dumps of 27 July 2013 is not actually that high. The history dump of that date is incomplete, but the current-revisions dump is finished and has max rev 61983867 [3]. Since every daily whose max id lies below 63069374 is treated as already contained in the full dump, our scripts ignore several days of dailies.
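The effect can be checked with the numbers quoted above. This is a sketch of the relevance test as I understand it from the description, not the scripts' actual code; daily_is_relevant and parse_maxrevid are hypothetical names, and the strict "greater than" comparison is an assumption.

```python
def parse_maxrevid(text):
    """Read the single revision id stored in a daily's maxrevid.txt."""
    return int(text.strip())


def daily_is_relevant(daily_max_rev, full_dump_max_rev):
    """Assumed rule: a daily is only needed if it holds revisions newer
    than the newest revision in the full dump."""
    return daily_max_rev > full_dump_max_rev


daily_20130801 = 62860640   # from maxrevid.txt [1]
from_site_stats = 63069374  # third field of the site_stats tuple [2]
actual_max_rev = 61983867   # max rev of the current-revisions dump [3]

print(daily_is_relevant(daily_20130801, from_site_stats))  # False: skipped
print(daily_is_relevant(daily_20130801, actual_max_rev))   # True: needed
```

With these numbers, the 1 August daily is discarded when compared against the site_stats value but kept when compared against the real maximum, which matches the missing days described above.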
Before I go and work on this, my question is whether this is an error in our script (i.e., the number we take from site_stats is not supposed to be the max revision id) or an error in the dumps (i.e., site_stats was exported wrongly).
Cheers,
Markus
[1] http://dumps.wikimedia.org/other/incr/wikidatawiki/20130801/maxrevid.txt
[2] http://dumps.wikimedia.org/wikidatawiki/20130727/wikidatawiki-20130727-site_...
[3] This can be seen in the comments for the dump at http://dumps.wikimedia.org/wikidatawiki/20130727/
wikidata-tech@lists.wikimedia.org