On Fri, Nov 20, 2009 at 16:38, Anthony wikimail@inbox.org wrote:
> On Fri, Nov 20, 2009 at 9:25 AM, Denny Vrandecic denny.vrandecic@kit.edu wrote:
>> The newer dump should include almost all material from the older dumps, so the older dumps are redundant.
> Almost redundant :).
>> You can just get the fresh dumps and query appropriately.
> Except for the one that you can't get.
I think the main problem is that for enwiki, only the current page text is included in the dump, not the older revisions.
pages-meta-history.xml is supposed to contain the old revisions, but for enwiki, it can't be downloaded anymore. I believe it simply got too big. For example, the current enwiki dump progress page [1] displays "ETA 2010-02-12 17:21:11" for pages-meta-history.xml.bz2, and the pages for completed dumps, e.g. [2], don't include pages-meta-history.xml at all.
For the smaller wikis, e.g. dewiki [3], pages-meta-history.xml is still available.
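If you want to check this for a given dump without clicking through the pages, something like the following rough sketch might do. It assumes the dump files follow the usual <wiki>-<date>-<file> naming and that a finished file shows up as a link (an href) on the dump page, while an unfinished one is only mentioned in the status text. The function names are made up for illustration, and you would have to fetch the page HTML yourself.

```python
import re


def history_dump_name(wiki, date):
    """Build the expected full-history file name.

    Assumes the <wiki>-<date>-pages-meta-history.xml.bz2 naming scheme.
    """
    return f"{wiki}-{date}-pages-meta-history.xml.bz2"


def has_full_history(listing_html, wiki, date):
    """Return True if the dump page HTML links to the full-history file."""
    name = history_dump_name(wiki, date)
    # Only count the file name when it appears inside an href attribute,
    # so a plain-text mention (e.g. an "ETA ..." status line) is ignored.
    pattern = r'href="[^"]*' + re.escape(name) + r'"'
    return re.search(pattern, listing_html) is not None
```

Called as has_full_history(page_html, "dewiki", "20091110"), it returns True only when the page actually links to the file.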
Christopher
[1] http://download.wikimedia.org/enwiki/20091103/
[2] http://download.wikimedia.org/enwiki/20091026/
[3] http://download.wikimedia.org/dewiki/20091110/