On Fri, Nov 20, 2009 at 16:38, Anthony <wikimail(a)inbox.org> wrote:
> On Fri, Nov 20, 2009 at 9:25 AM, Denny Vrandecic
> <denny.vrandecic(a)kit.edu> wrote:
>> The newer dump should include almost all material
>> from the older dumps, so the older dumps are redundant.
> Almost redundant :).
>> You can just get the fresh dumps and query
>> appropriately.
> Except for the one that you can't get.
I think the main problem is that for enwiki, only the current
page text is included in the dump, not the older revisions.
pages-meta-history.xml is supposed to contain the old
revisions, but for enwiki, it can't be downloaded anymore.
I believe it simply got too big. For example, the current enwiki
dump progress page [1] displays "ETA 2010-02-12 17:21:11"
for pages-meta-history.xml.bz2, and the pages for completed
dumps, e.g. [2], don't include pages-meta-history.xml at all.
For the smaller wikis, e.g. dewiki [3], pages-meta-history.xml
is still available.
Christopher
[1] http://download.wikimedia.org/enwiki/20091103/
[2] http://download.wikimedia.org/enwiki/20091026/
[3] http://download.wikimedia.org/dewiki/20091110/