randomcoder1@gmail.com wrote:
Hi,
I'm trying to get hold of the Wikipedia dump, in particular enwiki-latest-pages-meta-history.xml.bz2
These are not being run until our new dumps system is in place. The full history job on our internal servers was taking months to complete and wasn't scalable, so we turned it off.
It seems that on the page where it's supposed to be (http://download.wikipedia.org/enwiki/latest/) it weighs in at 0.6 KB, whereas I was used to it being 147 GB.
What happened to the data and where did it go?
There are four successful enwiki runs, starting 20090512 and ending 20090604 (20090610 didn't get through its logging table).
You can find them at
http://download.wikipedia.org/enwiki/ ... they just won't have the full history+text.
Also, on the Wikipedia (http://en.wikipedia.org/wiki/Wikipedia_database) page I read "As of January 17, 2009, it seems that all snapshots of pages-meta-history.xml.7z hosted at http://download.wikipedia.org/enwiki/ are missing. The developers at Wikimedia Foundation are working to address this issue (http://lists.wikimedia.org/pipermail/wikitech-l/2009-January/040841.html). There are other ways to obtain this file"
Not until we get the new system into place. For now you still have all of the other archives, including current pages, log of revisions, templates, etc. Just not the full page text over time.
We're still working on getting that one back in place and hope to have it back online before Wikimania.
I checked the other ways of obtaining the file that they describe; none worked. Why did the dumps vanish and how can I download a copy of them?
The dumps didn't vanish; we just had a bad update late last week that trashed the main index. I'll have that cleaned up in the next day or so.
--tomasz