randomcoder1(a)gmail.com wrote:
Hi,
I'm trying to get hold of the Wikipedia dump, in particular
enwiki-latest-pages-meta-history.xml.bz2
These are currently not being run until our new dumps system is in
place. The full history job on our internal servers was taking months
to complete and wasn't scalable. Thus we turned it off.
It seems that on the page where it's supposed to be
(http://download.wikipedia.org/enwiki/latest/) it's weighing in at 0.6KB,
whereas I was used to it being 147GB.
What happened to the data, and where did it go?
There are four successful enwiki runs, starting 20090512 and
ending 20090604 (20090610 didn't get through its logging table).
You can find them at
http://download.wikipedia.org/enwiki/ .. they just won't have the full
history+text.
Also, on the Wikipedia database page
(http://en.wikipedia.org/wiki/Wikipedia_database) I read:
"As of January 17, 2009, it seems that all snapshots of
pages-meta-history.xml.7z hosted at http://download.wikipedia.org/enwiki/
are missing. The developers at Wikimedia Foundation are working to
address this issue
(http://lists.wikimedia.org/pipermail/wikitech-l/2009-January/040841.html).
There are other ways to obtain this file"
Not until we get the new system into place. For now you still have all
of the other archives in place, including current pages, the log of
revisions, templates, etc., just not the full page text over time.
We're still working on getting that one back in place and hope to have it
back online before Wikimania.
I checked the other ways of obtaining the file that they describe; none
of them worked.
Why did the dumps vanish, and how can I download a copy of them?
The dumps didn't vanish; we just had a bad update late last week that
trashed the main index. I'll have that cleaned up in the next day or so.
--tomasz