Date: Fri, 19 Feb 2010 18:25:50 +0100
From: Tomasz Finc tfinc@wikimedia.org
Subject: Re: [Wikitech-l] enwiki complete page edit history
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Message-ID: 4B7EC99E.4040907@wikimedia.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

The pages-meta-history.xml.bz2 is showing 115.4GB written (in
progress) at:
http://download.wikipedia.org/enwiki/20100130/
The older pages-meta-history.xml.bz2 from
http://download.wikipedia.org/enwiki/20091128/ shows 255.1GB written (failed build)
So once the 20100130 current pages-meta-history.xml.bz2 dump
is finished writing, will it be over 255GB, as it is newer than the older copy and contains more info?
Correct.
Also, I noticed that these big files haven't been weblinked for
download lately. I think they should be, as they contain
the full Wikipedia history/discussion pages, which have
humongous amounts of useful information that should be available for easy distribution. What is the reason they aren't weblinked? The bandwidth costs?
Do you mean that the failed runs aren't web linked? If so then I'd rather not point people to corrupted files.
Hi Tomasz,
I don't think there are any (failed or successful) weblinked "pages-meta-history.xml.bz2" or "pages-meta-history.xml.7z" files for enwiki on the Wikimedia download server. I think there must be a successful enwiki "pages-meta-history" dump from 2009 floating around somewhere, and I think the last successful dump (guessing Sept 2009?) should always be linked for download. If you have a copy of the latest successful build of "pages-meta-history" (.bz2 or .7z) for enwiki, I'd appreciate it if you posted a link. Thanks.
cheers, Jamie
--tomasz