Date: Fri, 19 Feb 2010 18:25:50 +0100
From: Tomasz Finc <tfinc(a)wikimedia.org>
Subject: Re: [Wikitech-l] enwiki complete page edit history
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Message-ID: <4B7EC99E.4040907(a)wikimedia.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
The pages-meta-history.xml.bz2 is showing 115.4GB written (in progress) at:
<http://download.wikipedia.org/enwiki/20091128/> shows 255.1GB written
(failed build)
So once the current 20100130 pages-meta-history.xml.bz2 dump is finished
writing, will it be over 255GB, as it is newer than the older copy and
contains more info?
Correct.
Also, I noticed these big files haven't been weblinked for download lately.
I think they should be, as they contain the full Wikipedia
history/discussion pages, which have humongous amounts of useful
information that should be available for easy distribution. What is the
reason they aren't weblinked, the bandwidth costs?
Do you mean that the failed runs aren't web linked? If so, then I'd rather
not point people to corrupted files.
Hi Tomasz,
I don't think there are any (failed or successful) weblinked
"pages-meta-history.xml.bz2" or "pages-meta-history.xml.7z" files for
enwiki on the Wikimedia download server. I think there must be a
successful enwiki "pages-meta-history" from 2009 floating around
somewhere, and that the last successful dump (guessing Sept 2009?) should
always be linked for download. If you have a copy of the latest successful
build of "pages-meta-history" (.bz2 or .7z) for enwiki, I'd appreciate it
if you posted a link. Thanks.
cheers,
Jamie