Date: Wed, 17 Feb 2010 05:01:43 +0100
From: Tomasz Finc <tfinc(a)wikimedia.org>
Subject: Re: [Wikitech-l] enwiki complete page edit history
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Message-ID: <4B7B6A27.9040200(a)wikimedia.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
It sadly failed as noted in
http://lists.wikimedia.org/pipermail/xmldatadumps-admin-l/2010-January/0000…
I've updated the index to clear that up.
--tomasz
Jamie Morken wrote:
Hi Tomasz,
The pages-meta-history.xml.bz2 is showing 115.4GB written (in progress) at:
http://download.wikipedia.org/enwiki/20100130/
The older pages-meta-history.xml.bz2 from
http://download.wikipedia.org/enwiki/20091128/
shows 255.1GB written (failed build)
So once the current 20100130 pages-meta-history.xml.bz2 dump has finished writing, will it
be over 255GB, given that it is newer than the older copy and contains more info?
Also, I noticed that these big files haven't been weblinked for download lately. I think
they should be, as they contain the full Wikipedia history/discussion pages, which have
humongous amounts of useful information that should be available for easy distribution.
What is the reason they aren't weblinked? Is it the bandwidth costs?
cheers,
Jamie
Jamie Morken wrote:
Hi,
I was looking at the enwiki dump progress and noticed that the file size of the enwiki
pages-meta-history.xml.bz2 has decreased from 255GB on 20100125 down to 105GB on
20100203. Could the smaller archive file size mean that old page revision edit data
is being lost?
2009-12-03 12:53:43 in-progress All pages with complete page edit history (.bz2)
2010-01-25 16:02:21: enwiki 14833408 pages (3.231/sec), 284292000 revs (61.930/sec),
54.7% prefetched, ETA 2010-02-03 02:34:19 [max 329446505]
These dumps can be *very* large, uncompressing up to 20 times the archive download
size. Suitable for archival and statistical use, most mirror sites won't want or
need this.
pages-meta-history.xml.bz2 255.1 GB (written)
2010-02-03 17:28:43 in-progress All pages with complete page edit history (.bz2)
2010-02-16 00:32:55: enwiki 747550 pages (0.704/sec), 95964000 revs (90.340/sec),
95.8% prefetched, ETA 2010-03-19 12:10:50 [max 341714004]
These dumps can be *very* large, uncompressing up to 20 times the archive download
size. Suitable for archival and statistical use, most mirror sites won't want or
need this.
pages-meta-history.xml.bz2 105.1 GB (written)
cheers,
Jamie
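
For anyone comparing the two progress lines quoted above, here is a minimal Python sketch of how the numbers in them seem to relate. It assumes that "max" is an upper bound on the number of revisions to be written and that the ETA is simply the remaining revisions divided by the current revs/sec rate. That is a guess from the figures, not something confirmed by the dump scripts, although it reproduces both quoted ETAs to within a few seconds. The names STATUS_RE and project_eta are made up for this example.

```python
import re
from datetime import datetime, timedelta

# Timestamp format used in the dump status lines quoted in this thread.
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical pattern for one progress line, written against the two samples only.
STATUS_RE = re.compile(
    r"(?P<ts>" + TS + r"): \w+ \d+ pages \([\d.]+/sec\), "
    r"(?P<revs>\d+) revs \((?P<rate>[\d.]+)/sec\), "
    r"[\d.]+% prefetched, ETA (?P<eta>" + TS + r") \[max (?P<max>\d+)\]"
)


def project_eta(status):
    """Return (projected ETA, reported ETA, fraction of revisions done).

    ASSUMPTION: 'max' is the revision count the dump expects to reach, and
    the ETA is (max - revs) / rate seconds after the line's timestamp.
    """
    text = " ".join(status.split())  # undo any line wrapping from the email
    m = STATUS_RE.search(text)
    if m is None:
        raise ValueError("unrecognized status line")
    stamp = datetime.strptime(m["ts"], FMT)
    revs, max_revs = int(m["revs"]), int(m["max"])
    rate = float(m["rate"])
    projected = stamp + timedelta(seconds=(max_revs - revs) / rate)
    reported = datetime.strptime(m["eta"], FMT)
    return projected, reported, revs / max_revs


# The two progress lines exactly as quoted in the message above.
JAN_25 = ("2010-01-25 16:02:21: enwiki 14833408 pages (3.231/sec), "
          "284292000 revs (61.930/sec), 54.7% prefetched, "
          "ETA 2010-02-03 02:34:19 [max 329446505]")
FEB_16 = ("2010-02-16 00:32:55: enwiki 747550 pages (0.704/sec), "
          "95964000 revs (90.340/sec), 95.8% prefetched, "
          "ETA 2010-03-19 12:10:50 [max 341714004]")

if __name__ == "__main__":
    for line in (JAN_25, FEB_16):
        projected, reported, done = project_eta(line)
        print(f"projected {projected}, reported {reported}, {done:.1%} of max revs")
```

Under the same assumption, the third return value shows how far through the revisions each run was when its file size was read off the page, which is worth keeping in mind when comparing the 255.1 GB and 105.1 GB figures.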