I think you're actually referring to the 30 January 2010 dump :-).
Please also beware of some problems with missing revisions in the latest dumps.
I'm forwarding a message from the Wikimedia-XML-dumps mailing list below. Ariel T. Glenn
also reported that this was caused by a bug that was fixed during that run.
In addition, enwiki-20100312.7z is 15.8 GB, whereas enwiki-20100130.7z is 31.9 GB. The
first one contains only about half of the total number of revisions, so the January 2010
dump is the best one we have so far, despite its revisions with missing text (missing not
because of vandalism, but because of backup problems).
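For anyone who wants to check a dump themselves, here is a minimal sketch of how empty revisions could be counted. It assumes the usual MediaWiki export layout (page / revision / text) and that a revision with missing text shows up as an empty text element; the function name and the tiny sample are made up for illustration, and this is not necessarily how the figures in the forwarded message were obtained.

```python
import io
import xml.etree.ElementTree as ET


def count_revisions(xml_stream):
    """Return (total, empty) revision counts for a MediaWiki XML export stream."""
    total = 0
    empty = 0
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        # Tags carry the export namespace, e.g. "{http://...}revision";
        # strip it so the sketch does not depend on the schema version.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag != "revision":
            continue
        total += 1
        text = None
        for child in elem:
            if child.tag.rsplit("}", 1)[-1] == "text":
                text = child.text
        # Treat a missing, empty, or whitespace-only <text> as an empty revision.
        if not (text or "").strip():
            empty += 1
        elem.clear()  # drop the revision's children to keep memory use down
    return total, empty


# Hypothetical three-revision sample, only to show the call.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Example</title>
    <revision><id>1</id><text>Some text</text></revision>
    <revision><id>2</id><text></text></revision>
    <revision><id>3</id><text /></revision>
  </page>
</mediawiki>"""

total, empty = count_revisions(io.BytesIO(SAMPLE))
print(total, empty)
```

On a real dump the stream would come from the decompressed archive rather than an in-memory sample, and a full pass over the history will take a long time.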
From: Dmitri Chichkov
======================
By comparing the two archives (IMO [enwiki-20100312 15.8 GB] doesn't seem to
have empty revisions due to the backup problem), you can estimate that ~0.2%
of all revisions are empty due to vandalism etc.,
and that in the [enwiki-20100130 31.9 GB] file an additional ~0.4% are missing
due to backup failures.
[enwiki-20100130 31.9 GB] Revisions 313797035. Empty Revisions 1524837.
[enwiki-20100312 15.8 GB] Revisions 184986173. Empty Revisions 370982.
[enwiki-20100130 31.9 GB] Revisions 185000000. Empty Revisions 1158890.
(same position in the archive)
You can also look at a single article, e.g. the 'Anarchism' article.
In [enwiki-20100130 31.9 GB] it has 15180 revisions, 624 of which
are empty = 4%. In [enwiki-20100312 15.8 GB] it has 15261
revisions, of which only 8 are empty = 0.05%.
============
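The percentages in the forwarded message can be rechecked from the raw counts it quotes. A minimal sketch; the counts are copied from the message, while the helper and variable names are mine:

```python
def pct(part, whole):
    """Percentage of `part` in `whole`."""
    return 100.0 * part / whole


# Counts as quoted in the forwarded message.
full_jan = pct(1524837, 313797035)   # whole enwiki-20100130 archive
mar = pct(370982, 184986173)         # whole enwiki-20100312 archive
jan_same = pct(1158890, 185000000)   # enwiki-20100130 at the same position
extra_jan = jan_same - mar           # share attributable to backup failures

anarchism_jan = pct(624, 15180)      # 'Anarchism' in enwiki-20100130
anarchism_mar = pct(8, 15261)        # 'Anarchism' in enwiki-20100312

for label, value in [
    ("20100130, full archive", full_jan),
    ("20100312, full archive", mar),
    ("20100130, same position", jan_same),
    ("additional empty in 20100130", extra_jan),
    ("Anarchism, 20100130", anarchism_jan),
    ("Anarchism, 20100312", anarchism_mar),
]:
    print(f"{label}: {value:.2f}%")
```

The difference between the two rates at the same archive position is where the "additional ~0.4%" estimate comes from.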
Best,
Felipe.
--- On Wed, 2 Jun 2010, Brian J Mingus <Brian.Mingus(a)Colorado.EDU> wrote:
From: Brian J Mingus <Brian.Mingus(a)Colorado.EDU>
Subject: Re: [Wiki-research-l] actual size of 30 may 2010 dump
To: aforte(a)gatech.edu, "Research into Wikimedia content and communities"
<wiki-research-l(a)lists.wikimedia.org>
Date: Wednesday, 2 June 2010 20:33
On Wed, Jun 2, 2010 at 12:02 PM, Andrea Forte <andrea.forte(a)gmail.com> wrote:
Hi all, anyone have a close estimate (or exact number) for the size of
the 30 May 2010 enwiki dump once unzipped?
5TB is what it says here
[http://en.wikipedia.org/wiki/Wikipedia_database#Latest_complete_dump_of_Eng…]
but that really does leave a lot of possibilities. :)
Andrea
I found an e-mail on wikitech-l that reports 5.34158501 terabytes.
-----Inline attachment follows-----
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l