On Fri, Nov 20, 2009 at 10:57 AM, Anthony <wikimail(a)inbox.org> wrote:
The main thing that would be missing, and that can't be reconstructed
from the newer dumps, would be deleted articles. 0.1%, weighted by
number of revisions? I have absolutely no idea.
By the way, depending on what you're using the data for, this may or
may not be significant. For instance, if you're measuring vandalism,
even a small percentage of missing data might be significant, because
there is likely to be a high correlation between articles that were
deleted and articles that were vandalized.