On Fri, Nov 20, 2009 at 10:57 AM, Anthony <wikimail@inbox.org> wrote:
The main thing that would be missing, and that can't be reconstructed from the newer dumps, would be deleted articles. How much is that? Maybe 0.1%, weighted by number of revisions? I have absolutely no idea.
By the way, depending on what you're using the data for, this may or may not be significant. For instance, if you're measuring vandalism, even a small percentage of missing data might be significant, because there is likely to be a high correlation between articles which were deleted and articles which were vandalized.
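
To make that concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it is invented purely for illustration (the 0.1% deleted share is only the guess from above, and the two vandalism rates are assumptions, not measurements), and the function name is hypothetical. It compares the vandalism rate you would measure from a dump that lacks deleted articles with the rate over all revisions.

```python
def vandalism_rates(kept_revisions, deleted_revisions,
                    vandal_rate_kept, vandal_rate_deleted):
    """Return (true_rate, observed_rate) of vandalized revisions.

    true_rate     : rate over all revisions, including deleted articles
    observed_rate : rate seen in a dump that omits deleted articles
    """
    vandal_kept = kept_revisions * vandal_rate_kept
    vandal_deleted = deleted_revisions * vandal_rate_deleted
    total = kept_revisions + deleted_revisions

    true_rate = (vandal_kept + vandal_deleted) / total
    observed_rate = vandal_kept / kept_revisions
    return true_rate, observed_rate


# Hypothetical inputs: 0.1% of 1,000,000 revisions belong to deleted
# articles; 1% of kept revisions and 50% of deleted revisions are
# vandalism. None of these figures come from real data.
true_rate, observed_rate = vandalism_rates(999_000, 1_000, 0.01, 0.5)
underestimate = (true_rate - observed_rate) / true_rate

print(f"true rate:     {true_rate:.5f}")
print(f"observed rate: {observed_rate:.5f}")
print(f"relative underestimate: {underestimate:.1%}")
```

Under those made-up assumptions, losing 0.1% of the revisions would understate the vandalism rate by several percent, which is the sense in which a small gap can still matter. With a weaker correlation between deletion and vandalism the effect shrinks accordingly.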