[Foundation-l] EN Wikipedia Editing Statistics

Robert Rohde rarohde at gmail.com
Sun Nov 30 21:20:01 UTC 2008


On Sun, Nov 30, 2008 at 12:58 PM, Thomas Dalton <thomas.dalton at gmail.com> wrote:
>> I saw this the other day as well and found it odd. While enwiki dumps
>> do take the longest, this does seem like an _incredibly_ long time for
>> "All pages with complete page edit history (.bz2)" to finish (May 2009).
>
> Do you know how many pages enwiki has and how much edit history they
> each have? It's a lot!
>
> I think the dumps work by starting with the last successful dump and
> just adding in anything that's changed, but because there haven't been
> any successful dumps of the whole of enwiki in a long time, it
> basically has to start from scratch, which is going to take a long
> time (and means it probably won't succeed - ie. we have a catch-22).
> It seems to me that (if my understanding of the problem is correct),
> the answer is to devote a more powerful computer to the dump for just
> this one so that we can get things moving again - I'm sure if we asked
> around someone could lend us a really powerful computer for a few
> weeks to do the dump on.

No, dumps are total, not incremental.

It is really about more than throwing a big computer at it.  The
dumping process ought to be redesigned to be faster and more fault
tolerant.  It is ridiculous to have a process that is expected to take
months and yet has no way of saving its progress as it goes and
restarting from that point in case of trouble.
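A checkpoint-and-resume scheme of the sort described above could look
roughly like the sketch below.  This is only an illustration, not the
actual dump code: the file names, the chunking by page count, and the
`dump_pages` helper are all hypothetical.  The idea is simply that each
finished chunk is written to its own file and progress is recorded
atomically, so a crash loses at most one chunk of work instead of months.

```python
import json
import os

CHECKPOINT = "dump.checkpoint"   # hypothetical file name, not real dump tooling
CHUNK_SIZE = 1000                # pages per chunk; a real dump would use larger units

def load_checkpoint():
    """Return the index of the next chunk to process (0 if starting fresh)."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["next_chunk"]
    return 0

def save_checkpoint(next_chunk):
    """Record progress via an atomic rename, so a crash mid-write
    never corrupts the checkpoint file."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_chunk": next_chunk}, f)
    os.replace(tmp, CHECKPOINT)  # atomic on POSIX filesystems

def dump_pages(pages):
    """Dump `pages` in restartable chunks; each chunk goes to its own file.
    On restart, chunks already recorded in the checkpoint are skipped."""
    n_chunks = (len(pages) + CHUNK_SIZE - 1) // CHUNK_SIZE
    for i in range(load_checkpoint(), n_chunks):
        chunk = pages[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
        with open(f"chunk-{i:05d}.txt", "w") as out:
            out.write("\n".join(chunk))
        save_checkpoint(i + 1)   # from here on, this chunk survives a crash
```

With something like this, rerunning the dump after a failure picks up at
the first unfinished chunk rather than starting the whole multi-month
pass over again.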

-Robert Rohde
