[Foundation-l] dumps

Anthony wikimail at inbox.org
Thu Feb 26 17:35:26 UTC 2009


On Thu, Feb 26, 2009 at 10:53 AM, Aryeh Gregor <Simetrical+wikilist at gmail.com> wrote:

> On Thu, Feb 26, 2009 at 9:49 AM, Anthony <wikimail at inbox.org> wrote:
> > Accepted by whom?  Co-lo a box on the Internet, and ask the Foundation for
> > permission to create the dump.  A single thread downloading articles to a
> > single server isn't going to impact the project.  It probably wouldn't even
> > be *noticed*.
>
> It would also take even longer than the Wikimedia dumps.


What's your estimate of how long it's going to take to get the next full
history English Wikipedia dump?

> Say 500ms average to request a single revision.


Why say that?  500ms is a long time.

Besides, through the API, you can get multiple revisions at once.
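To illustrate the point about batching: the standard MediaWiki query API lets one request return up to 50 revisions at a time. A minimal sketch of building such a request (the endpoint and parameters are the standard `action=query` / `prop=revisions` module; the title is just the example discussed later in this message):

```python
# Sketch: build one API request that fetches 50 revisions at once,
# rather than issuing 50 separate HTTP requests.
from urllib.parse import urlencode

def revisions_url(title, limit=50):
    """Build a MediaWiki API URL requesting several revisions of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": limit,                      # up to 50 per request
        "rvprop": "ids|timestamp|content",
        "format": "json",
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)

print(revisions_url("Georgia"))
```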


> Multiply by 250,000,000 revisions.


You'd only need to get revisions since the last successful dump - maybe
150,000,000.

> I get a figure of four years.  You're going to need a lot
> more than a single thread to get a remotely recent dump.  You probably
> couldn't even keep up with the rate of new revision creation with a
> single thread blocking on each HTTP request.


I just got 50 revisions of [[Georgia]] in 6.389 seconds using the API and my
slow internet connection.  Even at that rate all the revisions since the
last dump could be downloaded in seven months, which is much less than the
time since the last successful full history dump.  On an ongoing basis it
would take about 7 hours to download each new day's edits, though more
realistically a live feed could be set up at that point.
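The back-of-the-envelope arithmetic behind those figures, using the measured 50 revisions in 6.389 seconds (the ~200,000 edits/day figure is an assumption implied by the 7-hour claim, not a measurement from this thread):

```python
# 50 revisions in 6.389 s over a slow connection, as measured above.
revs, secs = 50, 6.389
rate = revs / secs              # revisions per second
backlog = 150_000_000           # revisions since the last successful dump
months = backlog / rate / 86_400 / 30
print(f"{rate:.1f} rev/s, backlog cleared in {months:.1f} months")

daily_edits = 200_000           # assumed order of magnitude for enwiki
hours = daily_edits / rate / 3600
print(f"one day's edits in {hours:.1f} hours")
```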

And this is a worst-case scenario.  It assumes the WMF doesn't help *at all*
aside from allowing a single thread to access its servers.



More information about the wikimedia-l mailing list