On Thu, Feb 26, 2009 at 10:53 AM, Aryeh Gregor
<Simetrical+wikilist(a)gmail.com> wrote:
> On Thu, Feb 26, 2009 at 9:49 AM, Anthony <wikimail(a)inbox.org> wrote:
>> Accepted by whom? Co-lo a box on the Internet, and ask the Foundation
>> for permission to create the dump. A single thread downloading articles
>> to a single server isn't going to impact the project. It probably
>> wouldn't even be *noticed*.
> It would also take even longer than the Wikimedia dumps.
What's your estimate of how long it's going to take to get the next full
history English Wikipedia dump?
> Say 500ms average to request a single revision.
Why say that? 500ms is a long time.
Besides, through the API, you can get multiple revisions at once.
> Multiply by 250,000,000 revisions.
You'd only need to get revisions since the last successful dump - maybe
150,000,000.
> I get a figure of four years. You're going to need a lot more than a
> single thread to get a remotely recent dump. You probably couldn't even
> keep up with the rate of new revision creation with a single thread
> blocking on each HTTP request.
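For reference, that single-request estimate checks out arithmetically (a quick sketch; the 500ms and 250,000,000 figures are from the message above):

```python
# Aryeh's estimate: 250,000,000 revisions at 500 ms per blocking request.
seconds = 250_000_000 * 0.5
years = seconds / (365 * 24 * 3600)
print(f"{years:.1f} years")  # prints 4.0 years
```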
I just got 50 revisions of [[Georgia]] in 6.389 seconds using the API and
my slow internet connection. Even at that rate, all the revisions since
the last dump could be downloaded in seven months, which is much less than
the time since the last successful full-history dump. Ongoing, it'd take 7
hours to download a new day's edits, but more realistically a live feed
could be set up at that point.
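The extrapolation from that measurement works out like this (a sketch: the ~200,000 edits/day figure is my rough assumption; the batch request shown in the comment uses real MediaWiki API parameters, though the exact rvprop fields are illustrative):

```python
# Measured: 50 revisions per batched API request in 6.389 s, e.g.
#   https://en.wikipedia.org/w/api.php?action=query&prop=revisions
#     &titles=Georgia&rvlimit=50&rvprop=ids|timestamp|content&format=json
rate = 50 / 6.389                      # ~7.8 revisions per second

# Backlog of revisions since the last successful dump (figure from above).
backlog_days = 150_000_000 / rate / 86_400
print(f"backlog: {backlog_days:.0f} days, ~{backlog_days / 30:.1f} months")

# Keeping up afterwards, assuming roughly 200,000 edits per day.
daily_hours = 200_000 / rate / 3_600
print(f"one day's edits: {daily_hours:.1f} hours")
```

At ~7.8 revisions/second the 150M-revision backlog comes to roughly 222 days, i.e. the "seven months" above, and a day's edits to about 7 hours.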
And this is a worst-case scenario. It assumes the WMF doesn't help *at all*
aside from allowing a single thread to access its servers.