[Foundation-l] Dump process needs serious fix / was Release of squid log data
Samuel Klein
sj at laptop.org
Sat Sep 15 22:10:30 UTC 2007
On Sat, 15 Sep 2007, Erik Zachte wrote:
> People keep asking me about this, so let me elaborate on it here, rather
> than on wikitech, where it has been brought up a few times:
Thank you.
> But it has to be said, the current sad state in which many dumps, large and
> small, have failed is no longer the exception:
> see http://www.infodisiac.com/cgi-bin/WikimediaDownload.pl
> So I am waiting for good input. Notice that even if all goes well, the
> English dump job alone already runs for over 6 weeks!
> See http://download.wikimedia.org/enwiki/20070908/
> Current step started 2007-09-12, expected time of arrival 2007-10-30.
> There is a good chance some mishap will occur before then.
Can someone elaborate on what is going on here? What are the steps
involved, and why does this take so long? It would take less time to copy
a terabyte of data to a spare disk, drive it to a world-class computing
cluster anywhere in the country, and have the dumps worked on there
(including people figuring out another implementation of the dump
process). Perhaps such a computing cluster could also become the de facto
mirror-and-statistics center for Wikipedia data, where researchers would
send complex queries to be run.
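
To put rough numbers behind that claim, here is a back-of-envelope sketch
in Python; the dump size, disk throughput, and transport time are assumed
figures for illustration, not measurements:

    # Compare "copy to a disk and drive it" against the 6+ week dump run.
    # All inputs below are assumptions, not measured values.
    DUMP_SIZE_TB = 1.0        # assumed size of the data to move
    DISK_WRITE_MBPS = 50      # assumed sustained write speed of a spare disk
    DRIVE_TIME_HOURS = 24     # assumed one-day road trip to a cluster

    copy_hours = DUMP_SIZE_TB * 1e6 / DISK_WRITE_MBPS / 3600
    total_days = (copy_hours + DRIVE_TIME_HOURS) / 24

    print(f"copy to disk:            ~{copy_hours:.1f} hours")
    print(f"copy + transport:        ~{total_days:.1f} days")
    print("current dump run:        > 42 days (6+ weeks)")

Even with pessimistic disk speeds, the copy-and-drive route finishes in a
day or two, which is the point of the comparison above.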
SJ