[Foundation-l] dumps

Delirium delirium at hackish.org
Wed Feb 25 04:43:23 UTC 2009


Brian wrote:
> Why not make the uncompressed dump available as an Amazon Public
> Dataset? http://aws.amazon.com/publicdatasets/
> 
> You can already find DBPedia and FreeBase there. It's true that the
> uncompressed dump won't fit on a commercial drive (the largest is a
> 4-platter 500GB = 2TB drive). Cloud computing seems to be the most
> economically feasible alternative for all parties involved.

It depends on the parties. For me as a user, it's more economically 
feasible to download the dataset locally and run scripts on my own 
machine than to pay for EC2 compute time to run those scripts. But I 
have free unlimited university bandwidth.

It does seem like there might be some mutual benefits to having a copy 
at Amazon, for those who do prefer it. Because a full database dump would 
already be available on the filesystem of an Amazon EC2 compute instance, 
analyzing it there would become easy, and a number of people might 
use EC2 to run their analysis scripts. From that perspective, maybe 
Amazon might be persuaded to help out? Maybe they could donate some 
money, equipment, or developer time to reengineer the dump process, in 
return for one part of the reengineering being the addition of a routine 
sync to their service?

-Mark



