[Foundation-l] Amazon Public Data includes Wikipedia

Thomas Dalton thomas.dalton at gmail.com
Wed Feb 25 16:08:46 UTC 2009


2009/2/25 Nathan <nawrich at gmail.com>:
> http://www.nytimes.com/external/readwriteweb/2009/02/25/25readwriteweb-amazon_exposes_1_terrabyte_of.html
>
> According to this, a new project by Amazon that makes a terabyte of public
> data available includes a full dump of Wikipedia. It also includes the
> complete dbpedia - so it seems like there are likely to be lots of
> duplicates. Given the other information it says it includes (the whole human
> genome, all other publicly available DNA sequences, census data, etc.) I'm
> not sure how it all fits in a single terabyte.  Interesting concept, though.
> I wonder how old the dump is, since they've been unavailable for some time?

It probably contains only the latest revision of each page in the main
namespace, rather than a full dump with complete page histories (I can't
see why they would want a full dump). That's pretty small (a bit larger
if they've included images, of course). I think there have been article
dumps of enwiki reasonably recently; it's just the full dumps that
always fail.


