One of the academics I am speaking of wrote the textbook on natural language
processing. He has a 3TB RAID cluster. Of course, for about a thousand
dollars you can build a bigger RAID cluster than that using the new 2TB
drives, but funding comes and goes. Our 26-node cluster has 26 20GB drives
in a GlusterFS configuration (disk space isn't key to us, so we skimped). So
I'm not sure what you mean by "usually have access." They have to pay for
this access, or negotiate for it, or receive grant money specifically for
it. Most academics *do not* have what you are describing. This is an
exceptionally large dataset.
On Wed, Feb 25, 2009 at 4:45 PM, Thomas Dalton <thomas.dalton(a)gmail.com> wrote:
2009/2/25 Brian <Brian.Mingus(a)colorado.edu>:
Ahh ok. Anyone who wants to do processing on the full history (and there
are a lot of these people who exist!) by definition *has* to be willing to
throw some money at it. It simply doesn't fit on commercial drives. In
fact, it would hardly fit on either of the two raid clusters I have access
to. Making it available on Amazon means that, for a fair market rate, you
don't have to download or uncompress the data. You can just start your
data crunching. I can only speak for academics but there is generally
funding available for Amazon EC2 etc... for specific projects. Professors
are even known to pay for a fixed amount of processing for ambitious
student projects, and these kinds of earmarks are easily fit into grants.
Academics usually have access to the necessary computers (or clusters
thereof) to do such processing directly. I think Amazon hosting of
dumps would appeal mainly to non-academics who only have access to
home PCs.
_______________________________________________
foundation-l mailing list
foundation-l(a)lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l