[Foundation-l] dumps

Brian Brian.Mingus at colorado.edu
Wed Feb 25 23:48:10 UTC 2009


One of the academics I am speaking of wrote the textbook on natural language
processing. He has a 3TB RAID cluster. Of course, for about a thousand
dollars you can create a bigger RAID cluster than that using the new 2TB
drives, but funding comes and goes. Our 26-node cluster has 26 20GB drives
in a GlusterFS configuration (disk space isn't key to us, so we skimped). So
I'm not sure what you mean by "usually have access." They have to pay for
this access, or negotiate for it, or receive grant money specifically for
it. Most academics *do not* have what you are describing. This is an
exceptionally large dataset.

On Wed, Feb 25, 2009 at 4:45 PM, Thomas Dalton <thomas.dalton at gmail.com> wrote:

> 2009/2/25 Brian <Brian.Mingus at colorado.edu>:
> > Ahh ok. Anyone who wants to do processing on the full history (and there
> > are a lot of these people who exist!) by definition *has* to be willing to
> > throw some money at it. It simply doesn't fit on commercial drives. In
> > fact, it would hardly fit on either of the two raid clusters I have access
> > to. Making it available on Amazon means that, for a fair market rate, you
> > don't have to download or uncompress the data. You can just start your
> > data crunching. I can only speak for academics but there is generally
> > funding available for Amazon EC2 etc... for specific projects. Professors
> > are even known to pay for a fixed amount of processing for ambitious
> > student projects, and these kinds of earmarks are easily fit into grants.
>
> Academics usually have access to the necessary computers (or clusters
> thereof) to do such processing directly. I think Amazon hosting of
> dumps would appeal mainly to non-academics who only have access to
> home PCs.


