[Foundation-l] dumps

Anthony wikimail at inbox.org
Wed Feb 25 16:26:22 UTC 2009


On Tue, Feb 24, 2009 at 11:26 PM, Brian <Brian.Mingus at colorado.edu> wrote:

> Why not make the uncompressed dump available as an Amazon Public
> Dataset? http://aws.amazon.com/publicdatasets/
>

Which uncompressed dump?  The full history English Wikipedia dump doesn't
exist, and there doesn't seem to be any demand for this anyway.

> You can already find DBPedia and FreeBase there. It's true that the
> uncompressed dump won't fit on a commercial drive (the largest is a
> 4-platter 500GB = 2TB drive). Cloud computing seems to be the most
> economically feasible alternative for all parties involved.
>

"Cloud computing" might be a good alternative for some reusers, but if so
it'd be most economical to just host the cloud at the source, i.e. API/live
feed access open to everyone (for free or for a cost).  For certain uses
there's the toolserver, but access to it is handed out with special
permission.  For small amounts of traffic there's an API, and there's the
live feed which seems to be limited to major corporations with special
permission.
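For what it's worth, the per-page API access mentioned above looks roughly
like this (a minimal sketch against the public api.php endpoint; the page
title and helper name here are just illustrative examples, not anything
from the thread):

```python
# Sketch: building a MediaWiki API request for the latest wikitext of a page.
# The endpoint and parameters follow the public api.php query interface;
# "Data_set" is an arbitrary example title.
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def build_revision_query(title):
    """Return a URL asking api.php for the current revision content of `title`."""
    params = {
        "action": "query",       # the generic query module
        "prop": "revisions",     # fetch revision data for the page
        "rvprop": "content",     # include the revision's wikitext
        "titles": title,         # page to look up
        "format": "json",        # machine-readable response
    }
    return API_ENDPOINT + "?" + urlencode(params)

url = build_revision_query("Data_set")
print(url)
```

Fetching that URL returns JSON; at small volumes this is exactly the kind
of access that doesn't need a dump at all.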

The WMF hasn't put any real resources into this for small-time commercial
users (big players have the live feed, and non-commercial users can
probably get toolserver access).  Of course, there isn't all that much
demand either.  If there were, a third party would have set it up by now
(I'd personally be willing to set up a pay-for-access toolserver and
custom dump service if I could get a commitment from one or more people
for a couple hundred dollars a month in funding).

"Typically the data sets in the repository are between 1 GB to 1 TB in
> size (based on the Amazon EBS volume limit), but we can work with you
> to host larger data sets as well. You must have the right to make the
> data freely available."


Yeah, if there were any demand for this, nothing would stop someone from
setting it up on their own.



More information about the wikimedia-l mailing list