I’d like to see the Commons backups available in the AMZN S3 cloud, even if it is only as “requester pays”. Frankly, my experience is that getting data from the Internet Archive is so slow that I wonder if they are on the Moon.

My infovore framework is specifically designed to make Hadoop applications easy to run in your own cluster or in a cluster provisioned automatically in Amazon EMR. In particular, an application can be packaged in the S3 cloud and run by somebody with little Hadoop or AWS experience. This makes handling “big data” much more accessible than it ever has been.

AMZN has had a policy of offering free S3 storage for public data sets – I’d like to see them take this program to the next level with data sets of this nature.

From: Gerard Meijssen

Sent: Monday, October 14, 2013 4:38 PM

To: Wikimedia Commons Discussion List

Subject: Re: [Commons-l] [wikiteam-discuss:699] "Tarballs" of all 2004-2012 Commons files now available at archive.org

Hoi,

Geni, sorry, but there is a difference between there being a backup of Commons within the WMF and there being a dataset of Commons at the IA that is not current. People can do all the analysis they want on the old data and it will not make any difference. It will not make the data that is currently in Commons any more accessible.

We have been told repeatedly that the data at the WMF is secure. Beyond that, the data is like knowing the maximum the insurance policy will pay: you know it will not be enough. It is, however, very much a hypothetical question. How to make Commons usable is a here-and-now issue.

Thanks,

GerardM

On 14 October 2013 22:22, geni <geniice@gmail.com> wrote:

On 14 October 2013 13:59, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:

Hoi,
I do agree that it is good to have the data in many places, and the Internet Archive on its own moves it to several places as well. Many of us have seen the IA servers at the Library of Alexandria.

While it is OK to find a use for the data at the IA, I would like us to concentrate first and foremost on how we can make better use of the media that is in Commons itself: how we can open it up to more use and make Commons more accessible.

And you need to stop right there. As in, don't express a further opinion until you realise how wrong you are. You can't do any analysis on data that is lost. And data that is not backed up is just data that doesn't know that it is lost yet.

--
geni

_______________________________________________
Commons-l mailing list
Commons-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/commons-l
