I’d like to see the Commons backups available in the AMZN S3 cloud, even if it is only as
“requester pays”. Frankly, my experience is that getting data from the Internet Archive
is so slow that I wonder if they are on the Moon.
My infovore framework
http://github.com/paulhoule/infovore
is specifically designed to make Hadoop applications easy to run in your own cluster or in
a cluster provisioned automatically in Amazon EMR. In particular, an application can be
packaged in the S3 cloud and run by somebody with little Hadoop or AWS experience. This
makes handling “big data” much more accessible than it ever has been.
AMZN has had a policy of offering free S3 storage for public data sets – I’d like to see
them take this program to the next level with data sets of this nature.
From: Gerard Meijssen
Sent: Monday, October 14, 2013 4:38 PM
To: Wikimedia Commons Discussion List
Subject: Re: [Commons-l] [wikiteam-discuss:699] "Tarballs" of all 2004-2012 Commons files now available at archive.org
Hoi,
Geni, sorry, but there is a difference between there being a backup of Commons within the
WMF and there being a dataset of Commons at the IA that is not current. People can do all
the analysis they want on the old data and it will not make any difference. It will not
make the data that is currently in Commons any more accessible.
We have been told repeatedly that the data at the WMF is secure. Beyond that, asking about
it is like asking for the maximum an insurance policy will pay: you know it will not be
enough. It is, however, very much a hypothetical question. How to make Commons usable is a
here-and-now issue.
Thanks,
GerardM
On 14 October 2013 22:22, geni <geniice(a)gmail.com> wrote:
On 14 October 2013 13:59, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:
Hoi,
While I do agree that it is good to have the data in many places, the Internet Archive
on its own already moves it to several places as well. Many of us have seen the IA servers
at the Library of Alexandria.
While it is OK to find a use for the data at the IA, I would like us to concentrate
first and foremost on how we can make better use of the media that is in Commons itself:
how we can open it up to more use and make Commons more accessible.
And you need to stop right there. As in, don't express a further opinion until you
realise how wrong you are. You can't do any analysis on data that is lost. And data that
isn't backed up is just data that doesn't know it is lost yet.
--
geni
_______________________________________________
Commons-l mailing list
Commons-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/commons-l