Re: [Commons-l] [wikiteam-discuss:699] "Tarballs" of all 2004-2012 Commons files now available at archive.org

14 Oct 2013

It was not our original intention at WikiTeam to create these media
tarballs so that researchers could work from them. We created these
tarballs so that everyone in the Wikimedia movement can rest assured
that there is one backup copy of their media on the Internet Archive. Trust
me, the number of people who are actually going to use these tarballs is
going to be smaller than the number of people editing the smaller wikis
combined; certainly everyone is going to be using the data on Commons
itself. So we can fully focus on improving Commons to make its data more
accessible, without the risk of people working on the tarballs on the
Internet Archive for research instead.

That being said, we can't even guarantee that the images in these tarballs
are up-to-date. They are all downloaded and should be regarded as a
snapshot of the image at the time of download, not an effective live backup
of all the images on Commons. We are looking into creating subsequent
tarballs that take into account the new uploads and the re-uploads so that
Commons is actually backed up.

I guess the way we presented the tarballs on the Internet Archive is enough
to deter anyone from conducting research directly from them, unless they
do an in-depth mining of the data to get what they want, but that
certainly is going to be much tougher than mining the information from
Commons directly in its current state.

On Mon, Oct 14, 2013 at 8:59 PM, Gerard Meijssen
<gerard.meijssen(a)gmail.com> wrote:

...
  Hoi,
 I do agree that it is good to have the data in many places, and the
 Internet Archive on its own moves it to several places as well. Many of us
 have seen the IA servers at the Library of Alexandria.

 While it is ok to find a use for the data at the IA, I would like us to
 concentrate first and foremost on how we can make better use of the media
 that is in Commons itself: how we can open it up to more use and make
 Commons more accessible.

 Do realise that when there is a good use for all the data that is in the
 IA, the same use and more could be made with the larger amount of data that
 is in Commons itself.
 Thanks,
        GerardM

 On 14 October 2013 14:26, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:

  Emilio J. Rodríguez-Posada, 14/10/2013 14:18:

  Internet Archive has this problem in several other topics, like its
  Wayback Machine: there is no search engine to search the billions of
  grabbed websites by keyword or whatever.

 Internet Archive is a pile of hard disks and a time capsule with
 backups, and they try to do the best at showing the materials (media
 players, pdf viewers), but it is not always easy or possible.

 ...and that's why Hay said we need someone with a good idea. :)
 Now it's easy to download the dataset (though it's not perfect), of
 course this doesn't automatically make something cool happen with it.
 Except replication of the data in multiple places, which is a good thing in
 itself.

 Nemo

 _______________________________________________
 Commons-l mailing list
 Commons-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/commons-l

-- 
Regards,
Hydriz

Be social, follow/add me:
Facebook: http://tinyurl.com/hydrizfb
Google+: http://tinyurl.com/hydrizgl
Twitter: @hydrizwiki
