It was never our intention at WikiTeam to create these media tarballs for
researchers to work from. We created them so that everyone in the
Wikimedia movement can rest assured that there is one backup copy of their
media on the Internet Archive. Trust me, the number of people who are
actually going to use these tarballs is going to be smaller than the number
of people editing the smaller wikis combined; everyone else is certainly
going to be using the data on Commons itself. So we can fully focus on
making Commons more data-accessible, without the risk of people working
from the tarballs on the Internet Archive for research instead.
That being said, we can't even guarantee that the images in these tarballs
are up to date. They should all be regarded as snapshots of the images at
the time of download, not an effective live backup of all the images on
Commons. We are looking into creating subsequent tarballs that take new
uploads and re-uploads into account, so that Commons is actually backed up.
I guess the way we presented the tarballs on the Internet Archive is enough
to deter anyone from conducting research directly from them, unless they
mine the data in depth to get what they want, but that is certainly going
to be much tougher than mining the information from Commons directly in its
current state.
On Mon, Oct 14, 2013 at 8:59 PM, Gerard Meijssen
<gerard.meijssen(a)gmail.com> wrote:
Hoi,
I do agree that it is good to have the data in many places, and the
Internet Archive on its own moves it to several places as well. Many of us
have seen the IA servers at the Library of Alexandria.
While it is OK to find a use for the data at the IA, I would like us to
concentrate first and foremost on how we can make better use of the media
that is in Commons itself: how we can open it up to more use and make
Commons more accessible.
Do realise that when there is a good use for all the data that is in the
IA, the same use and more could be made with the larger amount of data that
is in Commons itself.
Thanks,
GerardM
On 14 October 2013 14:26, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:
Emilio J. Rodríguez-Posada, 14/10/2013 14:18:
Internet Archive has this problem in several other areas, like its
Wayback Machine: there is no search engine to search the billions of
grabbed websites by keyword or whatever.
Internet Archive is a pile of hard disks and a time capsule with
backups, and they try to do the best at showing the materials (media
players, pdf viewers), but it is not always easy or possible.
...and that's why Hay said we need someone with a good idea. :)
Now it's easy to download the dataset (though it's not perfect), of
course this doesn't automatically make something cool happen with it.
Except replication of the data in multiple places, which is a good thing in
itself.
Nemo
_______________________________________________
Commons-l mailing list
Commons-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/commons-l
--
Regards,
Hydriz