The issue of mirroring Wikimedia content has been discussed with a number of scholarly institutions engaged in data-rich research, and the response was generally along the lines of "send us the specs, and we will see what we can do."
I would be interested in giving this another go if someone could provide me with those specs, preferably for the Wikimedia projects as a whole as well as broken down by individual project, language, timestamp, and so on.
The WikiTeam's Commons archive would make for a good test dataset.
Daniel
--
http://www.naturkundemuseum-berlin.de/en/institution/mitarbeiter/mietchen-da...
https://en.wikipedia.org/wiki/User:Daniel_Mietchen/Publications
http://okfn.org
http://wikimedia.org
On Fri, Aug 1, 2014 at 4:42 PM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
WikiTeam[1] has released an update of the chronological archive of all Wikimedia Commons files, up to 2013. It now totals ~34 TB.
https://archive.org/details/wikimediacommons

I wrote to, I think, all the mirrors in the world, but apparently nobody is interested in such a mass of media apart from the Internet Archive (and mirrorservice.org, which took Kiwix).

The solution is simple: take a small bite and preserve a copy yourself. One slice only takes one click, from your browser to your torrent client, and typically 20-40 GB on your disk (biggest slice 1400 GB, smallest 216 MB).
https://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive#Image_tarballs
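For anyone who would rather script this than click slice by slice, here is a minimal sketch using the internetarchive Python package (any HTTP client against the archive.org search API would work just as well). It assumes the slices are items inside the wikimediacommons collection and that each item exposes the usual <identifier>_archive.torrent file; adjust the query if the layout differs.

    # Sketch: list the slice items in the archive.org collection and grab
    # their .torrent files, ready to hand to a torrent client.
    # Assumes the slices live in the "wikimediacommons" collection.
    from internetarchive import search_items, download

    for result in search_items('collection:wikimediacommons'):
        identifier = result['identifier']
        # Fetch only the auto-generated "<identifier>_archive.torrent"
        # file for each item, into ./torrents/<identifier>/.
        download(identifier,
                 glob_pattern='*_archive.torrent',
                 destdir='torrents',
                 verbose=True)

From there you can open whichever torrents you want to seed; the actual media come down over BitTorrent, not over this script.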
Nemo
P.s.: Please help spread the word everywhere.
[1] https://github.com/WikiTeam/wikiteam
Commons-l mailing list
Commons-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/commons-l