WikiTeam[1] has released an update of the chronological archive of all Wikimedia Commons files, up to 2013. It is now at ~34 TB total: https://archive.org/details/wikimediacommons

I wrote to – I think – all the mirrors in the world, but apparently nobody is interested in such a mass of media apart from the Internet Archive (and mirrorservice.org, which took Kiwix). The solution is simple: take a small bite and preserve a copy yourself. One slice only takes one click, from your browser to your torrent client, and typically 20-40 GB on your disk (biggest slice 1400 GB, smallest 216 MB): https://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive#Image_tarballs

Nemo

P.s.: Please help spread the word everywhere.

[1] https://github.com/WikiTeam/wikiteam
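For anyone who would rather script it than click through the browser, here is a minimal sketch that lists the slices via the archive.org search API and saves one slice's torrent file; the requests dependency and the assumption that each slice is a separate item in the wikimediacommons collection are mine, so check the collection page for the actual identifiers.

```python
# Minimal sketch: list the slices in the wikimediacommons collection on
# archive.org and save the .torrent file for one of them, ready to hand to
# any BitTorrent client. Assumes each slice is its own item in the
# collection; verify the identifiers on the collection page.
import requests

SEARCH_URL = "https://archive.org/advancedsearch.php"


def list_slices(collection="wikimediacommons", rows=500):
    """Return the item identifiers found in the given archive.org collection."""
    params = {
        "q": "collection:" + collection,
        "fl[]": "identifier",
        "rows": rows,
        "output": "json",
    }
    docs = requests.get(SEARCH_URL, params=params).json()["response"]["docs"]
    return [doc["identifier"] for doc in docs]


def fetch_torrent(identifier, dest_dir="."):
    """Download the item's auto-generated .torrent file and return its path."""
    url = "https://archive.org/download/{0}/{0}_archive.torrent".format(identifier)
    resp = requests.get(url)
    resp.raise_for_status()
    path = "{}/{}.torrent".format(dest_dir, identifier)
    with open(path, "wb") as handle:
        handle.write(resp.content)
    return path


if __name__ == "__main__":
    slices = list_slices()
    print("{} slices found; grabbing the first one as a test".format(len(slices)))
    print(fetch_torrent(slices[0]))
```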
On 01/08/2014 16:42, Federico Leva (Nemo) wrote:
Hello,
Have you thought about contacting companies with massive storage, such as Dropbox? Maybe they will be happy to share a few TB :-]
Antoine Musso wrote:
I believe Amazon has donated space at some point, but I don't know (m)any details. I very briefly searched around and found https://wikitech.wikimedia.org/wiki/Amazon_Public_Data_Sets.
MZMcBride
Yes, "all the mirrors in the world" included Amazon. No reply from them either and I'm not going to write companies who don't have a mirroring program/an explicit interest in the offer. It's appreciated if others do, though.
Nemo
Federico Leva (Nemo) wrote:
Yes, "all the mirrors in the world" included Amazon. No reply from them either and I'm not going to write companies who don't have a mirroring program/an explicit interest in the offer. It's appreciated if others do, though.
Yeah, 34 TB is still a lot of data, unfortunately. I think most people reading this list recognize and appreciate this. (I actually have a draft e-mail from just a few weeks ago about Dispenser requesting 24 TB....)
I'd personally like to see a price breakdown for this project. Doing a bit of quick research, it sounds like storage alone would probably cost around $4,000 USD, but it depends on whether you're buying individual 2 TB drives or larger 20 TB drives. More than this, though, are the ongoing and recurring costs, assuming you want to keep this data online. Is having this (backup) data available online an explicit goal here? Or is the primary goal simply to have an offline backup of this data?
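As a rough illustration of how that $4,000 figure might break down, here is a back-of-the-envelope sketch; the drive capacities, prices, and two-copy redundancy factor are illustrative assumptions, not figures from this thread.

```python
# Back-of-the-envelope storage cost for ~34 TB on commodity drives.
# Per-drive prices are assumptions (rough 2014-era list prices), and the
# redundancy factor assumes keeping two full copies of every tarball.
import math

DATASET_TB = 34
REDUNDANCY = 2  # two copies of everything, e.g. mirrored pairs

drive_options = {
    "2 TB drives": {"capacity_tb": 2, "price_usd": 100},
    "4 TB drives": {"capacity_tb": 4, "price_usd": 170},
}

for name, drive in drive_options.items():
    count = math.ceil(DATASET_TB * REDUNDANCY / drive["capacity_tb"])
    cost = count * drive["price_usd"]
    print("{}: {} drives, roughly ${:,} for raw storage".format(name, count, cost))
```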
In either case (online or offline), a price breakdown would help nearly any volunteer organization (such as a Wikimedia chapter) decide whether to help in this effort. Crowd-sourcing the funding for this project is also a possibility, either via individual donations (Kickstarter, perhaps) or via small grants from various Internet-related or free content-related organizations (EFF, Mozilla, Wikimedia, et al.).
Soliciting money for this project requires a much clearer, more detailed plan. The current shoestring strategy of everyone downloading a piece of the 34 TB is certainly romantic, but it also seems impractical and silly.
MZMcBride
Thanks, MZ, for your suggestion to ask for money. I'm not interested. For those who are: https://meta.wikimedia.org/wiki/Grants:IdeaLab/Commons_tarballs_seedbox
Nemo
When we get Wikimedia Cascadia (name decision still pending) approved and have our legal paperwork in order, we could potentially host a Commons or other Wikimedia backup. I think this would be doable if we can work out the legal issues and WMF approves a GAC request for some cheap storage.
Pine
On Aug 2, 2014 8:17 PM, "Pine W" wiki.pine@gmail.com wrote:
I think this would be doable if we can work out the legal issues and WMF approves a GAC request for some cheap storage.
What legal issues do you envision?
-Jeremy
No offense, I would prefer that Cascadia discuss potential legal issues privately with WMF before we start speculating in public.
There is probably a way to make this successful in the end.
Pine
On Aug 2, 2014 8:25 PM, "Pine W" wiki.pine@gmail.com wrote:
No offense, I would prefer that Cascadia discuss potential legal issues privately with WMF before we start speculating in public.
There is probably a way to make this successful in the end.
I find that rather confusing.
As the legal team's email footers say, they are not your lawyer (and not your chapter's lawyer either).
I can't think of any legal issues you'd encounter besides the ones WMF already deals with. (copyright/trademark/defamation/trade secrets/national security/CDA 230/DMCA/etc.) If you want legal advice on any of those issues then you need to consult counsel outside WMF.
Or maybe there's a concern I haven't imagined yet.
-Jeremy
Yes, those plus a few others. Yes, WMF counsel can't act as Cascadia's counsel, but I would want to see what terms WMF might offer, like indemnifying Cascadia for any issues relating to archiving the Commons content.
Pine
S3 prices have dropped since that IdeaLab page was last modified, so the $2,200 a month quoted there is about $1,200 a month now. If the data doesn't need to be retrieved except in rare cases, putting it on Glacier would drop it closer to $350-400 a month.
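As a rough check on those figures, a minimal sketch; the per-GB rates are assumptions approximating mid-2014 list prices and ignore tiered pricing, request fees, and Glacier retrieval costs.

```python
# Rough monthly cost of keeping the ~34 TB archive in S3 versus Glacier.
# Per-GB rates are assumptions (approximate mid-2014 list prices, flat rate,
# ignoring tiering, request fees, and retrieval costs).
DATASET_GB = 34 * 1024  # ~34 TB expressed in GB

rates_usd_per_gb_month = {
    "S3 standard": 0.030,
    "Glacier": 0.010,
}

for service, rate in rates_usd_per_gb_month.items():
    print("{}: ~${:,.0f} per month".format(service, DATASET_GB * rate))
```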
The issue of mirroring Wikimedia content has been discussed with a number of scholarly institutions engaged in data-rich research, and the response was generally of the "send us the specs, and we will see what we can do" kind.
I would be interested in giving this another go if someone could provide me with those specs, preferably for Wikimedia projects as a whole as well as broken down by individual projects or languages or timestamps etc.
The WikiTeam's Commons archive would make for a good test dataset.
Daniel
--
http://www.naturkundemuseum-berlin.de/en/institution/mitarbeiter/mietchen-da...
https://en.wikipedia.org/wiki/User:Daniel_Mietchen/Publications
http://okfn.org
http://wikimedia.org
Daniel Mietchen, 03/08/2014 03:57:
Ariel keeps https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Requir... up to date. Anything else needed?
Nemo
That seems to be sufficient to get things rolling. I will give it a try. Thanks!

--
http://www.naturkundemuseum-berlin.de/en/institution/mitarbeiter/mietchen-da...
https://en.wikipedia.org/wiki/User:Daniel_Mietchen/Publications
http://okfn.org
http://wikimedia.org