WikiTeam has just finished archiving all Wikimedia Commons files up to 2012 (and some more) on the Internet Archive: https://archive.org/details/wikimediacommons So far it's about 24 TB of archives, and there are also a hundred torrents you can help seed, ranging from a few hundred MB to over a TB, most around 400 GB. Everything is documented at https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Media_tarballs and, if you want, here are some ideas to help WikiTeam with coding: https://code.google.com/p/wikiteam/issues/list.
Nemo
Nice work Nemo!
2013/10/13 Federico Leva (Nemo) nemowiki@gmail.com
WikiTeam has just finished archiving all Wikimedia Commons files up to 2012 (and some more) on the Internet Archive: https://archive.org/details/wikimediacommons So far it's about 24 TB of archives, and there are also a hundred torrents you can help seed, ranging from a few hundred MB to over a TB, most around 400 GB. Everything is documented at https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Media_tarballs and, if you want, here are some ideas to help WikiTeam with coding: https://code.google.com/p/wikiteam/issues/list.
Nemo
-- You received this message because you are subscribed to the Google Groups "wikiteam-discuss" group. To unsubscribe from this group and stop receiving emails from it, send an email to wikiteam-discuss+unsubscribe@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
Hi Federico,
This is great news! I have two questions though:
1. What happens to files deleted after your crawler retrieved them? I suppose they will still be available in the archives.
2. Is the archive team willing to host third-party, specialized downloads, such as all the pictures from WLM (or all the pictures with monuments from a certain country)?
Thanks, Strainu
2013/10/13 Federico Leva (Nemo) nemowiki@gmail.com:
WikiTeam has just finished archiving all Wikimedia Commons files up to 2012 (and some more) on the Internet Archive: https://archive.org/details/wikimediacommons So far it's about 24 TB of archives, and there are also a hundred torrents you can help seed, ranging from a few hundred MB to over a TB, most around 400 GB. Everything is documented at https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Media_tarballs and, if you want, here are some ideas to help WikiTeam with coding: https://code.google.com/p/wikiteam/issues/list.
Nemo
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hoi,
Files deleted AFTER the "tarballs" were created will always be part of the tarballs. If you create logic that works on these archives, it is likely that it will also work on the live data at Commons...
MY QUESTION... Yes, it is good to have a backup somewhere. However, what is the point of working on old data when the new data is available?
Thanks, Gerard
On 16 October 2013 06:02, Strainu strainu10@gmail.com wrote:
Hi Federico,
This is great news! I have two questions though:
1. What happens to files deleted after your crawler retrieved them? I suppose they will still be available in the archives.
2. Is the archive team willing to host third-party, specialized downloads, such as all the pictures from WLM (or all the pictures with monuments from a certain country)?
Thanks, Strainu
2013/10/13 Federico Leva (Nemo) nemowiki@gmail.com:
WikiTeam has just finished archiving all Wikimedia Commons files up to 2012 (and some more) on the Internet Archive: https://archive.org/details/wikimediacommons So far it's about 24 TB of archives, and there are also a hundred torrents you can help seed, ranging from a few hundred MB to over a TB, most around 400 GB. Everything is documented at https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Media_... and, if you want, here are some ideas to help WikiTeam with coding: https://code.google.com/p/wikiteam/issues/list.
Nemo