For space requirements and related figures for the XML dumps, see http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps (these figures are pretty up to date).
As for "dumps" of images, we have no such thing; this rsync mirror is the first thing out of the gate, and we can't possibly generate multiple copies of it on different dates as we do for the XML dumps.
Come find me on IRC (#wikimedia-tech on freenode) or send me an email off-list and we can talk through the technical end of things.
I'm already working on automating copying the XML archives up to you guys at 6-month intervals or so.
Ariel
On Mon, 02-04-2012, at 15:38 -0400, Alex Buie wrote:
Hey guys,
Sorry for breaking the thread, but I just subscribed, so I think this'll probably break mailman's threading headers.
This is very exciting news, and IA would love to have a copy! We're more interested in being a historical mirror (on our item infrastructure), rather than a live rsync/http/ftp mirror, but perhaps we can also work something out mirroring the latest dumps. (How big are the last 2 or so?)
I suppose the next step is for Ariel and me to talk about technical procedures and details, et cetera, but I just wanted to subscribe to this mailing list and introduce myself.
Ariel, when you have a minute to chat, shoot me an email (or Skype). I'm thinking we just pull things at whatever frequency you guys push out the data to your.org (which may or may not be scheduled yet) and throw them into new items on the cluster.
Others' thoughts are, of course, always welcome.
Thanks!
Alex Buie
Collections Group
Internet Archive, a registered California non-profit library
abuie@archive.org
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l