No, they don't currently, but I'm working with the your.org folks to get a copy of the mirror mounted over NFS; then I should be able to combine that with the XML dumps to process media from certain time periods.
I wonder how well Python's lxml handles multi-gigabyte XML files... Guess we'll see :)
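For files that size the usual approach is streaming rather than building a full tree. Here is a minimal sketch; lxml.etree and the stdlib's xml.etree.ElementTree expose the same iterparse() API, so the stdlib module is used below for self-containment. The function name and the tag layout are illustrative, loosely modeled on a MediaWiki dump.

```python
import xml.etree.ElementTree as ET

def iter_page_titles(path):
    """Yield the <title> text of each <page>, keeping memory bounded.

    Illustrative helper, not from the thread: the key trick is calling
    elem.clear() after each <page> so parsed subtrees are discarded
    instead of accumulating, which is what makes multi-gigabyte dumps
    tractable.
    """
    for _, elem in ET.iterparse(path, events=("end",)):
        # MediaWiki dumps are namespaced; strip the {...} prefix if present.
        if elem.tag.rsplit("}", 1)[-1] == "page":
            title_el = elem.find("{*}title")  # any-namespace match, Python 3.8+
            yield title_el.text if title_el is not None else None
            elem.clear()  # free the already-processed subtree
```

The same loop works unchanged with `lxml.etree.iterparse`, which is typically faster on large inputs.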
Alex
2012/4/3 emijrp emijrp@gmail.com:
Does the rsync mirror allow downloading Commons images by date? I mean, in day-by-day packages. That is the method we wanted to use at WikiTeam to archive Wikimedia Commons.
2012/4/3 Ariel T. Glenn ariel@wikimedia.org
On 02-04-2012 (Mon) at 23:43 +0200, Platonides wrote:
On 02/04/12 21:55, Ariel T. Glenn wrote:
For "dumps" of images, we have no such thing; this rsync mirror is the first thing out of the gate, and we can't possibly generate multiple copies of it on different dates as we do for the XML dumps.
That's not too hard to do. You just copy the image tree with hardlinks, creating a dated version; the next rsync will then only replace modified images. (Unless you manually added the --inplace parameter, but in that case you presumably know what you're doing.) You could also use --link-dest instead of manually building the hardlink copies.
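The hardlink-copy step described above can be sketched as follows. This is an illustrative Python equivalent of what `rsync --link-dest` (or `cp -al`) does in one command: every file in the previous snapshot is hard-linked into a new dated directory, so no image data is duplicated, and a subsequent rsync only replaces the files that actually changed. The function name and paths are assumptions, not anything from the thread.

```python
import os

def hardlink_snapshot(src_root, dst_root):
    """Recreate src_root's tree under dst_root using hard links.

    Hypothetical helper: directories are created normally, but each
    file becomes a hard link to the original, so the "copy" costs
    only inodes and directory entries, not disk space.
    """
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = dst_root if rel == "." else os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            os.link(os.path.join(dirpath, name),
                    os.path.join(target_dir, name))
```

With rsync itself, the same effect is one invocation along the lines of `rsync -a --link-dest=../2012-04-02 SOURCE/ 2012-04-03/` (dates illustrative).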
Yes, that would work fine under the existing setup; what I don't know and what needs to be figured out is what we will do when images are moved into swift, Real Soon Now.
Ariel
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l