2012/4/3 Alex Buie <abuie@archive.org>:
> No, they don't currently, but I'm working with the your.org guys to
> get a copy of the mirror mounted over NFS; then I should be able to
> combine that with the XML dumps for the images to process media from
> certain time periods.
>
> I wonder how well Python's lxml handles multi-gigabyte XML files...
> Guess we'll see :)
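
For what it's worth, that processing loop could look roughly like the
sketch below, streaming the dump with lxml's iterparse. This is a
minimal sketch only: the export-0.6 namespace, the NFS mount point, the
dump filename, and the cutoff date are all assumptions, and the revision
timestamp is only a rough proxy for upload date.

    import hashlib
    from lxml import etree

    NS = "{http://www.mediawiki.org/xml/export-0.6/}"  # version varies; check the <mediawiki> root
    MIRROR = "/mnt/yourorg/wikipedia/commons"          # hypothetical NFS mount point

    def mirror_path(title):
        # MediaWiki shards originals by the MD5 of the filename
        # (spaces become underscores): "File:Foo bar.jpg" -> a/ab/Foo_bar.jpg
        name = title.split(":", 1)[-1].replace(" ", "_")
        h = hashlib.md5(name.encode("utf-8")).hexdigest()
        return "%s/%s/%s/%s" % (MIRROR, h[0], h[:2], name)

    def media_since(dump, cutoff):
        # iterparse streams the file, so memory stays flat even on a
        # multi-gigabyte dump, as long as handled elements are cleared
        for _, page in etree.iterparse(dump, tag=NS + "page"):
            title = page.findtext(NS + "title")
            ts = page.findtext("%srevision/%stimestamp" % (NS, NS))
            if title and title.startswith("File:") and ts and ts >= cutoff:
                yield mirror_path(title)
            page.clear()
            while page.getprevious() is not None:
                del page.getparent()[0]

    for path in media_since("commonswiki-pages-meta-current.xml",
                            "2012-01-01T00:00:00Z"):
        print(path)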


Pywikipediabot uses cElementTree for Python, which is fast as hell.

> Alex
>
> 2012/4/3 emijrp <emijrp@gmail.com>:
>> Does the rsync mirror allow downloading Commons images by date? I mean,
>> in day-by-day packages. That was the method we wanted to use at WikiTeam
>> to archive Wikimedia Commons.
>>
>> 2012/4/3 Ariel T. Glenn <ariel@wikimedia.org>:
>>>
>>> On 02-04-2012 (Mon), at 23:43 +0200, Platonides wrote:
>>>> On 02/04/12 21:55, Ariel T. Glenn wrote:
>>>>> For "dumps" of images, we have no such thing; this rsync mirror is the
>>>>> first thing out of the gate, and we can't possibly generate multiple
>>>>> copies of it on different dates as we do for the XML dumps.
>>>>
>>>> That's not too hard to do. You just copy the image tree with hardlinks,
>>>> making a versioned snapshot; the next rsync will then only replace
>>>> modified images. (Unless you manually added the --inplace parameter,
>>>> but in that case you presumably know what you're doing.)
>>>> You could also use --link-dest instead of manually building hardlink
>>>> copies.
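
(A quick sketch of that snapshot scheme, with made-up paths and a
placeholder source URL; the manual cp -al route is noted in the
comments.)

    import datetime
    import os
    import subprocess

    SOURCE = "rsync://mirror.example.org/wikimedia-images/"  # placeholder module
    SNAPROOT = "/data/snapshots"                             # hypothetical path

    def make_snapshot():
        # Pull a dated copy, hardlinking unchanged files against the
        # previous snapshot so each date only costs the changed bytes.
        os.makedirs(SNAPROOT, exist_ok=True)
        dest = os.path.join(SNAPROOT, datetime.date.today().isoformat())
        previous = sorted(os.listdir(SNAPROOT))
        # --link-dest points rsync at the last snapshot; identical files
        # get hardlinked instead of re-downloaded. (Manual alternative:
        # "cp -al old new", then rsync into new without --inplace, so
        # rewritten files break their hardlinks.)
        link = ["--link-dest=" + os.path.join(SNAPROOT, previous[-1])] if previous else []
        subprocess.check_call(["rsync", "-a", "--delete"] + link + [SOURCE, dest])

    make_snapshot()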
>>>
>>> Yes, that would work fine under the existing setup; what I don't know,
>>> and what needs to be figured out, is what we will do when images are
>>> moved into Swift, Real Soon Now.
>>>
>>> Ariel
>>>
>>
>> _______________________________________________
>> Xmldatadumps-l mailing list
>> Xmldatadumps-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>>