No, they don't currently, but I'm working with the your.org guys to
get a copy of the mirror mounted over NFS, and then I should be able
to combine that with the XML dumps for the images to process media
from certain time periods.
I wonder how well Python's lxml handles multi-gigabyte XML files...
Guess we'll see :)
Alex
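The streaming approach Alex is wondering about can be sketched with the standard library's `xml.etree.ElementTree.iterparse`, whose interface lxml mirrors (lxml's own `iterparse` is a drop-in for the same pattern). This is only an illustrative sketch, not the actual WikiTeam code; the `<page>`/`<title>` tags stand in for the dump schema, and a real run would pass a file path instead of an in-memory buffer:

```python
# Streaming parse: iterparse yields each element as its end tag is
# read, so memory stays bounded even for multi-gigabyte dumps --
# provided each finished subtree is cleared after use.
import io
import xml.etree.ElementTree as ET

def iter_titles(source):
    """Yield the <title> text of each <page>, then free the subtree."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "page":
            yield elem.findtext("title")
            elem.clear()  # drop the page subtree to keep memory flat

# Tiny stand-in for a dump file; a real dump would be a path on disk.
xml = (b"<mediawiki>"
       b"<page><title>A</title></page>"
       b"<page><title>B</title></page>"
       b"</mediawiki>")
titles = list(iter_titles(io.BytesIO(xml)))
```

With this pattern, peak memory is proportional to one `<page>` subtree rather than the whole file, which is what makes multi-gigabyte dumps tractable.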
2012/4/3 emijrp <emijrp(a)gmail.com>:
Does the rsync mirror allow downloading Commons images by date? I mean,
in day-by-day packages. That was the method we wanted to use at WikiTeam
to archive Wikimedia Commons.
2012/4/3 Ariel T. Glenn <ariel(a)wikimedia.org>
>
> On 02-04-2012 (Mon), at 23:43 +0200, Platonides wrote:
> > On 02/04/12 21:55, Ariel T. Glenn wrote:
> > > For "dumps" of images, we have no such thing; this rsync mirror is
> > > the first thing out of the gate, and we can't possibly generate
> > > multiple copies of it on different dates as we do for the XML dumps.
> >
> > That's not too hard to do. You just copy the image tree with
> > hardlinks, making a version. Then the next rsync will only replace
> > modified images. (Unless you manually added the --inplace parameter,
> > but in that case you supposedly know what you're doing.)
You could also use --link-dest instead of manually building hardlink
copies.
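The reason hardlink snapshots survive a later rsync pass (when --inplace is not used) is that rsync writes each changed file to a temporary name and renames it over the old one, which unlinks the live name but leaves the snapshot's link pointing at the old data. A minimal sketch of that mechanism, with hypothetical directory and file names standing in for the image tree:

```python
# Hardlink snapshots: a "copy" of the image tree made with hard links
# costs almost no disk space. rsync's default update (write to a temp
# file, then rename into place) breaks the link for the live name only,
# so the snapshot keeps the old version.
import os
import tempfile

root = tempfile.mkdtemp()
live = os.path.join(root, "live")               # the mirrored tree
snap = os.path.join(root, "snap-2012-04-03")    # hypothetical snapshot
os.makedirs(live)
os.makedirs(snap)

# One "image" in the live tree, hardlinked into the snapshot.
path = os.path.join(live, "img.jpg")
with open(path, "wb") as f:
    f.write(b"version 1")
os.link(path, os.path.join(snap, "img.jpg"))

# Simulate the rsync update: write a temp file, rename it over the old.
tmp = path + ".tmp"
with open(tmp, "wb") as f:
    f.write(b"version 2")
os.replace(tmp, path)

with open(os.path.join(snap, "img.jpg"), "rb") as f:
    snapshot_data = f.read()   # the snapshot still holds "version 1"
with open(path, "rb") as f:
    live_data = f.read()       # the live tree now holds "version 2"
```

With rsync's --link-dest=DIR option the same effect is achieved in one step: unchanged files in the new destination are created as hard links into DIR, so only modified files consume new space.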
Yes, that would work fine under the existing setup; what I don't know
and what needs to be figured out is what we will do when images are
moved into swift, Real Soon Now.
Ariel
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l