Kate:
I've started running an image dump for en.wp using a version of trickle
with large-file support (the last one died after 2 GB). If this works,
I'll set up regular image dumps again along with the DB backups.
The copy is running slowly so as not to overload the fileserver, so the
dumps may not be entirely up to date by the time they finish.
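The throttled copy Kate describes can also be approximated without trickle by pacing reads and writes in the copy loop itself. A minimal sketch (the function name, chunk size, and rate are illustrative assumptions, not the actual settings used for the dump):

```python
import time

def throttled_copy(src, dst, rate_bytes_per_sec=1_000_000, chunk=64 * 1024):
    """Copy src to dst, sleeping after each chunk so the average
    throughput stays at or below rate_bytes_per_sec."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            data = fin.read(chunk)
            if not data:
                break
            fout.write(data)
            # Sleep for this chunk's share of a second at the target rate,
            # which caps the long-run bandwidth used on the fileserver.
            time.sleep(len(data) / rate_bytes_per_sec)
```

trickle does the same thing transparently for any program's socket traffic; the point here is only that the throttling is a deliberate trade of speed for server load, which is why the finished dump can lag behind the live wiki.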
This is great news, but it's also worth noting that this will not
include images from the Commons, so if you're trying to set up a
complete mirror, this will become increasingly difficult as the free
material gets moved over there. The Commons dump itself will be
prohibitively large for most users.
To address this, a while ago I wrote a very basic Perl script that
dumps the images a wiki uses but which exist only on the Commons. It's
in /home/erik/extractdb.pl. I'm sure it could be made a lot faster, though.
Ideally, such a solution could be used to create combined dumps that
include *all* the images used in a particular wiki.
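The core of such a script is just a set difference between the images a wiki's pages reference and the images it hosts locally; everything in the difference must be coming from the Commons. A minimal sketch (the function name and sample titles are my own illustrations, not the logic of extractdb.pl; in MediaWiki terms the referenced titles would come from the imagelinks table and the local ones from the wiki's own image table):

```python
def commons_only_images(used_titles, local_titles):
    """Return the image titles referenced by the wiki but not uploaded
    locally, i.e. the ones served from the Commons."""
    return sorted(set(used_titles) - set(local_titles))

used = ["Foo.jpg", "Bar.png", "Baz.svg"]   # images referenced by pages
local = ["Foo.jpg"]                        # images uploaded to this wiki
print(commons_only_images(used, local))    # → ['Bar.png', 'Baz.svg']
```

A combined dump would then fetch exactly that list from the Commons and ship it alongside the wiki's own image dump.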
From a legal standpoint, we have to be careful with distributing image
dumps separately from the metadata that includes the licensing
information, as many licenses prohibit this.
Erik