Replying to a few different messages:
Reasons we can't just host bundles of monthly media dumps right now
include space, as we really don't have a place to put 17T more things to
download; even in 100-200GB batches it still comes out to the same 17T,
only getting worse over time. The media files are rsynced to a host in
eqiad IIRC so we could generate them there without impacting our
infrastructure, but we don't have somewhere for them to live, nor a host
to serve them. Unlike our other web services, these would always be
huge files; one download would tie up a thread or process for some hours.
For thumbs the situation is a bit more dire, as our current thumbs
server is quite fragile, and we can't really hand it a pile of requests
(or especially request a pile of new sizes from the scalers) without it
becoming unhappy and affecting the site. The SWIFT replacement cannot
come swiftly enough. See http://wikitech.wikimedia.org/view/Swift for
more info on that.
The thing about copying media onto disks for someone else is that it is
something we could do immediately.
Just to be very clear about it, I really want to have external mirrors,
copies, archives and everything else. The more the better.
I'd be pretty psyched to host POTY packages for the years we are
missing. Is there an easy way to get the list of picture titles for a
given year? (Yeah, I don't know the category system over there, it's not
my home project :-p)
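On the question of getting picture titles for a given year, one possible
approach is the MediaWiki API's list=categorymembers query against Commons.
The sketch below assumes the POTY files for a year sit in a single category;
the category name shown in the usage note is a guess, not a confirmed
Commons category, and the function names and User-Agent string are made up
for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_query(category, cont=None):
    """Build a categorymembers query URL for one page of results."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,   # e.g. "Category:..." (name is an assumption)
        "cmtype": "file",      # only file pages, no subcategories
        "cmlimit": "500",
        "format": "json",
    }
    if cont:
        # Continuation parameters returned by the previous response.
        params.update(cont)
    return API_URL + "?" + urlencode(params)


def titles_from_response(payload):
    """Extract page titles from one decoded API response."""
    members = payload.get("query", {}).get("categorymembers", [])
    return [m["title"] for m in members]


def _http_fetch(url):
    """Fetch a URL and decode the JSON body."""
    req = Request(url, headers={"User-Agent": "poty-title-lister/0.1 (example)"})
    with urlopen(req) as resp:
        return json.load(resp)


def list_category_files(category, fetch=None):
    """Collect all file titles in a category, following continuation."""
    fetch = fetch or _http_fetch
    titles, cont = [], None
    while True:
        payload = fetch(build_query(category, cont))
        titles.extend(titles_from_response(payload))
        cont = payload.get("continue")
        if not cont:
            return titles
```

Calling list_category_files("Category:Picture of the Year 2010") would
return the file titles if a category by that name exists; whoever knows the
Commons category system would need to supply the real name.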
On 22-11-2011, Tue, at 16:07 +0100, burslem wrote:
> Providing multiple terabyte sized files for download doesn't make a lot
> of sense to me. However, if we get concrete proposals for categories of
> Commons images people really want and would use, we can put those
> together. I think this has been said before on wikitech-l if not here.
The Picture of the Year (POTY) collections are truly stunning! I am
not very interested in having terabytes of random snapshots on my
computer, instead I find smaller collections of "best of the best"
much more suitable for the public. This way, it will be accessible to
those with smaller amounts of diskspace and they'll be equally
I doubt there are many people interested in the image tarballs at all,
they're just going for the principle of accessibility. Presumably
Wikipedia has plenty of back-up capabilities and there are enough
gurus doing everything to prevent possible data loss. Offering public
back-ups has no additional value in this perspective. Most of us
probably use the Wikipedia XML dumps for offline usage or research. I have
not yet come across image research projects requiring tens of
terabytes of images to be successful!
I say, if the POTY downloads are popular according to statistics, why
not compile a couple more years? The thing I'm talking about is hosted
Xmldatadumps-l mailing list