Replying to a few different messages:
The reasons we can't just host bundles of monthly media dumps right now include space: we really don't have a place to put 17T more things to download. Even in 100-200GB batches it still comes out to the same 17T, and that only gets worse over time. The media files are rsynced to a host in eqiad IIRC, so we could generate the bundles there without impacting our infrastructure, but we don't have somewhere for them to live, nor a host to serve them. Unlike our other web services, these would always be huge files; one download would tie up a thread or process for hours or days.
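As a rough back-of-the-envelope sketch of that batch arithmetic (assuming decimal units, i.e. 1T = 1000GB, and ignoring growth over time; the function name is just for illustration):

```python
import math

# 17T of media, assuming decimal units (1T = 1000GB); this is an assumption,
# not a measured figure.
TOTAL_GB = 17 * 1000


def batch_count(total_gb: int, batch_gb: int) -> int:
    """Number of batch files needed to cover total_gb in batch_gb chunks."""
    return math.ceil(total_gb / batch_gb)


for batch_gb in (100, 200):
    n = batch_count(TOTAL_GB, batch_gb)
    # However the data is split, the total to store and serve stays ~17T.
    print(f"{batch_gb}GB batches: {n} files")
```

Either way that is somewhere between 85 and 170 large files, and the same total amount of storage.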
For thumbs the situation is a bit more dire, as our current thumbs server is quite fragile, and we can't really hand it a pile of requests (or especially request a pile of new sizes from the scalers) without it becoming unhappy and affecting the site. The SWIFT replacement cannot come swiftly enough. See http://wikitech.wikimedia.org/view/Swift for more info on that.
The thing about copying media onto disks for someone else is that it is something we could do immediately.
Just to be very clear about it, I really want to have external mirrors, copies, archives and everything else. The more the better.
I'd be pretty psyched to host POTY packages for the years we are missing. Is there an easy way to get the list of picture titles for a given year? (Yeah, I don't know the category system over there, it's not my home project :-p)
Ariel
On Tue, 22-11-2011, at 16:07 +0100, burslem wrote:
Ariel:
Providing multiple terabyte sized files for download doesn't make any kind of sense to me. However, if we get concrete proposals for categories of Commons images people really want and would use, we can put those together. I think this has been said before on wikitech-l if not here.
The Picture of the Year (POTY) collections are truly stunning! I am not very interested in having terabytes of random snapshots on my computer; instead, I find smaller collections of "best of the best" much more suitable for the public. This way, they will be accessible to those with smaller amounts of disk space, who will be equally impressed.
I doubt there are many people interested in the image tarballs at all; they're just going for the principle of accessibility. Presumably Wikipedia has plenty of backup capabilities and there are enough gurus doing everything to prevent possible data loss. Offering public backups has no additional value from this perspective. Most of us probably use the Wikipedia XML dumps for offline usage or research. I have not yet come across image research projects requiring tens of terabytes of images to be successful!
I say, if the POTY downloads are popular according to statistics, why not compile a couple more years? The thing I'm talking about is hosted at http://dumps.wikimedia.org/other/poty/ .