Ariel:
Providing multiple terabyte sized files for download doesn't make any
kind of sense to me.
However, if we get concrete proposals for categories of Commons
images people really want
and would use, we can put those together. I think this has been said
before on wikitech-l if not here.
The Picture of the Year (POTY) collections are truly stunning! I am not very interested in having terabytes of random snapshots on my computer; instead, I find smaller "best of the best" collections much more suitable for the public. This way they are accessible to people with less disk space, who will be equally impressed.
I doubt there are many people interested in the image tarballs at all, they're just going for the principle of accessibility. Presumably Wikipedia has plenty of backup capacity, and there are enough gurus doing everything to prevent data loss, so offering public backups adds no value from that perspective. Most of us probably use the Wikipedia XML dumps for offline use or research. I have not yet come across an image research project that needs tens of terabytes of images to be successful!
I say, if the POTY downloads are popular according to statistics, why not compile a couple more years? The thing I'm talking about is hosted at http://dumps.wikimedia.org/other/poty/ .
On 22 November 2011 15:07, burslem burslem@gmail.com wrote:
I doubt there are many people interested in the image tarballs at all, they're just going for the principle of accessibility.
I think the main reason people are calling for complete image dumps is the desire that it should be possible to completely fork Wikipedia (most of them don't actually want to do so, but they want the safety net of being able to).
Replying to a few different messages:
Reasons we can't just host bundles of monthly media dumps right now include space: we really don't have a place to put 17T more things to download, and even in 100-200 GB batches it still comes out to the same 17T, only getting worse over time. The media files are rsynced to a host in eqiad IIRC, so we could generate the bundles there without impacting our infrastructure, but we don't have anywhere for them to live, nor a host to serve them. Unlike our other web services, these would always be huge files; one download would tie up a thread or process for some hours or days.
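To make the batch arithmetic above concrete, here is a rough sketch. It assumes decimal units (1 TB = 1000 GB), and the link speeds are purely hypothetical; neither assumption comes from the thread.

```python
import math

TOTAL_TB = 17        # total media size mentioned in the thread
GB_PER_TB = 1000     # assumption: decimal units


def batch_count(total_tb, batch_gb):
    """Number of batches of batch_gb needed to cover total_tb."""
    return math.ceil(total_tb * GB_PER_TB / batch_gb)


def download_hours(size_gb, mbit_per_s):
    """Hours needed to transfer size_gb at a steady mbit_per_s (hypothetical speed)."""
    seconds = size_gb * 1e9 * 8 / (mbit_per_s * 1e6)
    return seconds / 3600


for batch_gb in (100, 200):
    print(batch_gb, "GB batches:", batch_count(TOTAL_TB, batch_gb))
# prints 170 batches for 100 GB and 85 batches for 200 GB

for speed in (10, 100):  # hypothetical client link speeds in Mbit/s
    print(speed, "Mbit/s:", round(download_hours(200, speed), 1), "hours per 200 GB batch")
```

Splitting into batches changes the number of files but not the total, and under these assumed speeds a single 200 GB batch would occupy a connection for somewhere between a few hours and about two days.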
For thumbs the situation is a bit more dire, as our current thumbs server is quite fragile, and we can't really hand it a pile of requests (or especially request a pile of new sizes from the scalers) without it becoming unhappy and affecting the site. The SWIFT replacement cannot come swiftly enough. See http://wikitech.wikimedia.org/view/Swift for more info on that.
The thing about copying media onto disks for someone else is that it is something we could do immediately.
Just to be very clear about it, I really want to have external mirrors, copies, archives and everything else. The more the better.
I'd be pretty psyched to host POTY packages for the years we are missing. Is there an easy way to get the list of picture titles for a given year? (Yeah, I don't know the category system over there, it's not my home project :-p)
Ariel
On Wed, 23-11-2011 at 10:19 +0200, Ariel T. Glenn wrote:
I'd be pretty psyched to host POTY packages for the years we are missing. Is there an easy way to get the list of picture titles for a given year? (Yeah, I don't know the category system over there, it's not my home project :-p)
And while digging around to try to put together a POTY 2010 bundle, I found a very curious thing: a few of the POTY images, voted on and everything, have since been deleted :-( Example:
http://commons.wikimedia.org/wiki/Commons:Picture_of_the_Year/2010/R1/File:D...
Ariel
POTY 2010 files are available; I hope I've included everything. I don't know what the setup was like in past years but this year I've included the xml of the description pages and the md5sums of the tar files. See http://dumps.wikimedia.org/other/
Ariel
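For anyone who downloads the bundles, a minimal sketch of checking tar files against an md5sums listing might look like the following. It assumes the listing uses the usual `md5sum` layout (hex digest, whitespace, filename); the actual layout of the published file is not shown in this thread, and the function names are made up for illustration.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5sums(md5sums_text, directory):
    """Check files in `directory` against md5sum-style lines.

    Assumed line format: "<hex digest>  <filename>".
    Returns {filename: True/False}; a missing file counts as False.
    """
    results = {}
    for line in md5sums_text.splitlines():
        line = line.strip()
        if not line:
            continue
        expected, name = line.split(None, 1)
        name = name.lstrip("*")  # md5sum marks binary mode with a leading '*'
        path = os.path.join(directory, name)
        results[name] = os.path.isfile(path) and md5_of_file(path) == expected.lower()
    return results
```

Reading in chunks keeps memory use flat, which matters for multi-gigabyte tar files. Calling `verify_md5sums(open("md5sums.txt").read(), ".")` (with a hypothetical listing name) returns a dictionary that can be scanned for any `False` entries.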
On Wed, Nov 23, 2011 at 11:53 AM, Ariel T. Glenn ariel@wikimedia.org wrote:
POTY 2010 files are available; I hope I've included everything. I don't know what the setup was like in past years but this year I've included the xml of the description pages and the md5sums of the tar files. See http://dumps.wikimedia.org/other/
Ariel
The POTY 2010 files seem to be online now at http://dumps.wikimedia.org/other/poty/2010/ and everything seems to be okay so far. The files listed below are the ones that were missing (presumably deleted from Wikimedia Commons) at the point of image tarball creation. The only real issue I have is the character-encoding mix-up when there are non-English symbols in the filenames, like "Manuel_Reimóndez_Portela_-_A_Estrada_-_Galiza.jpg" for "Manuel Reimóndez Portela - A Estrada - Galiza-3.jpg".
_!--File_Poster_Papaver_2a.jpg
Charging_Leopard-001.jpg
Darth_vader_hot_air_balloon.jpg
Kloster_Ebrach_BW_5.jpg
Panorama_Berliner_Olympiastadion-Glockenturm.jpg
Sombrero,_Hubble_images.jpg
Sonnenblume_Helianthus_2.jpg
Téviec_Crane_Homme_Profil_Droit_II.jpg
I hope others will appreciate the natural beauty of many of the images in the POTY packages! There are historic pictures and images about the Apollo mission, detailed charts of the South Pole, many sharp images of small insects and birds, photographs of mineral crystals, and there's even a 217 megapixel scan of "The Garden of Earthly Delights"! It is a magnificent compilation featuring a wide variety of man-made art and information!
It makes me wonder: do the Picture of the Day images usually also end up in POTY? It would be a shame if they were left out, but I'll just guess they're all in there.
xmldatadumps-l@lists.wikimedia.org