[Xmldatadumps-l] Xmldatadumps-l Digest, Vol 21, Issue 2 - proposals for categories of Commons

Ariel T. Glenn ariel at wikimedia.org
Wed Nov 23 08:19:09 UTC 2011


Replying to a few different messages:

Reasons we can't just host bundles of monthly media dumps right now
include space: we really don't have a place to put 17T more of things to
download. Even in 100-200 GB batches it still comes out to the same 17T,
and it only gets worse over time. The media files are rsynced to a host in
eqiad IIRC, so we could generate the bundles there without impacting our
infrastructure, but we don't have anywhere for them to live, nor a host
to serve them.  Unlike our other web services, these would always be
huge files; one download would tie up a thread or process for hours
or days.
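
To put rough numbers on the batching point, here is a back-of-the-envelope
sketch in Python. The 17T total and the 100-200 GB batch sizes come from the
paragraph above; decimal units (1 TB = 1000 GB) are an assumption, and the
function name is made up for illustration.

```python
import math

# 17 TB of media, assuming decimal units (1 TB = 1000 GB).
TOTAL_GB = 17 * 1000


def batches_needed(total_gb, batch_gb):
    """Number of bundles of size batch_gb needed to cover total_gb."""
    return math.ceil(total_gb / batch_gb)


for batch_gb in (100, 200):
    count = batches_needed(TOTAL_GB, batch_gb)
    # Splitting changes the file count, not the total storage required.
    print(f"{batch_gb} GB batches: {count} files, still {TOTAL_GB} GB in total")
```

However the bundles are cut, the storage and serving cost stays the same;
only the number of files changes.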

For thumbs the situation is a bit more dire, as our current thumbs
server is quite fragile, and we can't really hand it a pile of requests
(or especially request a pile of new sizes from the scalers) without it
becoming unhappy and affecting the site.  The SWIFT replacement cannot
come swiftly enough.  See http://wikitech.wikimedia.org/view/Swift for
more info on that.

The thing about copying media onto disks for someone else is that it is
something we could do immediately.

Just to be very clear about it, I really want to have external mirrors,
copies, archives and everything else.  The more the better.

I'd be pretty psyched to host POTY packages for the years we are
missing. Is there an easy way to get the list of picture titles for a
given year? (Yeah, I don't know the category system over there, it's not
my home project :-p)
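
One possible approach, if the pictures for a year sit in a single category,
would be to ask the MediaWiki API for the category members. The sketch below
is only that: the category name passed in is a placeholder (I don't know what
the real POTY categories are called), the helper names and the User-Agent
string are made up, and it assumes the current JSON continuation format of
list=categorymembers.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://commons.wikimedia.org/w/api.php"
# Hypothetical User-Agent; replace with something that identifies you.
USER_AGENT = "poty-title-lister/0.1 (example script)"


def category_query_url(category, cont=None):
    """Build a list=categorymembers query URL for files in one category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "file",
        "cmlimit": "500",
        "format": "json",
    }
    if cont:
        params["cmcontinue"] = cont
    return API_URL + "?" + urllib.parse.urlencode(params)


def titles_from_response(data):
    """Return (file titles, continuation token or None) from one response."""
    members = data.get("query", {}).get("categorymembers", [])
    titles = [member["title"] for member in members]
    cont = data.get("continue", {}).get("cmcontinue")
    return titles, cont


def list_category_files(category):
    """Fetch all file titles in a category, following continuation."""
    titles, cont = [], None
    while True:
        request = urllib.request.Request(
            category_query_url(category, cont),
            headers={"User-Agent": USER_AGENT},
        )
        with urllib.request.urlopen(request) as response:
            page_titles, cont = titles_from_response(json.load(response))
        titles.extend(page_titles)
        if not cont:
            return titles
```

Calling list_category_files("Category:Some POTY category") would then return
the file titles, assuming such a category exists under whatever name Commons
actually uses.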

Ariel


On Tue, 22-11-2011 at 16:07 +0100, burslem wrote:
> > Ariel:
> >> Providing multiple terabyte sized files for download doesn't make
> >> any kind of sense to me.
> >> However, if we get concrete proposals for categories of Commons
> >> images people really want and would use, we can put those together.
> >> I think this has been said before on wikitech-l if not here.
> 
> 
> The Picture of the Year (POTY) collections are truly stunning! I am
> not very interested in having terabytes of random snapshots on my
> computer; instead I find smaller collections of "best of the best"
> much more suitable for the public. This way, they will be accessible to
> those with smaller amounts of disk space, who will be equally
> impressed.
> 
> 
> I doubt there are many people interested in the image tarballs at all;
> they're just going for the principle of accessibility. Presumably
> Wikipedia has plenty of backup capabilities and there are enough
> gurus doing everything to prevent possible data loss. Offering public
> backups has no additional value from this perspective. Most of us
> probably use the Wikipedia XML dumps for offline usage or research. I have
> not yet come across image research projects requiring tens of
> terabytes of images to be successful!
> 
> 
> I say, if the POTY downloads are popular according to statistics, why
> not compile a couple more years? The thing I'm talking about is hosted
> at http://dumps.wikimedia.org/other/poty/ . 




