[Xmldatadumps-l] Fwd: Dumps, dumps, dumps

emijrp emijrp at gmail.com
Sun Aug 15 08:35:57 UTC 2010


Hi Ariel;

Thanks for your reply. I can't download 8 TB at home, of course, but perhaps
people from universities who want to research or mirror the data could do
it. You could make image dumps by year (all images uploaded in 2005, all in
2006, etc.) and upload them to the Internet Archive (those folks rock).
Also, I have calculated that an image dump with all 7 million Commons
images resized to 800x600 would be ~500 GB today.
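
As a back-of-envelope check of that figure (the ~75 KB average size for an
800x600 JPEG is my assumption, not a measured value):

    # Rough check of the ~500 GB estimate for 7M resized Commons images.
    images = 7_000_000              # Commons image count, per the text above
    avg_kb = 75                     # assumed mean size of an 800x600 JPEG
    total_gb = images * avg_kb / 1024 / 1024
    print(f"~{total_gb:.0f} GB")    # prints "~501 GB"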

On the other hand, you could publish image dumps for the individual
Wikipedias[1] (look at the images column). Would there be any legal
problems with the English dump containing fair use images?
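
For what it's worth, collecting a per-wiki image list is mostly mechanical.
Here is a minimal sketch against the standard MediaWiki API (list=allimages
is a real module; the script itself, including the endpoint choice, is only
an illustration):

    # Sketch: enumerate file names on one wiki via the MediaWiki API.
    # Error handling and rate limiting are omitted for brevity.
    import requests

    API = "https://en.wikipedia.org/w/api.php"   # any wiki's api.php works

    def iter_image_names():
        params = {"action": "query", "list": "allimages",
                  "ailimit": "500", "format": "json", "continue": ""}
        while True:
            data = requests.get(API, params=params).json()
            for img in data["query"]["allimages"]:
                yield img["name"]
            if "continue" not in data:
                break
            params.update(data["continue"])   # follow API continuation

    for name in iter_image_names():
        print(name)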

Regards,
emijrp

[1]
http://meta.wikimedia.org/wiki/List_of_Wikipedias#All_Wikipedias_ordered_by_number_of_articles

2010/8/15 Ariel T. Glenn <ariel at wikimedia.org>

> On Sat, 14-08-2010 at 23:25 -0700, Jamie Morken wrote:
> >
> > Hi,
> >
> > ----- Original Message -----
> > From: emijrp <emijrp at gmail.com>
> > Date: Friday, August 13, 2010 4:48 am
> > Subject: [Xmldatadumps-l] Dumps, dumps, dumps
> > To: xmldatadumps-l at lists.wikimedia.org
> >
> > > Hi all;
> > >
> > > Yesterday, I wrote a post[1] with some links to current dumps, old
> > > dumps, and other raw data like Domas's visit logs, plus some links
> > > to the Internet Archive where we can download some historical dumps.
> > > Please, can you share your links?
> > >
> > > Also, what about making a tarball with thumbnails from Commons?
> > > 800x600 would be a nice (re)solution, to avoid a terabyte-scale
> > > dump. If not, an image dump will probably never be published.
> > > Commons is growing by ~5000 images per day. It is scary.
> >
> > Yes, publicly available tarballs of image dumps would be great.  Here's
> > what I think it would take to implement:
> >
> > 1. allocate the server space for the image tarballs
> > 2. allocate the bandwidth for us to download them
> > 3. decide what tarballs will be made available (i.e. separated by wiki
> > or whole Commons, thumbnails or 800x600 max, etc.)
> > 4. write the script(s) for collecting the image lists, automating the
> > image scaling and creating the tarballs (see the sketch after this
> > list)
> > 5. done!
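
To make step 4 concrete, here is a minimal sketch, assuming Python with
Pillow, a flat directory of already-fetched originals, and made-up paths;
it is an illustration, not an existing WMF script:

    # Sketch of step 4: resize originals to fit 800x600, then tar them up.
    import os
    import tarfile
    from PIL import Image

    SRC = "originals"           # hypothetical directory of full-size images
    OUT = "thumbs-800x600"      # resized copies land here
    os.makedirs(OUT, exist_ok=True)

    for name in sorted(os.listdir(SRC)):
        if not name.lower().endswith((".jpg", ".jpeg", ".png")):
            continue                        # skip formats Pillow can't open
        with Image.open(os.path.join(SRC, name)) as img:
            img.thumbnail((800, 600))       # in place, keeps aspect ratio
            img.save(os.path.join(OUT, name))

    with tarfile.open("commons-thumbs-800x600.tar", "w") as tar:
        tar.add(OUT)                        # one tarball of the resized set

ImageMagick's mogrify -resize would do the same job; the point is only that
the resize-and-pack step is mechanical once the image list exists.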
> >
> > None of those tasks is really that difficult; the hard part is
> > figuring out why there used to be image tarballs available but there
> > aren't anymore, especially when there is apparently adequate server
> > space and bandwidth.  I guess it is one more thing that could break,
> > and then people would complain about it not working.
> >
>
> Images take up 8 TB or more these days (of course that includes deleted
> files and earlier versions, but those aren't the bulk of it).  Hosting
> 8 TB tarballs seems out of the question... who would download them
> anyway?
>
> Having said that, hosting small subsets of images is quite possible and
> is something that has been discussed in the past.  I would love to hear
> which subsets of images people want and would actually use.
>
> Ariel
>
>