Hi,

----- Original Message -----
From: "Ariel T. Glenn" <ariel@wikimedia.org>
Date: Sunday, August 15, 2010 12:15 am
Subject: Re: [Xmldatadumps-l] Dumps, dumps, dumps
To: Jamie Morken <jmorken@shaw.ca>
Cc: emijrp <emijrp@gmail.com>, xmldatadumps-l@lists.wikimedia.org

> Images take up 8T or more these days (of course that includes deletes
> and earlier versions but those aren't the bulk of it).  Hosting 8T
> tarballs seems out of the question... who would download them anyways?
>
> Having said that, hosting small subsets of images is quite possible and
> is something that has been discussed in the past.  I would love to hear
> which subsets of images people want and would actually use.

There is the script wikix that people have used to manually download images from wikis:

http://meta.wikimedia.org/wiki/Wikix

It generates a list of all the images in an XML dump and then downloads them.  The only thing missing is image scaling; without that, the enwiki image dump will be too large for most people to use right now.  ImageMagick (http://en.wikipedia.org/wiki/ImageMagick) could be used to scale the various formats of images down to smaller sizes.

Here's a shell script snippet I found that uses ImageMagick's convert:

#!/bin/sh
# For every PNG under the music directory, make a 100x100 copy
# and drop it into that file's directory as cover.bmp.
find /media/SHAWN\ IPOD/Songs/ -iname "*.png" | while read file; do
    convert "$file" -resize 100x100 cover.bmp
    cp cover.bmp "${file%/*}"/
done
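
The same pattern could be adapted to cap the size of downloaded wiki images.  Here's a rough sketch (the ./images directory and the 800x600 cap are just assumptions for illustration) using ImageMagick's mogrify with the '>' resize modifier, which shrinks only images larger than the cap and leaves smaller ones untouched:

#!/bin/sh
# Sketch: shrink any image larger than 800x600 in place; images already
# smaller than the cap are left as-is thanks to the '>' modifier.
# The ./images path and the 800x600 cap are assumptions, not any real dump layout.
find ./images -type f \( -iname "*.jpg" -o -iname "*.png" -o -iname "*.gif" \) |
while read file; do
    mogrify -resize '800x600>' "$file"
done

Note that mogrify overwrites files in place, so something like this would be run against a working copy of the images rather than the originals.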

If the Wikimedia Foundation provides a dump of images, I think people will find interesting ways to use them.  Dumps of enwiki images scaled to a maximum of 640x480 or 800x600, and dumps of the enwiki thumbnails, are the two subsets I think would be most valuable.

cheers,
Jamie


>
> Ariel
>
>