Hi,
----- Original Message -----
From: "Ariel T. Glenn" <ariel(a)wikimedia.org>
Date: Sunday, August 15, 2010 12:15 am
Subject: Re: [Xmldatadumps-l] Dumps, dumps, dumps
To: Jamie Morken <jmorken(a)shaw.ca>
Cc: emijrp <emijrp(a)gmail.com>, xmldatadumps-l(a)lists.wikimedia.org
Images take up 8T or more these days (of course that includes deletes and earlier versions, but those aren't the bulk of it). Hosting 8T tarballs seems out of the question... who would download them anyway?
Having said that, hosting small subsets of images is quite possible and is something that has been discussed in the past. I would love to hear which subsets of images people want and would actually use.
There is the script wikix that people have used to manually download images from wikis:
http://meta.wikimedia.org/wiki/Wikix
It generates a list of all the images in an XML dump and then downloads them. The only thing missing is image scaling; without that, the enwiki image dump will be too large for most people to use right now. ImageMagick,
http://en.wikipedia.org/wiki/ImageMagick
could work to scale the various formats of images down to smaller sizes.
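For anyone who wants to try this without wikix, here is a rough sketch of the first step: pulling the image names out of a pages-articles XML dump by scanning the wikitext for [[Image:...]] and [[File:...]] links. The filenames "enwiki-pages-articles.xml" and "imagelist.txt" are just placeholders, and this simple grep will miss images used via templates, so treat it as an approximation:

```shell
#!/bin/sh
# Sketch (assumed filenames): extract unique image titles referenced in the
# wikitext of a pages-articles XML dump. Matches [[Image:Name]] and
# [[File:Name]] links, strips the prefix, and de-duplicates the list.
grep -oE '\[\[(Image|File):[^]|]+' enwiki-pages-articles.xml \
  | sed -E 's/^\[\[(Image|File)://' \
  | sort -u > imagelist.txt
```

The resulting list could then be fed to a downloader, much as wikix does.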
Here's a shell script snippet I found that uses it:
#!/bin/sh
# Resize every PNG under the given directory to fit within 100x100,
# writing the result as cover.bmp alongside the original file.
find /media/SHAWN\ IPOD/Songs/ -iname "*.png" | while IFS= read -r file; do
    convert "$file" -resize 100x100 "${file%/*}/cover.bmp"
done
If the Wikimedia Foundation provides a dump of images, I think people will find interesting ways to use them. Dumps of enwiki images with a maximum size of 640x480 or 800x600, and also enwiki thumbnails, are the two subsets I think would be most valuable.
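A max-size subset like that could be produced with ImageMagick's fit-within geometry. This is just a sketch with assumed directory names ("images/" for originals, "scaled/" for output); the trailing ">" in the geometry tells convert to shrink images that exceed the box but never enlarge smaller ones:

```shell
#!/bin/sh
# Sketch (assumed paths): scale a tree of images to fit within 800x600,
# preserving aspect ratio. Images already smaller than 800x600 are copied
# through unchanged because of the ">" (shrink-only) geometry flag.
mkdir -p scaled
find images/ -type f \( -iname '*.jpg' -o -iname '*.png' \) \
  | while IFS= read -r f; do
      convert "$f" -resize '800x600>' "scaled/${f##*/}"
    done
```

Thumbnails would work the same way with a smaller geometry such as '120x120>'.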
cheers,
Jamie