On Thu, Sep 9, 2010 at 10:54 PM, Jamie Morken jmorken@shaw.ca wrote:
Hi all,
This is a preliminary list of what needs to be done to generate image dumps. If anyone can help with #2 by providing the access log of image usage stats, please send me an email!
1. run wikix to generate a list of images for a given wiki, e.g. enwiki
2. sort the image list based on usage frequency from the access log files (a rough sketch of this step follows below)
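A minimal sketch of step 2 in Python, assuming (hypothetically) that both the wikix output and the access log contain one image filename per line; the file names here are made up:

# Sketch: sort a wikix image list by access-log frequency.
from collections import Counter

with open("enwiki_images.txt") as f:          # hypothetical wikix output
    images = [line.strip() for line in f if line.strip()]

with open("image_access.log") as f:           # hypothetical access log
    hits = Counter(line.strip() for line in f)

# Most-requested images first; images never seen in the log sort last.
images.sort(key=lambda name: hits.get(name, 0), reverse=True)

with open("enwiki_images_sorted.txt", "w") as f:
    f.write("\n".join(images) + "\n")

The real log format will differ, but the idea is the same: count requests per image, then order the wikix list by that count.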
Hi,
It would be great to have these image dumps! I wonder if a different dump may be worth it for a different scenario:
* The user only wants the photos for a small set of page ids, e.g. 1000 pages
What would be the proper way to get these photos without downloading large dumps?
a. Parse the actual HTML pages to get the image URLs (plus license info), then download the images? (See the sketch after these options.)
b. Try to find the image URLs using the Commons wikitext dump (and parse the license info from it)?
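For what it's worth, option (a) need not involve HTML scraping: the MediaWiki API can return the image URLs for a page directly. A minimal sketch (the User-Agent string is made up, license info is not covered, and continuation for very long pages is skipped):

# Sketch: fetch the URLs of all images used on a page via the MediaWiki API.
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def image_urls(page_title):
    """Return the file URLs of the images used on one page."""
    params = urllib.parse.urlencode({
        "action": "query",
        "generator": "images",   # iterate over the images used on the page
        "gimlimit": "max",
        "titles": page_title,
        "prop": "imageinfo",
        "iiprop": "url",
        "format": "json",
    })
    req = urllib.request.Request(
        f"{API}?{params}",
        headers={"User-Agent": "image-dump-example/0.1"})  # placeholder UA
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    pages = data.get("query", {}).get("pages", {})
    return [info["url"]
            for page in pages.values()
            for info in page.get("imageinfo", [])]

print(image_urls("Albert Einstein"))

This still means one request per page, which is exactly why a bulk dump would be nicer for thousands of pages.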
Both approaches still seem complicated, so maybe a different dump would be helpful:
Page id --> List of [ Image id | real url | type (original | dim_xy | thumb) | license ]
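If such a dump existed as, say, a tab-separated file (the layout below is entirely hypothetical), filtering it down to a small set of page ids would be trivial:

# Sketch: read a hypothetical "page id -> image records" TSV dump with
# columns: page_id, image_id, url, type (original|dim_xy|thumb), license.
import csv

wanted = {736, 9316, 30640}   # the small set of page ids of interest

with open("page_image_map.tsv") as f:          # hypothetical dump file
    for page_id, image_id, url, img_type, license_ in csv.reader(f, delimiter="\t"):
        if int(page_id) in wanted:
            print(page_id, url, license_)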
regards