On Thu, Sep 9, 2010 at 10:54 PM, Jamie Morken <jmorken(a)shaw.ca> wrote:
Hi all,
This is a preliminary list of what needs to be done to generate image dumps. If anyone
can help with #2 by providing the access logs for image usage stats, please send me an email!
1. run wikix to generate a list of images for a given wiki, e.g. enwiki
2. sort the image list by usage frequency taken from the access log files
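Step 2 could be sketched roughly as below. This is only an illustration: the log line layout and file names are assumptions, not the actual Wikimedia access-log format.

```python
import collections
import re

# Assumed pattern: image file name is the last path component of a request
# line. Real access logs may differ; adjust the regex accordingly.
IMG_RE = re.compile(r"/([^/]+\.(?:jpe?g|png|gif|svg))", re.IGNORECASE)

def usage_counts(log_lines):
    """Count how often each image file name appears in access-log lines."""
    counts = collections.Counter()
    for line in log_lines:
        m = IMG_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

def sort_by_usage(image_list, counts):
    """Most-requested images first; images never seen in the logs sort last."""
    return sorted(image_list, key=lambda name: -counts.get(name, 0))
```

With counts in hand, the wikix-generated list can be reordered so the most-viewed images are dumped first.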
Hi,
It would be great to have these image dumps! I wonder if a different kind of
dump might be worthwhile for a different scenario:
* The user only wants the photos for a small set of page ids, e.g. 1000 pages
What would be the proper way to get these photos without downloading
large dumps?
a. Parse the actual HTML pages to get the image URLs (plus
license info), then download the images?
b. Try to find the image URLs using the Commons wikitext
dump (and parse the license info, etc.)?
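For a small set of page ids, a third route is the MediaWiki web API (action=query with generator=images and prop=imageinfo), which returns the file URLs directly. A minimal sketch, kept offline so it only builds the request and parses a sample response; the sample URL is made up:

```python
import json
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # same endpoint shape on any MediaWiki wiki

def build_image_query(page_ids):
    """Build a query URL asking for the URL of every image used on the given pages."""
    params = {
        "action": "query",
        "generator": "images",   # iterate over the images embedded in the pages
        "pageids": "|".join(str(p) for p in page_ids),
        "prop": "imageinfo",
        "iiprop": "url",
        "format": "json",
    }
    return API + "?" + urlencode(params)

def extract_image_urls(response_text):
    """Map file title -> file URL from an action=query JSON response."""
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", {})
    return {p["title"]: p["imageinfo"][0]["url"]
            for p in pages.values() if "imageinfo" in p}
```

This avoids both HTML scraping and the wikitext dump for small batches, though license info would still need to be fetched per file.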
Both approaches seem complicated, so maybe a different dump would be helpful:
Page id --> list of [ image id | real URL | type (original | dim_xy | thumb) | license ]
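To make the proposal concrete, here is a sketch of what parsing one record of such a dump could look like. The separators (" --> ", ";", "|") and field names are my assumptions for illustration; no such dump format exists:

```python
# Hypothetical record: 'page_id --> image_id|url|type|license[;image_id|url|type|license...]'
def parse_record(line):
    """Parse one proposed dump line into (page_id, list of image dicts)."""
    page_id, _, rest = line.partition(" --> ")
    images = []
    for rec in rest.split(";"):
        image_id, url, kind, license_ = (f.strip() for f in rec.split("|"))
        images.append({"image_id": image_id, "url": url,
                       "type": kind, "license": license_})
    return int(page_id), images
```

A consumer could then filter the dump to its 1000 page ids and fetch only those URLs.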
regards