On Feb 8, 2008 12:07 PM, Colm McMullan <colm(a)multimap.com> wrote:
> Hi Brianna, thanks for replying. However, the solution isn't really
> what I'm looking for. I'm parsing through the whole Wikipedia dump and
> I don't want to have to make an HTTP request for each image to find
> out where it actually is...
If you're working from an enwiki dump, then you could check for the
existence of a page in the Image: namespace with the same name as the
image name you've extracted from the article text. If such a page
exists, the file is on enwiki; otherwise it's on Commons. This isn't a
guarantee, because images do get moved to Commons or deleted from time
to time, but at least the check is a database query and not an HTTP
request.
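
Here is a rough sketch of that check in Python, assuming you have already
pulled the page titles out of the dump into a list. The function names are
made up for illustration and are not part of MediaWiki, and the title
normalization is simplified (underscores treated as spaces, first letter
uppercased).

```python
def normalize_title(name):
    """Simplified MediaWiki-style normalization (an assumption, not the
    full rules): underscores act as spaces, first letter is uppercase."""
    name = " ".join(name.replace("_", " ").split())
    return name[:1].upper() + name[1:]


def build_local_index(page_titles, prefix="Image:"):
    """Collect the names of all pages in the Image: namespace."""
    index = set()
    for title in page_titles:
        if title.startswith(prefix):
            index.add(normalize_title(title[len(prefix):]))
    return index


def guess_location(image_name, local_index):
    """Return 'enwiki' if a local Image: page exists, else 'commons'.

    This is only a guess: files get moved to Commons or deleted.
    """
    if normalize_title(image_name) in local_index:
        return "enwiki"
    return "commons"


# Hypothetical titles, standing in for what you'd read from the dump.
titles = ["Image:Example.jpg", "Main Page", "Image:Foo_bar.png"]
local_index = build_local_index(titles)
print(guess_location("example.jpg", local_index))
print(guess_location("Missing.png", local_index))
```

Building the set once and looking names up in it keeps the whole pass
local, which is the point of avoiding a request per image.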
--
Stephen Bain
stephen.bain(a)gmail.com