On Feb 8, 2008 12:07 PM, Colm McMullan colm@multimap.com wrote:
Hi Brianna, thanks for replying. However, the solution isn't really what I'm looking for. I'm parsing the whole Wikipedia dump and I don't want to have to make an HTTP request for each image to find out where it actually is...
If you're working from an enwiki dump, you could check whether a page exists in the Image: namespace with the same name as the image name you've extracted from the article text. If it does, the file is on enwiki; otherwise it's on Commons. This doesn't guarantee anything, because images do get moved to Commons or deleted from time to time, but at least the check is a database query and not an HTTP request.
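
As a rough sketch of that check, assuming the dump's page data has been loaded into a SQL database with the standard MediaWiki `page` table columns `page_namespace` and `page_title` (Image: is namespace 6, and titles are stored with underscores and a capitalised first letter). The in-memory SQLite database and the helper names `normalize_title`, `image_location` and `demo_connection` are made up for illustration; a real import would more likely live in MySQL:

```python
import sqlite3

IMAGE_NAMESPACE = 6  # MediaWiki namespace number for Image: pages


def normalize_title(name):
    """Convert an image name from wikitext to the form stored in page_title."""
    name = name.strip().replace(" ", "_")
    return name[:1].upper() + name[1:]


def image_location(conn, image_name):
    """Return 'enwiki' if a local Image: page exists, otherwise 'commons'."""
    row = conn.execute(
        "SELECT 1 FROM page WHERE page_namespace = ? AND page_title = ? LIMIT 1",
        (IMAGE_NAMESPACE, normalize_title(image_name)),
    ).fetchone()
    return "enwiki" if row else "commons"


def demo_connection():
    """Build a tiny stand-in for the imported dump (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (page_namespace INTEGER, page_title TEXT)")
    conn.executemany(
        "INSERT INTO page VALUES (?, ?)",
        [(6, "Local_logo.png"), (0, "Example.jpg")],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for name in ["local logo.png", "Example.jpg"]:
        print(name, "->", image_location(conn, name))
```

The same caveat applies as above: the answer is only as current as the dump, so a "commons" result may also mean the image has since been deleted.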