Hi,
I am building an application that needs to find the most relevant Wikimedia
Commons image for a Wikipedia page. If the page has an infobox with an
image, that is the image I want. If not, I look for an image on the page
that also appears in a Wikimedia Commons search for that entity. Otherwise,
I fall back to the first result from the Wikimedia Commons search.
Is this the best possible algorithm for this requirement?
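
To make the fallback order concrete, here is a minimal sketch in Python. It
assumes the three inputs have already been fetched elsewhere; pick_image and
its parameter names are made up for illustration, and walking the Commons
results in search-rank order for the second step is my assumption.

```python
def pick_image(infobox_image, page_images, commons_results):
    """Return the best image title, or None if nothing is available.

    infobox_image   -- title of the infobox image, or None (assumed input)
    page_images     -- image titles found on the Wikipedia page
    commons_results -- image titles from a Commons search, best match first
    """
    # Step 1: the infobox image wins whenever there is one.
    if infobox_image:
        return infobox_image

    # Step 2: first Commons result that is also used on the page.
    on_page = set(page_images)
    for title in commons_results:
        if title in on_page:
            return title

    # Step 3: fall back to the top Commons result, if any.
    return commons_results[0] if commons_results else None
```

For example, pick_image(None, ["File:B.jpg"], ["File:A.jpg", "File:B.jpg"])
takes the second branch because File:B.jpg is both on the page and in the
search results.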
I also considered grabbing the first image from the action=render page, but
there are two problems there:
1. action=render is very slow on the first request for a page, which can be
prohibitive for my application.
2. The first image in the action=render page's HTML is often an icon. Is
there a reasonable way to find the actual first "main" image on that page?
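
One heuristic I could try for the second problem is to skip images whose
declared width is below some threshold. A minimal sketch in Python, assuming
the rendered HTML carries a numeric width attribute on each img tag (I have
not verified that this holds for every page); MIN_WIDTH, ImageCollector and
first_main_image are names I made up, and the threshold is a guess.

```python
from html.parser import HTMLParser

MIN_WIDTH = 100  # hypothetical cutoff in pixels; icons are assumed narrower


class ImageCollector(HTMLParser):
    """Collect (src, width) for every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        try:
            width = int(attr.get("width") or 0)
        except ValueError:
            width = 0  # non-numeric width: treat as unknown, i.e. too small
        self.images.append((attr.get("src"), width))


def first_main_image(html, min_width=MIN_WIDTH):
    """Return the src of the first image at least min_width wide, or None."""
    collector = ImageCollector()
    collector.feed(html)
    for src, width in collector.images:
        if src and width >= min_width:
            return src
    return None
```

This only addresses the icon problem, not the slowness of action=render.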
I also played with DBpedia and flickrwrappr, but neither seems to give me
anything better (more relevant to the page) than what I can get directly
from the wiki API.
If this algorithm is the right approach, I still need a good way to get the
titles of the image files on a given Wikipedia page. I am currently
using
http://en.wikipedia.org/w/api.php?action=query&list=search&srnamesp…
but that returns Wikipedia search results for the search string, not the
images on the page.
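
One option that looks closer to what I need is the prop=images query module,
which lists the files used on a page. A minimal sketch in Python, assuming
the JSON response nests the titles under query.pages.<id>.images;
images_query_url and image_titles are hypothetical helper names, and
continuation for pages with many images is not handled.

```python
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"


def images_query_url(page_title, limit=50):
    """Build a prop=images query URL for one page title."""
    params = {
        "action": "query",
        "prop": "images",
        "titles": page_title,
        "imlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def image_titles(response):
    """Extract file titles from a decoded prop=images JSON response."""
    titles = []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        # Pages with no images have no "images" key at all.
        for image in page.get("images", []):
            titles.append(image["title"])
    return titles
```

The result could then be intersected with the Commons search results for the
second step of the algorithm.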
Thanks
Anand