Hi,
I am building an application that needs to find the most relevant Commons image for a Wikipedia page. If the page has an infobox with an image, that is the image I want. If not, I look for an image on the page that also appears in a Wikimedia Commons search for that entity. Otherwise, I fall back to the first result of the Wikimedia Commons search.
Is this the best possible algorithm for this requirement?
I also considered taking the first image from the action=render output, but that has two problems:
1. action=render is very slow on the first request for a page, which can be prohibitive for my application.
2. The first image in the action=render HTML is often an icon. Is there a reasonable way to find the actual first "main" image on the page?
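One rough heuristic for the icon problem is to skip small images. This sketch assumes that the rendered HTML keeps `width` and `height` attributes on each `<img>` tag and that icons are drawn small. The 100 px threshold and the names `first_main_image`, `_to_int` and `_ImgCollector` are hypothetical choices, not anything MediaWiki defines. It only addresses problem 2, not the slowness of action=render.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


def _to_int(value: Optional[str]) -> int:
    # Missing or non-numeric sizes become 0, so such images are skipped.
    try:
        return int(value) if value else 0
    except ValueError:
        return 0


class _ImgCollector(HTMLParser):
    """Collects (src, width, height) for every <img> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.images: List[Tuple[str, int, int]] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> tags also reach this method via the
        # default handle_startendtag implementation.
        if tag != "img":
            return
        a = dict(attrs)
        self.images.append(
            (a.get("src") or "", _to_int(a.get("width")), _to_int(a.get("height")))
        )


def first_main_image(html: str, min_size: int = 100) -> Optional[str]:
    """Return the src of the first <img> at least min_size px in both dimensions."""
    parser = _ImgCollector()
    parser.feed(html)
    parser.close()
    for src, width, height in parser.images:
        if width >= min_size and height >= min_size:
            return src
    return None
```

A size threshold will still misfire on pages whose first large image is a map, flag or navigation graphic. Combining it with the title-based approach above may work better than relying on either alone.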
I also tried DBpedia and flickrwrappr, but neither gave me anything better (more relevant to the page) than what I can get directly from the wiki API.
If this algorithm is the right approach, what I am still missing is the best way to get the titles of the image files on a given Wikipedia page. I am currently using