On Fri, Sep 5, 2014 at 10:21 AM, Jonas Öberg <jonas@commonsmachinery.se> wrote:
It's possible to use Special:Redirect or thumb.php to get the
thumbnail/URL, but both are actually PHP scripts that need running. So
while perhaps not ideal, it seems to make the most sense here to
generate the thumbnail URLs ourselves and hit the web server directly.
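For reference, generating the URL directly for a simple raster file (JPEG, PNG) usually comes down to the sketch below. It assumes the Wikimedia Commons upload layout, `thumb/<h>/<hh>/<name>/<width>px-<name>`, where `<h>` and `<hh>` are the first one and two hex digits of the MD5 of the file name. It also assumes the name is already normalized: no `File:` prefix, first letter capitalized. `thumb_url` and `UPLOAD_BASE` are hypothetical names. Formats such as SVG or PDF use different thumbnail names and are not handled.

```python
import hashlib
from urllib.parse import quote

# Assumed base path for files hosted on Wikimedia Commons.
UPLOAD_BASE = "https://upload.wikimedia.org/wikipedia/commons"


def thumb_url(filename: str, width: int) -> str:
    """Hypothetical helper: build a thumbnail URL for a simple raster file."""
    # File names use underscores in place of spaces.
    name = filename.replace(" ", "_")
    # The hash directories are derived from the MD5 of the underscored name.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    encoded = quote(name)
    # Only valid for formats whose thumbnail keeps the original extension;
    # SVG, PDF, TIFF, video, etc. need a different thumbnail name.
    return f"{UPLOAD_BASE}/thumb/{digest[0]}/{digest[:2]}/{encoded}/{width}px-{encoded}"


print(thumb_url("30C3 Commons Machinery 1.jpg", 640))
```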

That can work if you don't mind errors for the fraction of files whose format (e.g. SVG or PDF) requires a more complex URL scheme. Otherwise, you have three options:
  • just use Special:Redirect. Depending on your request frequency, that might be fine. We can ask ops what rate limit would be reasonable; for bots using the API, the general recommendation is 12 requests per minute.
  • scrape file description pages. The HTML pages are cached in Varnish and contain links to several standard image sizes, so you won't hit PHP this way; of course, HTML scraping is not the most reliable way of retrieving data.
  • use the API in batches. You can retrieve the information (including the thumbnail URL) for 50 files in a single request (500 if you get a bot flag):
https://en.wikipedia.org/w/api.php?format=jsonfm&action=query&titles=File:30C3_Commons_Machinery_1.jpg|File:30C3_Commons_Machinery_2.jpg|File:30C3_Commons_Machinery_3.jpg&prop=imageinfo&iiprop=extmetadata|url&iiextmetadatafilter=ObjectName|Artist|LicenseShortName&iiurlwidth=640
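A minimal sketch of the batched approach, assuming Python's standard library only. It sends the same query as the URL above, but with `format=json` instead of the pretty-printed `jsonfm`. It sends the titles via POST in chunks of an assumed 50 per request; adjust `BATCH_SIZE` to whatever limit applies to your account. The function names and the User-Agent string are hypothetical, and API continuation handling is omitted.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"
# Assumed per-request cap on titles; raise it if your account has higher API limits.
BATCH_SIZE = 50


def batches(titles, size=BATCH_SIZE):
    """Split the title list into chunks of at most `size`."""
    for i in range(0, len(titles), size):
        yield titles[i:i + size]


def build_params(titles, width=640):
    """Query parameters mirroring the example URL (JSON instead of jsonfm)."""
    return {
        "format": "json",
        "action": "query",
        "titles": "|".join(titles),
        "prop": "imageinfo",
        "iiprop": "extmetadata|url",
        "iiextmetadatafilter": "ObjectName|Artist|LicenseShortName",
        "iiurlwidth": str(width),
    }


def parse_response(data):
    """Map each file title to (thumbnail URL, {metadata name: value})."""
    result = {}
    for page in data.get("query", {}).get("pages", {}).values():
        info = page.get("imageinfo")
        if not info:  # missing file or non-file page
            continue
        ii = info[0]
        meta = {k: v.get("value") for k, v in ii.get("extmetadata", {}).items()}
        result[page["title"]] = (ii.get("thumburl"), meta)
    return result


def fetch_thumbnails(titles, width=640):
    """Hypothetical driver: one POST request per batch of titles."""
    out = {}
    for chunk in batches(titles):
        body = urlencode(build_params(chunk, width)).encode()
        req = Request(API_URL, data=body,
                      headers={"User-Agent": "thumb-fetch-example/0.1 (you@example.org)"})
        with urlopen(req) as resp:
            out.update(parse_response(json.load(resp)))
    return out
```

Usage would be something like `fetch_thumbnails(["File:30C3_Commons_Machinery_1.jpg", "File:30C3_Commons_Machinery_2.jpg"])`. Note that the `Artist` value comes back as HTML.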

IMO the last option is the cleanest one.