On Fri, Sep 5, 2014 at 10:21 AM, Jonas Öberg <jonas(a)commonsmachinery.se> wrote:
> It's possible to use Special:Redirect or thumb.php to get the thumbnail
> or its URL, but both are actually PHP scripts that need running. So
> while perhaps not ideal, it seems to make the most sense here to
> generate the thumbnail URLs ourselves and hit the web server directly.
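A minimal sketch of building such a URL by hand. It assumes the file lives on Wikimedia Commons, which uses MediaWiki's default hashed upload directories: the two directory levels are the first one and first two hex digits of the MD5 of the file name, with spaces replaced by underscores. It also assumes a plain raster format such as JPEG or PNG and that the caller passes the canonical file name (first letter already capitalized). `thumb_url` and `UPLOAD_BASE` are hypothetical names.

```python
import hashlib
from urllib.parse import quote

# Assumption: Commons with the default hashed upload directory layout.
UPLOAD_BASE = "https://upload.wikimedia.org/wikipedia/commons"


def thumb_url(file_name: str, width: int) -> str:
    """Build a thumbnail URL for a simple raster file (e.g. JPEG or PNG)."""
    # MediaWiki stores file names with underscores instead of spaces.
    name = file_name.replace(" ", "_")
    # The hash directories come from the MD5 hex digest of the stored name.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Percent-encode the name for use in the URL path.
    encoded = quote(name)
    return f"{UPLOAD_BASE}/thumb/{digest[0]}/{digest[:2]}/{encoded}/{width}px-{encoded}"
```

Formats such as SVG (rendered to PNG, e.g. `220px-Foo.svg.png`) or multi-page PDF and TIFF files add a page prefix and/or a different extension to the thumbnail name, which is where this simple scheme breaks down.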
That can work if you don't mind getting errors in some percentage of cases
where the file format requires a more complex URL scheme. Otherwise, you have
three options:
- just use Special:Redirect. Depending on your request frequency, it
might be fine. We can ask ops what rate limit would be reasonable; for
bots using the API, the general recommendation is 12 requests per minute.
- scrape file description pages. The HTML page is cached in Varnish and
it has links to various standard image sizes, so you won't hit PHP this
way; of course, HTML scraping is not the most reliable way of retrieving
data.
- use the API in batches. You can retrieve the information (including
thumbnail URL) for 500 files in a single request (5000 if you get a bot
flag):
https://en.wikipedia.org/w/api.php?format=jsonfm&action=query&title…
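For the Special:Redirect option, a sketch that builds the redirect URLs and spaces requests out to match the 12-requests-per-minute figure above. It assumes Commons as the target wiki and that Special:Redirect/file accepts a `width` query parameter; `redirect_url` and `Throttle` are hypothetical helpers, and the clock and sleep functions are injectable only so the timing can be tested without waiting.

```python
import time
from urllib.parse import quote

# Assumption: Commons is the target wiki; change the host for other wikis.
REDIRECT_BASE = "https://commons.wikimedia.org/wiki/Special:Redirect/file/"


def redirect_url(file_name: str, width: int) -> str:
    """URL that Special:Redirect answers with a redirect to a thumbnail."""
    return f"{REDIRECT_BASE}{quote(file_name.replace(' ', '_'))}?width={width}"


class Throttle:
    """Allows one call every 60 / per_minute seconds."""

    def __init__(self, per_minute: float = 12, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # no call made yet

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the slot after it."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

In use, call `throttle.wait()` before each fetch of `redirect_url(name, 220)` with whatever HTTP client you prefer; with `per_minute=12` that is one request every five seconds.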
IMO the last option is the cleanest one.
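A sketch of the batched API approach: it builds `action=query&prop=imageinfo` requests with `iiprop=url` and `iiurlwidth` (which asks the API to include a `thumburl`) and reads the thumbnail URLs out of the JSON response. It assumes Commons as the endpoint and the default JSON format, where pages are keyed by page ID. The batch size of 50 is a conservative assumption (the usual non-bot limit for an explicit `titles` list); larger limits may apply with generators or a bot flag, so adjust `BATCH` to what the API allows. All function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumptions: Commons endpoint and a conservative batch size.
API = "https://commons.wikimedia.org/w/api.php"
BATCH = 50


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def imageinfo_urls(file_names, width):
    """Yield one API request URL per batch of file names."""
    for batch in batched(list(file_names), BATCH):
        params = {
            "action": "query",
            "format": "json",
            "prop": "imageinfo",
            "iiprop": "url",
            "iiurlwidth": width,
            "titles": "|".join("File:" + name for name in batch),
        }
        yield f"{API}?{urlencode(params)}"


def thumb_urls(response):
    """Map each returned page title to its thumbnail URL, skipping missing files."""
    pages = response.get("query", {}).get("pages", {})
    result = {}
    for page in pages.values():
        info = page.get("imageinfo")
        if info and "thumburl" in info[0]:
            result[page["title"]] = info[0]["thumburl"]
    return result
```

The sketch does not follow `continue` blocks, which the API may return when a batch cannot be answered in a single response; a real client should repeat the request with those parameters merged in.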