Per request in the meeting, I thought I'd post this on the public list for
reference. :)
As I recall there should be three possible URL formats for images embedded
in <img> tags in wiki pages or returned as thumbnails via the API:
http(s)?://upload.wikimedia.org/(project)/(subdomain)/(hash1)/(hash2)/(base-filename)
^ original-size images
http(s)?://upload.wikimedia.org/(project)/(subdomain)/(hash1)/(hash2)/thumb/(base-file…
?
^ thumbnails
http(s)?://upload.wikimedia.org/(project)/(subdomain)/(hash1)/(hash2)/thumb/(base-file…
^ this last form is used when the filename is very long and we can't
prepend all the options to it (happens mostly in South Asian languages,
where UTF-8 takes 3 bytes per letter)
* project: 'wikipedia' in all cases we need to handle; local files on
Wiktionary etc. will have their own project name here, but we don't use those.
* subdomain: language code ('en' etc.) for Wikipedias; subproject name for
special-case wikis, e.g. 'commons' for Commons
* hash1: first hex digit of the MD5 hash of the filename (you don't need to
use this here, consider it opaque)
* hash2: first 2 hex digits of the MD5 hash of the filename
* base-filename: the base filename -- you want this! This is the raw
filename for files served at original size; thumbnails will use it as a
directory component.
* render-extension: files other than PNG, GIF, and JPEG are rendered to one
of those, usually PNG. So you'll see things like ".svg.png" at times -- but
never ".png.png". These only appear on thumbnails.
* size: thumbnail filenames always include the pixel size.
* possible-other-options: Note that other options may include a page number
for PDF, DjVu, or TIFF files, or a time position for video thumbnails. To
avoid parsing that stuff out, consider using the subdirectory base name on
thumbnails if possible.
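
For reference, here is a rough Python sketch of pulling the base filename out of these URLs by position, without parsing the size or other options. It is only an illustration of the component list above, not what any of our code actually does. The names parse_upload_url, hash_dirs, and UploadUrl are made up for this example. Two assumptions to flag: the MD5 is assumed to be taken over the UTF-8 bytes of the filename as it appears in the URL, and the 'thumb' component is accepted either after the hash directories (as written above) or directly after the subdomain, since I'm going from memory on the exact position.

```python
import hashlib
import re
from typing import NamedTuple, Optional, Tuple
from urllib.parse import unquote, urlsplit


class UploadUrl(NamedTuple):
    """Hypothetical container for the parts we care about."""
    project: str
    subdomain: str
    base_filename: str
    is_thumb: bool


def hash_dirs(filename: str) -> Tuple[str, str]:
    """Return (hash1, hash2): first 1 and first 2 hex digits of the MD5.

    Assumption: the hash covers the UTF-8 bytes of the filename exactly
    as it appears in the URL (underscores rather than spaces).
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return digest[0], digest[:2]


def parse_upload_url(url: str) -> Optional[UploadUrl]:
    """Extract project, subdomain and base filename; None if no match."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return None
    if parts.hostname != "upload.wikimedia.org":
        return None

    # Percent-decode each path segment so non-ASCII names come out readable.
    segs = [unquote(s) for s in parts.path.split("/") if s]
    if len(segs) < 5:
        return None
    project, subdomain, rest = segs[0], segs[1], segs[2:]

    # Drop the 'thumb' marker, wherever it sits relative to the hash dirs.
    is_thumb = False
    if len(rest) >= 5 and rest[0] == "thumb":
        rest, is_thumb = rest[1:], True
    elif len(rest) >= 5 and rest[2] == "thumb":
        rest, is_thumb = rest[:2] + rest[3:], True

    # Hash directories are treated as opaque: only their shape is checked.
    if not (re.fullmatch(r"[0-9a-f]", rest[0])
            and re.fullmatch(r"[0-9a-f]{2}", rest[1])):
        return None

    # Originals end at the filename; thumbnails carry one more component
    # (the rendered thumbnail name), which we deliberately ignore.
    if len(rest) != (4 if is_thumb else 3):
        return None
    return UploadUrl(project, subdomain, rest[2], is_thumb)
```

For thumbnails this takes the directory component rather than the final filename, so the size, render-extension, and any page or time options never need to be parsed.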
-- brion