Per request in the meeting, thought I'd stick it on the public list for reference. :)
As I recall there should be three possible URL formats for images embedded in <img> tags in wiki pages or returned as thumbnails via the API:
http(s)?://upload.wikimedia.org/(project)/(subdomain)/(hash1)/(hash2)/(base-filename)
^ original-size images
http(s)?://upload.wikimedia.org/(project)/(subdomain)/thumb/(hash1)/(hash2)/(base-filename)/(size)px(possible-other-options)-(base-filename)(.render-extension)?
^ thumbnails
http(s)?://upload.wikimedia.org/(project)/(subdomain)/thumb/(hash1)/(hash2)/(base-filename)/(size)px(possible-other-options)-thumbnail.(render-extension)
^ this last is used when the filename is very long and we can't actually prepend all the options to it (happens mostly in South Asian languages, where UTF-8 is 3 bytes per letter)
* project: 'wikipedia' in all cases we need to handle; local files on Wiktionary etc. live under a separate project name, but we don't use these.
* subdomain: language code ('en' etc.) for Wikipedias; subproject name for special-case wikis such as Commons ('commons').
* hash1: first hex digit of the md5 hash of the filename (you don't need to use this here, consider it opaque).
* hash2: first 2 hex digits of the md5 hash of the filename.
* base-filename: the base filename -- you want this! This is the raw filename for files served at original size; thumbnails use it as a directory component.
* render-extension: files other than PNG, GIF, and JPEG are rendered to one of those, usually PNG. So you'll see things like ".svg.png" at times -- but never ".png.png". These only appear on thumbnails.
* size: thumbnails always include the pixel size.
* possible-other-options: may include a page number for PDF, DjVu, or TIFF files, or a time position for video thumbnails. To avoid parsing that stuff out, consider using the subdirectory base name on thumbnails if possible.
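A rough sketch of pulling the base filename out of these URLs, in Python. The function name, the regex group names, and the returned dict are made up for illustration, not taken from any existing client. On the live servers, as far as I know, the "thumb" segment sits before the hash directories (.../commons/thumb/a/a9/...), so the pattern accepts it in either position. For thumbnails it takes the directory component rather than parsing the rendered name, as suggested above.

```python
import re
from urllib.parse import unquote, urlsplit

# Illustrative pattern for the path part of an upload.wikimedia.org URL.
# "thumb/" is accepted either before or after the two hash directories.
_PATH_RE = re.compile(
    r"^/(?P<project>[^/]+)/(?P<subdomain>[^/]+)/"
    r"(?P<thumb1>thumb/)?"
    r"(?P<hash1>[0-9a-f])/(?P<hash2>[0-9a-f]{2})/"
    r"(?P<thumb2>thumb/)?"
    r"(?P<base>[^/]+)"
    r"(?:/(?P<rendered>[^/]+))?$"
)


def parse_upload_url(url):
    """Return project, subdomain and base filename, or None if no match."""
    parts = urlsplit(url)
    if parts.hostname != "upload.wikimedia.org":
        return None
    match = _PATH_RE.match(parts.path)
    if match is None:
        return None
    is_thumb = bool(match.group("thumb1") or match.group("thumb2"))
    has_rendered = match.group("rendered") is not None
    # Thumbnails have a rendered-file component; originals must not.
    if is_thumb != has_rendered:
        return None
    return {
        "project": match.group("project"),
        "subdomain": match.group("subdomain"),
        # Hash directories are treated as opaque and simply skipped.
        "base_filename": unquote(match.group("base")),
        "is_thumbnail": is_thumb,
    }
```

Because only the directory component is read, page numbers, time positions, and the "-thumbnail" long-filename form never need to be parsed.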
-- brion
Max showed me how to get the file page URL from the API, so if all we have is the image name we can get the file page URL automagically. I attached a sample query to: https://trello.com/c/cXEMxGb3/8-5-retrieve-file-metadata-from-commonsmetadat...
On Fri, Dec 5, 2014 at 3:52 PM, Brion Vibber bvibber@wikimedia.org wrote:
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
Oh, here's the query for the record:
http://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&prop=imagei...
You should see a "descriptionurl" in the results with the file page URL.
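A minimal sketch of what such a query might look like from Python. The ApiSandbox link above is truncated, so the exact parameters here are an assumption: this uses a plain imageinfo query with iiprop=url, which returns "descriptionurl" alongside "url". The endpoint constant, the function names, and the sample response in the usage note are illustrative only.

```python
from urllib.parse import urlencode

# Assumed endpoint; any wiki's api.php should behave the same way.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_imageinfo_query(filename):
    """Build an imageinfo query URL for a bare image name."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "url",  # asks for url and descriptionurl
        "titles": "File:" + filename,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_description_urls(response):
    """Map page title -> descriptionurl from a decoded JSON response."""
    pages = response.get("query", {}).get("pages", {})
    urls = {}
    for page in pages.values():
        for info in page.get("imageinfo", []):
            if "descriptionurl" in info:
                urls[page.get("title")] = info["descriptionurl"]
    return urls
```

Fetching the URL and decoding the JSON is left out; feeding a decoded response such as {"query": {"pages": {"-1": {"title": "File:Example.jpg", "imageinfo": [{"descriptionurl": "https://commons.wikimedia.org/wiki/File:Example.jpg"}]}}}} (hand-written here, not captured from the API) into extract_description_urls gives back the file page URL keyed by title.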
On Fri, Dec 5, 2014 at 4:23 PM, Monte Hurd mhurd@wikimedia.org wrote:
For reference: as a learning experiment in alpha, mobile web is generating infoboxes from Wikidata. See an example at https://en.m.wikipedia.org/wiki/Albert%20Einstein?mobileaction=alpha
We are interested in converting filenames into thumbnail URLs. The Wikidata API returns just the file title, so we currently use an md5 library to build the thumbnail URL ourselves. It would be nice to have a better way of doing this.
I opened a bug [1] but it's not clear what the path forward is for this...
[1] https://phabricator.wikimedia.org/T76827
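For the record, the md5 approach looks roughly like the sketch below. This is an illustration under assumptions, not the mobile web code: the function name and defaults are hypothetical, spaces are assumed to map to underscores before hashing, and "thumb" is placed before the hash directories, which is what the live servers use as far as I know. It ignores the render-extension case (SVG and friends), the long-filename "-thumbnail" form, and any differences between Python's quote() and MediaWiki's own URL encoding.

```python
import hashlib
from urllib.parse import quote


def thumbnail_url(filename, width, project="wikipedia", subdomain="commons"):
    """Guess a thumbnail URL for a file title such as 'Foo bar.jpg'."""
    # Assumption: the hash is taken over the name with underscores.
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    encoded = quote(name)
    return (
        "https://upload.wikimedia.org/"
        f"{project}/{subdomain}/thumb/"
        f"{digest[0]}/{digest[:2]}/"  # hash1 / hash2
        f"{encoded}/{width}px-{encoded}"
    )
```

The fragility is the point: every special case in the URL formats has to be reimplemented client-side, which is why having the API hand back the thumbnail URL would be preferable.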
On Fri, Dec 5, 2014 at 4:24 PM, Monte Hurd mhurd@wikimedia.org wrote: