Hi,
I am writing a mobile client that can show Wikipedia content. My approach is to download the raw MediaWiki markup instead of the generated HTML. This gives me more control, and I avoid using an HTML parser/viewer.
The approach is quite successful except when I encounter images. In the markup I can see something like: File:1945-P-Jefferson-War-Nickel-Reverse.JPG
I use the API to fetch some metadata: en.wikipedia.org/w/api.php?action=query&prop=imageinfo&iilimit=1&format=xml&iiprop=dimensions%7Cmime&titles=[foo]
The piece of the puzzle I am still missing is how to find out the actual download URL for any given image.
I've seen image URLs start with http://upload.wikimedia.org/wikipedia/en/6/6d/ and with http://upload.wikimedia.org/wikipedia/commons/d/d0/
But I don't really understand how to decide which URL prefix to put in front of my image name. Can anyone shed some light on this?
Thanks!
Thomas Zander
As an aside, I'm curious: how are you rendering the HTML from the wikitext?
Quoting jeph jephpaul@gmail.com:
As an aside, I'm curious: how are you rendering the HTML from the wikitext?
I'm rendering rich text, not HTML.
So I wrote a tokenizer for the wikitext and a general-purpose text engine that renders it.
On 15/11/13 11:16, Thomas wrote:
I'm rendering rich text, not HTML.
So I wrote a tokenizer for the wikitext and a general-purpose text engine that renders it.
Attempts like this are always interesting, if the work is shareable.
On Fri, 15 Nov 2013 09:37:49 +0100, Thomas thomas@thomaszander.se wrote:
I use the API to fetch some metadata: en.wikipedia.org/w/api.php?action=query&prop=imageinfo&iilimit=1&format=xml&iiprop=dimensions%7Cmime&titles=[foo]
The piece of the puzzle I am still missing is how to find out the actual download URL for any given image.
Just add "url" to "iiprop" and the API will return the full URL.
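To illustrate the suggestion, here is a minimal Python sketch of the same query with "url" added to "iiprop", plus a helper that reads the URL back out of the XML response. The function names are hypothetical, and the sample response is a hand-written illustration of the expected shape (an `ii` element carrying a `url` attribute), not captured API output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_imageinfo_query(title):
    """Build the imageinfo query from this thread, with "url" added to iiprop."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iilimit": "1",
        "format": "xml",
        "iiprop": "dimensions|mime|url",  # urlencode turns "|" into %7C
        "titles": title,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_image_url(xml_text):
    """Return the url attribute of the first <ii> element, or None if there is none."""
    root = ET.fromstring(xml_text)
    ii = root.find(".//ii")
    return ii.get("url") if ii is not None else None


# Hand-written sample (an assumption about the response shape, for illustration only).
SAMPLE_RESPONSE = (
    '<api><query><pages><page title="File:Wiki.png"><imageinfo>'
    '<ii url="https://upload.wikimedia.org/wikipedia/en/b/bc/Wiki.png" mime="image/png"/>'
    "</imageinfo></page></pages></query></api>"
)

if __name__ == "__main__":
    print(build_imageinfo_query("File:Wiki.png"))
    print(extract_image_url(SAMPLE_RESPONSE))
```

In a real client the query string would be fetched over HTTP (for example with `urllib.request.urlopen`) and the response body passed to `extract_image_url`.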
I've seen image URLs start with http://upload.wikimedia.org/wikipedia/en/6/6d/ and with http://upload.wikimedia.org/wikipedia/commons/d/d0/
But I don't really understand how to decide which URL prefix to put in front of my image name. Can anyone shed some light on this?
The "wikipedia/en" / "wikipedia/commons" part depends on which wiki the file was uploaded to: in these cases, the English Wikipedia or Wikimedia Commons (a repository of media files used by all Wikipedias and sister projects in all languages).
The "6/6d" / "d/d0" part is built from the MD5 hash of the filename: the first hex digit, then the first two hex digits. For example, md5("Wiki.png") == bc32c4ef985f1924664e5f5c7359ef62, so the URL is https://upload.wikimedia.org/wikipedia/en/b/bc/Wiki.png . But I wouldn't rely on that, especially if you're already calling the API.
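For reference only, the hashing scheme described above could be sketched in Python as follows. The function name and the `site` parameter are hypothetical; replacing spaces with underscores before hashing is an assumption about how titles are normalized, and other normalization (such as capitalizing the first letter) is not handled. As noted, the URL returned by the API remains the safer choice.

```python
import hashlib
from urllib.parse import quote


def hashed_upload_url(filename, site="wikipedia/en"):
    """Guess an upload URL from the MD5 of the filename (illustrative, not authoritative)."""
    # Assumption: spaces are stored as underscores in file names.
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Directory layout: first hex digit, then first two hex digits of the hash.
    return "https://upload.wikimedia.org/{}/{}/{}/{}".format(
        site, digest[0], digest[:2], quote(name)
    )


if __name__ == "__main__":
    print(hashed_upload_url("Wiki.png"))
    print(hashed_upload_url("Some file.jpg", site="wikipedia/commons"))
```

For "Wiki.png" this should reproduce the "en/b/bc/Wiki.png" path quoted above. The sketch still cannot tell whether a file lives on the local wiki or on Commons, which is one more reason to ask the API.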
Quoting Bartosz Dziewoński matma.rex@gmail.com:
On Fri, 15 Nov 2013 09:37:49 +0100, Thomas thomas@thomaszander.se wrote:
I use the API to fetch some metadata: en.wikipedia.org/w/api.php?action=query&prop=imageinfo&iilimit=1&format=xml&iiprop=dimensions%7Cmime&titles=[foo]
The piece of the puzzle I am still missing is how to find out the actual download URL for any given image.
Just add "url" to "iiprop" and the API will return the full URL.
Oh, cool. I wonder why I missed that :)
Seems to work great, thanks!
mediawiki-api@lists.wikimedia.org