Sorry for the email spam. Worked through it, I think. Not too familiar with wiki internals. :-)
This particular page doesn't contain the content I'm looking for. It references a template that is shared by a few other versions of the same image, presumably so the data is stored once and shown consistently. Not being familiar with wiki internals, I took that to mean the API wasn't returning the entire page content. But it is, so I'll have to recognize this situation and pull the referenced templates when the information I need isn't already there.
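A minimal sketch of that check, assuming the page text has already been fetched as a string. The helper names (top_level_templates, candidate_template_titles) are made up for illustration, and the scanner is deliberately naive: it does not handle {{{parameter}}} triple braces, comments, or nowiki sections. Names containing a colon (other namespaces, magic words like int:) are skipped, so the result is only a list of candidates to fetch and inspect.

```python
import re

# Matches the template name right after an opening "{{", up to the first "|" or "}}".
_NAME_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def top_level_templates(wikitext):
    """Return the names of templates invoked at nesting depth 0, in order."""
    names = []
    depth = 0
    i = 0
    while i < len(wikitext):
        if wikitext.startswith("{{", i):
            if depth == 0:
                match = _NAME_RE.match(wikitext, i)
                if match:
                    names.append(match.group(1))
            depth += 1
            i += 2
        elif wikitext.startswith("}}", i):
            depth = max(depth - 1, 0)
            i += 2
        else:
            i += 1
    return names


def candidate_template_titles(wikitext, wanted="Artwork"):
    """If `wanted` is not used directly, list Template: pages worth fetching."""
    names = top_level_templates(wikitext)
    # Template names are compared with the first letter capitalised.
    normalised = [n[:1].upper() + n[1:] for n in names]
    if wanted in normalised:
        return []  # the data is already on this page
    # Assumption: a bare {{Name}} transcludes "Template:Name".
    return ["Template:" + n for n in names if ":" not in n]
```

For a page whose body is just {{XYZ}} plus a license tag, this returns the titles to request next; whichever of them contains {{Artwork ...}} is the one holding the Summary data.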
On Tue, Jun 3, 2014 at 2:45 AM, james harvey jamespharvey20@gmail.com wrote:
I may have stumbled upon it. If I change the API call from "titles=File:XYZ.jpg" to "titles=Template:XYZ" (note: dropped the .jpg) then it *appears* to get me what I need.
Is this correct, or did I run across a case where it appears to work but isn't the right way to go? (For instance, I'm not sure whether "Template:XYZ" directly feeds the Summary information on the "File:XYZ.jpg" page, or whether it's duplicated data that happens to match in this case. I'm also confused about why the .jpg gets dropped when switching from "File:" to "Template:".)
And will this always get me the full template information? Or, if someone updates just the "Year" portion, would it return only that part? The revisions seem to return only what changed from the previous revision, rather than the full current text.
On Tue, Jun 3, 2014 at 1:59 AM, james harvey jamespharvey20@gmail.com wrote:
Given a Wikimedia Commons description page URL - such as: https://commons.wikimedia.org/wiki/File:Van_Gogh_-_Starry_Night_-_Google_Art...
I would like to be able to programmatically retrieve the information in the "Summary" header. (Values for "Artist", "Title", "Date", "Medium", "Dimensions", "Current location", etc.)
I believe all this information is in "Template:Artwork". I can't figure out how to get the wikitext/json-looking template data.
If I use the API and call: https://commons.wikimedia.org/w/api.php?action=query&format=xml&titles=File:Van%20Gogh%20-%20Starry%20Night%20-%20Google%20Art%20Project.jpg&iilimit=max&iiprop=timestamp%7Cuser%7Ccomment%7Curl%7Csize%7Cmime&prop=imageinfo%7Crevisions&rvgeneratexml=&rvprop=ids%7Ctimestamp%7Cuser%7Ccomment%7Ccontent
Then I don't get the information I'm looking for. This shows the most recent revision, and its changes. Unless the most recent revision changed this data, it doesn't show up.
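For reference, a call like the one above can be assembled rather than hand-escaped. This is only a sketch: the function name build_content_query is hypothetical, it asks for format=json instead of xml (my choice, not something from the original call), and it trims the parameter list to the revision content.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def build_content_query(title):
    """Return a query URL asking for the latest revision content of `title`."""
    params = {
        "action": "query",
        "format": "json",  # assumption: JSON is easier to consume than XML
        "prop": "revisions",
        "rvprop": "ids|timestamp|content",
        "titles": title,
    }
    # urlencode takes care of escaping spaces, ':' and '|'.
    return API_ENDPOINT + "?" + urlencode(params)


url = build_content_query("File:Van Gogh - Starry Night - Google Art Project.jpg")
```

The same function works unchanged for a "Template:..." title, since only the titles parameter differs.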
To see all the information I'm looking for, it seems I'd have to specify rvlimit=max and go through all the past revisions to figure out which is most current. For example, if I do so and I look at revid 79665032, that includes: "{{Artwork | Artist = {{Creator:Vincent van Gogh}} | . . . | Year = 1889 | Technique = {{Oil on canvas}} | . . ."
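Once wikitext like that is in hand, the fields can be pulled out with a small splitter. A rough sketch, with a made-up function name (parse_template_params): it splits on "|" only outside nested {{...}} and [[...]], and treats the text before the first "=" as the key. Positional parameters and other wikitext subtleties are ignored. The sample string uses only the three fields quoted above; the elided ones are left out.

```python
def parse_template_params(template_text):
    """Split '{{Name | key = value | ...}}' into (name, {key: value})."""
    text = template_text.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("expected text wrapped in {{ ... }}")
    inner = text[2:-2]

    parts, current, depth, i = [], [], 0, 0
    while i < len(inner):
        pair = inner[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif inner[i] == "|" and depth == 0:
            # A pipe at depth 0 separates two parameters of the outer template.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(inner[i])
            i += 1
    parts.append("".join(current))

    name = parts[0].strip()
    params = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:  # skip positional parameters, which have no '='
            params[key.strip()] = value.strip()
    return name, params


sample = ("{{Artwork | Artist = {{Creator:Vincent van Gogh}} "
          "| Year = 1889 | Technique = {{Oil on canvas}} }}")
name, params = parse_template_params(sample)
```

Nested templates such as {{Creator:Vincent van Gogh}} are returned as raw wikitext; resolving them would take another lookup.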
Isn't there a way to get the current version in that wikitext/JSON-looking format, whatever you'd call it?
In my API call, I can specify rvexpandtemplates, which gives me the information I need even with only the most recent revision, but it's largely in HTML (tables, divs, etc.) rather than wikitext/JSON/XML.
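A sketch of reading the unexpanded wikitext from a JSON response, i.e. without rvexpandtemplates. It assumes the legacy format=json layout, where pages are keyed by page id and the revision text sits under the "*" key (or under slots.main when rvslots is used). The function names and the User-Agent string are placeholders, and fetch_json is defined but not called here.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url):
    """Fetch a URL and decode the JSON body (performs a network request)."""
    # Placeholder User-Agent; replace it with something identifying your tool.
    request = Request(url, headers={"User-Agent": "example-script/0.1"})
    with urlopen(request) as response:
        return json.load(response)


def extract_latest_wikitext(api_response):
    """Map each returned page title to the wikitext of its first listed revision."""
    pages = api_response.get("query", {}).get("pages", {})
    result = {}
    for page in pages.values():
        revisions = page.get("revisions")
        if not revisions:
            continue  # missing page, or no content requested
        revision = revisions[0]
        text = revision.get("*")
        if text is None:
            # Layout used when the request includes rvslots=main.
            text = revision.get("slots", {}).get("main", {}).get("*")
        if text is not None:
            result[page.get("title")] = text
    return result
```

Without rvlimit the query returns a single revision per page, the current one, so revisions[0] is the full current text rather than a diff.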