On 9/6/13, Daniel Kinzler daniel@brightbyte.de wrote:
The only thing I'm slightly worried about is the data model and representation of the metadata. Swapping one backend for another will only work if they are conceptually compatible.
The data model I was using was simple key-value pairs. Specifically, it used the various properties defined by Exif (and the other metadata formats that MediaWiki extracts from files) as the key names. I imagine Wikidata would allow for much more complex types of metadata. I was thinking this API module would serve to gather the "basic" information, and Wikidata would have its own querying endpoints for the complex view of its metadata.
Can you give a brief overview of how you imagine the output of the API would be structured, and what information it would contain?
As an example, for the url of the license: <LicenseUrl source="commons-desc-page" translatedName="URL for copyright license" hidden="" xml:space="preserve">http://creativecommons.org/licenses/by-sa/3.0/at/deed.en</LicenseUrl>
This contains the key name ("LicenseUrl"); the place the data was retrieved from ("commons-desc-page", as opposed to "file-metadata" if it had come from the CC:LicenseUrl property of XMP data embedded in the file); the translated name of the key ("URL for copyright license", from the MediaWiki:Exif-licenseurl message); whether the property is hidden when displayed on the image description page (true in the example); and the value of the property (http://creativecommons.org/licenses/by-sa/3.0/at/deed.en).
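To make the structure concrete, here is a rough sketch of how a client might pull those five pieces out of the element above. It uses Python's standard xml.etree.ElementTree; the function name parse_property and the dictionary keys are made up for illustration, and treating a present-but-empty hidden attribute as "true" is an assumption based on the example, not a documented guarantee.

```python
import xml.etree.ElementTree as ET

# The example element from above, reproduced as a string.
SAMPLE = (
    '<LicenseUrl source="commons-desc-page" '
    'translatedName="URL for copyright license" hidden="" '
    'xml:space="preserve">'
    'http://creativecommons.org/licenses/by-sa/3.0/at/deed.en'
    '</LicenseUrl>'
)


def parse_property(xml_text):
    """Turn one metadata element into a plain dict (hypothetical helper)."""
    elem = ET.fromstring(xml_text)
    return {
        "key": elem.tag,                          # e.g. "LicenseUrl"
        "source": elem.get("source"),             # where the value came from
        "translated_name": elem.get("translatedName"),
        # Assumption: the attribute being present (even empty) means hidden.
        "hidden": "hidden" in elem.attrib,
        "value": elem.text,
    }


prop = parse_property(SAMPLE)
print(prop["key"], prop["source"], prop["hidden"])
```

A consumer that only cares about machine-readable values could ignore translated_name and hidden entirely and just read key, source and value.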
Also, your original proposal said something about outputting HTML. That confuses me - an API module would return structured data, why would you use HTML to represent the metadata? That makes it a lot harder to process...
It does. Part of the reason is that I wanted something that could be displayed to the user immediately, hence more user-friendly than machine-friendly (for example, human-readable timestamps instead of ISO timestamps, and human-readable flash-firing values instead of the raw constants). The second reason is the source of the data. If we look at the description field on a Commons image page, we have things like:
"Front and western side of the house located at 912 E. First Street in {{w|Bloomington, Indiana|Bloomington}}, {{w|Indiana}}, {{w|United States}}. Built in 1925, it is part of the locally-designated Elm Heights Historic District."
This has links in it. There are a couple of options for what we can do with that. We can give it out as is, or we could expand templates and return:
"Front and western side of the house located at 912 E. First Street in [[:w:Bloomington, Indiana|Bloomington]], [[:w:Indiana|Indiana]], [[:w:United States|United States]]. Built in 1925, it is part of the locally-designated Elm Heights Historic District."
Or we could return HTML: Front and western side of the house located at 912 E. First Street in <a href="//en.wikipedia.org/wiki/Bloomington,_Indiana" class="extiw" title="w:Bloomington, Indiana">Bloomington</a>, <a href="//en.wikipedia.org/wiki/Indiana" class="extiw" title="w:Indiana">Indiana</a>, <a href="//en.wikipedia.org/wiki/United_States" class="extiw" title="w:United States">United States</a>. Built in 1925, it is part of the locally-designated Elm Heights Historic District.
Or we could drop the markup entirely and return plain text:
Front and western side of the house located at 912 E. First Street in Bloomington, Indiana, United States. Built in 1925, it is part of the locally-designated Elm Heights Historic District.
I think returning the HTML is the option most faithful to the original data while still being easy to process. Sometimes the formatting in the description field is more complex than simple links.
Given that showing data to the user and providing metadata that is easy for computers to process are slightly different use cases, perhaps it makes sense to have two different modules: one that returns HTML (with human-formatted timestamps, etc.), and another that returns more machine-oriented data (perhaps including a version of the description with all HTML stripped out).
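As a rough illustration of the "HTML stripped out" variant, here is a minimal sketch of how the plain-text form could be derived from the HTML form, using Python's standard html.parser. The names TextExtractor and strip_html are made up for this example, and the sketch simply keeps text nodes and drops every tag; real description fields with lists, tables or line breaks would likely need more care.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are ignored.
        self.parts.append(data)


def strip_html(fragment):
    """Return the fragment with all markup removed."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


# Shortened version of the HTML description shown earlier.
html_description = (
    'located at 912 E. First Street in '
    '<a href="//en.wikipedia.org/wiki/Bloomington,_Indiana" class="extiw" '
    'title="w:Bloomington, Indiana">Bloomington</a>, '
    '<a href="//en.wikipedia.org/wiki/Indiana" class="extiw" '
    'title="w:Indiana">Indiana</a>.'
)

plain = strip_html(html_description)
print(plain)
```

This is also part of why returning HTML seems the safer default: a client can always get from the HTML to plain text like this, but not the other way around.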
--bawolff