On 9/1/13, Jean-Frédéric &lt;jeanfrederic.wiki@gmail.com&gt; wrote: [..]
> > The downside to this is that in order to effectively get metadata out
> > of Commons given the current practices, one essentially has to screen
> > scrape and do slightly ugly things.
>
> This [1] looks quite acrobatic indeed. Can't we make better use of the
> machine-readable markings provided by templates?
> https://commons.wikimedia.org/wiki/Commons:Machine-readable_data
>
> [1] https://gerrit.wikimedia.org/r/#/c/80403/4/CommonsMetadata_body.php
It is using the machine-readable data from that page. (Although it's debatable how machine-readable "look for a <td> with this id, then look at the contents of the next sibling <td> you encounter" really is.)
I'm somewhat of a newb though with extracting microformat-style metadata, so it's quite possible there is a better way, or some higher-level parsing library I could use (something like XPath maybe, although it's not really XML I'm looking at).
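For what it's worth, the sibling-lookup pattern above can be sketched in a few lines. Here is a minimal illustration in Python over a hand-made, well-formed fragment — the sample markup, the `fileinfotpl_*` ids as shown here, and the `extract` helper are all invented for this sketch, and real wiki output would need a proper HTML parser rather than a strict XML one:

```python
# Sketch of the "find the <td> with a known id, then read the next
# sibling <td>" lookup. Uses only the stdlib; the markup is a toy
# stand-in for the table a file-description template renders.
import xml.etree.ElementTree as ET

SAMPLE = """
<table>
  <tr>
    <td id="fileinfotpl_desc">Description</td>
    <td>A photograph of a cat.</td>
  </tr>
  <tr>
    <td id="fileinfotpl_aut">Author</td>
    <td>Jane Doe</td>
  </tr>
</table>
"""

def extract(html, field_id):
    """Return the text of the <td> immediately after the <td> with field_id."""
    root = ET.fromstring(html)
    for row in root.iter("tr"):
        cells = row.findall("td")
        for i, cell in enumerate(cells):
            if cell.get("id") == field_id and i + 1 < len(cells):
                # The human-readable value lives in the next sibling cell.
                return cells[i + 1].text
    return None  # id not present, or no sibling cell to read

print(extract(SAMPLE, "fileinfotpl_desc"))  # prints: A photograph of a cat.
```

ElementTree's XPath subset has no `following-sibling` axis, which is why the sketch walks the row's cells by index; a fuller XPath 1.0 engine (e.g. PHP's DOMXPath over loaded HTML) could express the sibling step directly.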