On 9/1/13, Jean-Frédéric <jeanfrederic.wiki(a)gmail.com> wrote:
> [..]
>> The downside to this is in order to effectively get metadata out of
>> commons given the current practises, one essentially has to screen
>> scrape and do slightly ugly things
>
> This [1] looks quite acrobatic indeed. Can’t we make better use of the
> machine-readable markings provided by templates?
> <https://commons.wikimedia.org/wiki/Commons:Machine-readable_data>
>
> [1] https://gerrit.wikimedia.org/r/#/c/80403/4/CommonsMetadata_body.php
It is using the machine-readable data from that page. (Although it's
debatable how machine-readable "look for a <td> with this id, and then
look at the contents of the next sibling <td> you encounter" really is.)
I'm somewhat of a newb, though, with extracting microformat-style
metadata, so it's quite possible there is a better way, or some
higher-level parsing library I could use (something like XPath maybe,
although it's not really XML I'm looking at).
--
-bawolff