On 4 sep. 2013, at 18:59, Brian Wolff bawolff@gmail.com wrote:
On 9/1/13, Jean-Frédéric jeanfrederic.wiki@gmail.com wrote: [..]
The downside to this is that, in order to effectively get metadata out of Commons given the current practices, one essentially has to screen-scrape and do slightly ugly things.
This [1] looks quite acrobatic indeed. Can’t we make better use of the machine-readable markings provided by templates? https://commons.wikimedia.org/wiki/Commons:Machine-readable_data
[1] https://gerrit.wikimedia.org/r/#/c/80403/4/CommonsMetadata_body.php
It is using the machine-readable data from that page (although it's debatable how machine-readable "look for a <td> with this id, and then look at the contents of the next sibling <td> you encounter" really is).
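For illustration, here is a minimal Python sketch of that "find the <td> with this id, then read the next <td>" heuristic, using only the standard library's html.parser. The id fileinfotpl_aut and the function name extract_field are assumptions made for the example. The sketch also assumes well-formed markup (explicit </td> tags), a label cell with no nested table, and a value cell that is the next <td> in document order. It is not the code in the Gerrit change.

```python
from html.parser import HTMLParser


class NextCellExtractor(HTMLParser):
    """Collect the text of the first <td> that follows a <td> with a given id.

    Hypothetical sketch: assumes explicit </td> tags and no nested
    table inside the label cell.
    """

    def __init__(self, label_id):
        super().__init__()
        self.label_id = label_id
        # searching -> in_label -> after_label -> capturing -> done
        self.state = "searching"
        self.depth = 0  # <td> nesting depth while capturing the value cell
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        if self.state == "searching" and dict(attrs).get("id") == self.label_id:
            self.state = "in_label"
        elif self.state == "after_label":
            # First <td> after the label cell: this is the value we want.
            self.state = "capturing"
            self.depth = 1
        elif self.state == "capturing":
            # A nested table inside the value cell; track depth so we
            # stop at the value cell's own closing tag.
            self.depth += 1

    def handle_endtag(self, tag):
        if tag != "td":
            return
        if self.state == "in_label":
            self.state = "after_label"
        elif self.state == "capturing":
            self.depth -= 1
            if self.depth == 0:
                self.state = "done"

    def handle_data(self, data):
        if self.state == "capturing":
            self.parts.append(data)


def extract_field(html, label_id):
    """Return the whitespace-normalised text of the value cell, or None."""
    parser = NextCellExtractor(label_id)
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return text or None
```

Usage: extract_field(page_html, "fileinfotpl_aut") returns the author cell's text with markup stripped, or None if the label cell is absent. Note how much the result depends on the exact template layout, which is the fragility being discussed.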
Almost all of that is templated, so of course we could choose to actually fix some of those templates if we really wanted to. Especially for the licenses, my intent was EXACTLY to feed a system like the one you are building right now, while at the same time making Magnus' StockPhoto gadget possible in the immediate future, so I love what you are doing here.
Unfortunately I have not had time to read your patches, but can I suggest creating a separate table of licenses? I think licenses are very well suited to being 'managed' data units, and such a table would give you a lot of flexibility. You could have something like:
id, abbreviation, short name, long name, license version, long description page, default template, scrapeid, canonical license URL, canonical RDFa, PD/CC, BY, NC, SA, other properties of the license requirements
Then use the 'scrapeid' to link the licenses to the file metadata. I think licenses are very well suited to this, and it would make it a lot easier to search through the database and to dynamically generate suitable representations of a license in different forms (very short linked, long linked, full text, full linked) and in different languages.
For the other metadata it would also be very nice to take a much more structured, even Wikidata-like, approach, but I think a licenses table is much simpler than most other metadata, would give us a lot of flexibility and advantages, and would be easy to import into Wikidata once we think we are up to that. Something to consider.
DJ