Collecting metadata from Wikimedia Commons - Commons-l

19 Aug 2011

Hi all,
I am writing to you on behalf of the public domain working group of the
Open Knowledge Foundation. We are currently developing an automated
system to identify the legal status of different types of works (i.e.
to determine whether or not they are in the public domain). To do
this, we need to gather the metadata on which the legal status of
these works depends. This includes information such as title,
author, date of publication, etc.
You can find more information about the project on our site
http://publicdomain.okfn.org/calculators.
A preliminary implementation of the project can be seen at
www.publicdomainworks.net (site still under development).

Incorporating the metadata from the Wikimedia Commons archive into our
database would be extremely useful for us, since it would greatly
increase the quality of our results.
For example, in the case of
http://commons.wikimedia.org/wiki/File:Cyphoma_signatum_(Fingerprint_Cowry_…
we would like to retrieve the information from the Summary section.
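
To make the request more concrete, here is a minimal Python sketch of the
kind of retrieval we have in mind. It is only an illustration under stated
assumptions: it assumes the standard MediaWiki API endpoint on Commons and
its imageinfo query with the extmetadata property, and we have not confirmed
that this covers everything shown in the Summary section. The names
WorkMetadata, build_query_url and parse_response are hypothetical, and
File:Example.jpg is a placeholder title, not the file linked above.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlencode

# Assumption: the public MediaWiki API endpoint for Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


@dataclass
class WorkMetadata:
    """Hypothetical record for the fields mentioned in this message."""
    title: str
    author: Optional[str] = None
    # Assumption: the creation date stands in for the publication date.
    publication_date: Optional[str] = None


def build_query_url(file_title: str) -> str:
    """Build an API URL requesting the extended metadata of one file page."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "titles": file_title,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)


def parse_response(response: dict) -> List[WorkMetadata]:
    """Extract title, author and date from an already decoded JSON response."""
    records = []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        # A page without image information yields a record with empty fields.
        infos = page.get("imageinfo") or [{}]
        ext = infos[0].get("extmetadata", {})
        records.append(WorkMetadata(
            title=page.get("title", ""),
            author=ext.get("Artist", {}).get("value"),
            publication_date=ext.get("DateTimeOriginal", {}).get("value"),
        ))
    return records
```

The sketch does no network access itself: the URL from build_query_url would
be fetched with any HTTP client, and the decoded JSON passed to
parse_response. Fields that are missing are left as None instead of being
guessed.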

If I have understood correctly, the metadata regarding the works in the
archive is primarily text/HTML-based.
Hence, I would like to know (a) whether there exists a database from which
this metadata can be retrieved, or alternatively (b) whether you would
be interested in switching to a more structured database containing all
the relevant metadata about those works.
Looking forward to your answer,
Primavera