Hi all, I am writing to you on behalf of the Public Domain Working Group of the Open Knowledge Foundation. We are currently developing an automated system to identify the legal status of different types of works (i.e. to determine whether or not they are in the public domain). To do this, we need to gather the metadata necessary to determine the legal status of these works. This includes information such as title, author, date of publication, etc. You can find more information about the project on our site http://publicdomain.okfn.org/calculators. A preliminary implementation of the project can be seen at www.publicdomainworks.net (site still under development).
Incorporating the metadata from the Wikimedia Commons archive into our database would be extremely useful for us, since it would greatly increase the quality of our results. For example, in the case of http://commons.wikimedia.org/wiki/File:Cyphoma_signatum_(Fingerprint_Cowry_-... we would like to retrieve the information from the Summary section.
If I understood correctly, the metadata regarding the works in the archive is primarily text/HTML based. Hence, I would like to know (a) whether there is a database from which this metadata can be retrieved, or alternatively (b) whether you would be interested in switching to a more structured database containing all the relevant metadata about those works. Looking forward to your answer, Primavera
Hi Primavera, I'm not an admin on Commons, but I think I can say a few things.
As you know, Commons, like all the other Wikimedia projects, is based on MediaWiki, which cannot manage metadata properly (i.e. in dedicated databases).
So far, we just have parsers and bots which read our templates (in Wikipedia as well as in Commons) to retrieve some metadata. Another way is to work directly with the dumps (DBpedia does this, I think).
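To illustrate what such a template-reading parser does, here is a minimal sketch in Python. It is not the code of any existing bot: the function names `extract_template` and `split_params` are hypothetical, the sample wikitext is made up, and real pages contain cases (comments, nowiki sections, lowercase template names) that this does not handle.

```python
def extract_template(wikitext, name):
    """Return the raw body of the first {{name ...}} call, or None.

    Counts {{ and }} so that nested templates stay inside the body.
    """
    start = wikitext.find("{{" + name)
    if start == -1:
        return None
    depth = 0
    i = start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return wikitext[start + 2:i - 2]
        else:
            i += 1
    return None


def split_params(body):
    """Split a template body on top-level '|' and return its named parameters."""
    parts, current, depth = [], [], 0
    i = 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    params = {}
    for part in parts[1:]:  # parts[0] is the template name itself
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip().lower()] = value.strip()
    return params


# Made-up wikitext in the shape of a Commons "Summary" section.
SAMPLE = """== Summary ==
{{Information
|Description={{en|1=Example description.}}
|Source=[[:File:Example.jpg|Own work]]
|Date=2010-05-01
|Author=Example User
|Permission=
}}
"""

info = split_params(extract_template(SAMPLE, "Information"))
print(info["date"], "/", info["author"])
```

The values come back as raw wikitext (the description above is still wrapped in `{{en|...}}`), so a real harvester would need a further cleaning step before using them as metadata.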
As far as I know, upgrading or shifting to a more granular, structured system has been discussed *many* times, but right now there is none. There are some MediaWiki extensions that could help (such as Semantic MediaWiki), but they have not been deployed, for security reasons. People interested in GLAM partnerships (Galleries, Libraries, Archives, Museums) often discuss the need to manage metadata, but it's a big and difficult issue, involving the very core of MediaWiki. (I'm the guy obsessed with having an OAI-PMH extension for MediaWiki, so I understand you perfectly :-)
So, the only thing you could do is ask our developers/techies about the bots and other fancy scripts which are currently doing a similar job.
Aubrey
PS: to Commons admins: please correct me if I said something wrong, but this is the picture I have.
Commons-l mailing list Commons-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/commons-l
Primavera De Filippi, 19/08/2011 22:58:
Incorporating the metadata from the Wikimedia Commons archive into our database would be extremely useful for us, since it would greatly increase the quality of our results.
Your project is very interesting; you might know that http://outofcopyright.eu/calculator.html was recently published and that a Wikimedian took part in the process (user:Multichill). Actually, I don't understand how you're going to use this information, but I agree that it should be available to everyone.
I only want to add to what Andrea said that it actually looks like something is moving now, see http://www.mediawiki.org/wiki/License_integration_MediaWiki (see the names and other pages linked there); it would be great if you found some way to join efforts to get something out of it...
Nemo
Hello, you can take a look at the templatetiger project (by Stefan Kühn and me), which extracts all templates from some wikis by parsing the dumps.
Example Template:Information from Commons: http://toolserver.org/~kolossos/templatetiger/tt-table4.php?template=Informa...
Documentation: http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Vorlagenauswertung/en
Disadvantage: the data is only updated every 2-3 months.
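As a rough idea of what extracting templates by parsing the dumps involves, here is a small Python sketch. It is not templatetiger's actual code: the helper names `iter_pages` and `templates_used` are hypothetical, the two pages are made up, and the XML namespace in the sample is only illustrative (the code ignores it, since the export schema version varies between dumps).

```python
import io
import re
import xml.etree.ElementTree as ET

# Captures the name that follows "{{", up to the first "|", brace or newline.
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}\n]+)")


def local(tag):
    """Strip the XML namespace from a tag such as '{uri}page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) for each <page>, streaming through the dump."""
    title, text = None, ""
    for _, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # free the memory of pages already processed


def templates_used(text):
    """Return the sorted set of template names that appear in the wikitext."""
    return sorted({m.strip() for m in TEMPLATE_NAME.findall(text)})


# Made-up two-page dump, just large enough to exercise the functions.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page>
    <title>File:Example A.jpg</title>
    <revision>
      <text>{{Information
|Description=Made-up description
|Date=1890
}}
{{PD-old}}</text>
    </revision>
  </page>
  <page>
    <title>File:Example B.jpg</title>
    <revision>
      <text>{{Information|Date=2010}} {{Cc-by-sa-3.0}}</text>
    </revision>
  </page>
</mediawiki>"""

usage = {title: templates_used(text)
         for title, text in iter_pages(io.BytesIO(SAMPLE_DUMP))}
for title, names in usage.items():
    print(title, names)
```

On a real dump the stream would be the (decompressed) dump file rather than an in-memory sample, and the regular expression would also pick up parser functions and magic words, which a real tool has to filter out.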
With a Toolserver account you would have access to the tool's database.
With a Toolserver account you could also query the Commons database for, e.g., all images that use Template:PD-old.
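Such a query might look roughly like the sketch below. It assumes the Commons database exposes MediaWiki's `page` and `templatelinks` tables with the columns shown (a simplification of the schema, and an assumption about what the replica offers), with namespace 6 for files and 10 for templates. To keep the example runnable without a Toolserver account, it uses an in-memory SQLite database filled with made-up rows; `files_using_template` is a hypothetical helper name.

```python
import sqlite3

# Simplified stand-ins for MediaWiki's page and templatelinks tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER,
    page_title TEXT
);
CREATE TABLE templatelinks (
    tl_from INTEGER,       -- page_id of the page that uses the template
    tl_namespace INTEGER,  -- namespace of the template (10 = Template)
    tl_title TEXT          -- template title without the namespace prefix
);
""")

# Made-up rows: three file pages (namespace 6) and the templates they use.
conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
    (1, 6, "Example_A.jpg"),
    (2, 6, "Example_B.jpg"),
    (3, 6, "Example_C.jpg"),
])
conn.executemany("INSERT INTO templatelinks VALUES (?, ?, ?)", [
    (1, 10, "Information"), (1, 10, "PD-old"),
    (2, 10, "Information"), (2, 10, "Cc-by-sa-3.0"),
    (3, 10, "PD-old"),
])

QUERY = """
SELECT page_title
FROM page
JOIN templatelinks ON tl_from = page_id
WHERE page_namespace = 6
  AND tl_namespace = 10
  AND tl_title = ?
ORDER BY page_title
"""


def files_using_template(connection, template):
    """Return the titles of file pages that use the given template."""
    return [row[0] for row in connection.execute(QUERY, (template,))]


print(files_using_template(conn, "PD-old"))
```

This only tells you which files carry a given license template; the author and date fields still have to be read out of the page text itself.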
I have also been waiting for years for the Foundation to solve this problem of structured data. Some Wikimedia activity in this area seems to be happening at http://render-project.eu, but I'm not sure whether it will lead to a result.
Greetings Kolossos