Hi Jesse,
On 26 Nov 2015, at 8:47 AM, Jesse de Vos
<jdvos(a)beeldengeluid.nl> wrote:
Hi everyone,
We’re trying to get a clearer picture of what material we have on Wikimedia Commons so
that our next batch upload doesn’t duplicate material that is already on Commons. The
category Media from Open Beelden contains all the files, and we would like to have all the
metadata on Commons for that category (specifically the 'source' URL) to match
against our new content upload.
Does anyone know how to gather this using the Commons API? Basically, a call to the API
that takes a “File:title” and returns a JSON object with all the metadata is
exactly what we need. Help would be much appreciated!
Best,
Jesse
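For what it’s worth, the live MediaWiki API on Commons can return per-file metadata
directly: `prop=imageinfo` with `iiprop=extmetadata` includes fields such as the licence,
description, and credit/source. A minimal sketch (the file title below is just a
placeholder):

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"

def metadata_url(file_title):
    """Build a Commons API query URL that returns the extended
    metadata (licence, description, source, etc.) for one file as JSON."""
    params = {
        "action": "query",
        "titles": file_title,   # e.g. "File:Example.ogv" (placeholder title)
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "format": "json",
    }
    return API + "?" + urlencode(params)

print(metadata_url("File:Example.ogv"))
```

Fetching that URL returns the `extmetadata` block per file; for a batch check you can pass
several titles at once separated by `|` in the `titles` parameter.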
You *can* get to that data via DBpedia — I came up with a quick SPARQL query
(https://gist.github.com/gaurav/c9704c9b714e1e927140), which you can run at
http://commons.dbpedia.org/sparql — here’s what the output looks like:
http://commons.dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fcommons.db…
Unfortunately, this is based on a dump of Commons as of January 10, 2015, so it might
be out of date for your purposes! If you’d like more frequent updates, I’d ask on the
DBpedia mailing lists — I helped write the Commons extractors, but I don’t really know
anything about their infrastructure. You can also run the DBpedia Extraction Framework on
a local dump of the entire Commons or on a subset of pages (by using
https://commons.wikimedia.org/wiki/Special:Export to export all the pages from the
category of interest, say), but I’d definitely check with the DBpedia developers first to
see if they have something in the works that might be helpful for you!
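If you do go the Special:Export route, you first need the page titles in the category; the
Commons API’s `list=categorymembers` can enumerate them. A sketch (results are capped per
request, so follow the `continue`/`cmcontinue` token in the JSON response to page through):

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"

def category_members_url(category, cmcontinue=None):
    """Build a query URL listing the files in a category, up to 500 titles
    per request; pass the cmcontinue token from the previous response to
    fetch the next page."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmtype": "file",       # only File: pages, not subcategories
        "cmlimit": "500",
        "format": "json",
    }
    if cmcontinue:
        params["cmcontinue"] = cmcontinue
    return API + "?" + urlencode(params)

print(category_members_url("Media from Open Beelden"))
```

The returned titles can then be fed to Special:Export, or straight back into the API for
metadata.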
Hope this helps!
cheers,
Gaurav