Hi everyone,
I am new to the Commons API and would like to know how to get, in a machine-readable way, the metadata found within the Summary section of a page.
In particular, given a File page like this one: https://commons.wikimedia.org/wiki/File:African_Dusky_Nightjar_(Caprimulgus_...
I would like to get the "Europeana link" part. It would be enough for me to get the data as wiki markup, but parsing the whole HTML would be too much.
By the way, is there any way to query for such data? I have been using the API Sandbox (https://en.wikipedia.org/wiki/Special:ApiSandbox) but could not find a method that could do this.
Your help is really appreciated! Thank you in advance!
Best regards, Hugo
One option (old, unmaintained code, no support, no warranty, good luck) would be my attempt at parsing this: https://tools.wmflabs.org/magnustools/commonsapi.php
If you know what the external link looks like (does it always start with "http://www.europeana.eu/"?) and the page(s) you're interested in, you can use prop=extlinks to find all external links on a set of pages:
- https://commons.wikimedia.org/w/api.php?action=query&titles=File:African...
You can also get a list of every page on Commons that has a URL containing "europeana.eu/portal/record", like in Special:LinkSearch (a short sketch of both API calls follows after these links):
- https://commons.wikimedia.org/w/api.php?action=query&list=exturlusage&am...
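For reference, here is a minimal sketch of those two queries in Python using the requests library. The file title and the europeana.eu pattern come from this thread; everything else is ordinary action=query usage and may need adjusting (limits, continuation) for real use.

import requests

API = "https://commons.wikimedia.org/w/api.php"
TITLE = "File:African_Dusky_Nightjar_(Caprimulgus_pectoralis)_(W1CDR0000386_BD28).ogg"

# 1) All external links on one known page (prop=extlinks).
r = requests.get(API, params={
    "action": "query",
    "format": "json",
    "titles": TITLE,
    "prop": "extlinks",
    "ellimit": "max",
})
for page in r.json()["query"]["pages"].values():
    for link in page.get("extlinks", []):
        print(link["*"])

# 2) Every File page linking to europeana.eu/portal/record
#    (list=exturlusage, the API counterpart of Special:LinkSearch).
r = requests.get(API, params={
    "action": "query",
    "format": "json",
    "list": "exturlusage",
    "euquery": "*.europeana.eu/portal/record",
    "eunamespace": "6",  # File: namespace
    "eulimit": "50",
})
for hit in r.json()["query"]["exturlusage"]:
    print(hit["title"], hit["url"])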
I don’t think there’s an API to parse the Information template yet. DBpedia tries to do this (e.g. http://commons.dbpedia.org/page/File:These_three_geese.jpg), but I couldn’t find the file you were interested in on their website.
Hope that helps!
cheers, Gaurav
I don't think it's possible. You can query the main fields of the Information template (author, source, etc.) via prop=imageinfo&iiprop=extmetadata (https://commons.wikimedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=imageinfo&titles=File%3AAfrican_Dusky_Nightjar_(Caprimulgus_pectoralis)_(W1CDR0000386_BD28).ogg&iiprop=extmetadata), but the Europeana link is just marked as a miscellaneous info field, so there isn't really any way to pick it out.
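For illustration, a minimal sketch of that extmetadata call in Python (again assuming the requests library); which fields come back depends on how the file page is marked up, and as noted above the Europeana link has no dedicated field.

import requests

API = "https://commons.wikimedia.org/w/api.php"
TITLE = "File:African_Dusky_Nightjar_(Caprimulgus_pectoralis)_(W1CDR0000386_BD28).ogg"

r = requests.get(API, params={
    "action": "query",
    "format": "json",
    "titles": TITLE,
    "prop": "imageinfo",
    "iiprop": "extmetadata",
})
for page in r.json()["query"]["pages"].values():
    # extmetadata holds the fields MediaWiki extracts from the file page
    # (Description, Artist, LicenseShortName, ...), each as a dict with a "value" key.
    meta = page["imageinfo"][0]["extmetadata"]
    for field, info in meta.items():
        print(field, "=", info["value"])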
For the other people who are reading this: I also got this question, and solved it by running a query on the database; see https://quarry.wmflabs.org/query/14350
Parsing wikitext is generally messy. Quite a few identifier templates on Commons (like https://commons.wikimedia.org/wiki/Template:Rijksmonument) set a tracking category and use the identifier as the sort key, which makes it possible to keep track of which identifier is used on which page (see https://www.mediawiki.org/wiki/Manual:Categorylinks_table for the database layout). In this case no tracking category was set, so the externallinks table was used as a fallback (https://www.mediawiki.org/wiki/Manual:Externallinks_table); a rough sketch of that kind of lookup follows below.
Maarten
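For illustration, here is a rough sketch of such an externallinks lookup in Python, not the actual Quarry query linked above. It assumes the pymysql package, the Tool Labs replica conventions of the time (database commonswiki_p, host commonswiki.labsdb, credentials in ~/replica.my.cnf) and the pre-2023 externallinks schema with el_from/el_to/el_index columns.

import pymysql

# el_index stores each URL with the host reversed
# (http://www.europeana.eu/... becomes http://eu.europeana.www./...),
# which allows an indexed prefix match; repeat with 'https://' if needed.
QUERY = """
SELECT page_title, el_to
FROM externallinks
JOIN page ON page_id = el_from
WHERE page_namespace = 6  -- File: namespace
  AND el_index LIKE 'http://eu.europeana.www./portal/record%'
LIMIT 100
"""

conn = pymysql.connect(
    host="commonswiki.labsdb",
    db="commonswiki_p",
    read_default_file="~/replica.my.cnf",
)
try:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for page_title, el_to in cur.fetchall():
            print(page_title.decode("utf-8"), el_to.decode("utf-8"))
finally:
    conn.close()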