Hi everyone,
I am new to the Commons API and would like to know how to get, in a machine-readable way, the metadata found within the Summary section of a page.
In particular, given a File page like this one: https://commons.wikimedia.org/wiki/File:African_Dusky_Nightjar_(Caprimulgus_...
I would like to get the "Europeana link" part. It would be enough for me to get the data as wiki markup, but parsing the whole HTML would be too much.
By the way, is there any way to query for such data? I have been using the API Sandbox (https://en.wikipedia.org/wiki/Special:ApiSandbox) but could not find a method that could do this.
Your help is really appreciated! Thank you in advance!
Best regards, Hugo
One option (old, unmaintained code, no support, no warranty, good luck) would be my attempt at parsing this: https://tools.wmflabs.org/magnustools/commonsapi.php
If you know what the external link looks like (does it always start with "http://www.europeana.eu/"?) and the page(s) you're interested in, you can use prop=extlinks to find all external links on a set of pages:
- https://commons.wikimedia.org/w/api.php?action=query&titles=File:African...
You can also get a list of every page on Commons that has a URL containing "europeana.eu/portal/record", like in Special:LinkSearch (a short sketch of both API calls follows after these links):
- https://commons.wikimedia.org/w/api.php?action=query&list=exturlusage&am...
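For reference, here is a minimal sketch of those two queries in Python using the requests library. The file title and the europeana.eu pattern come from this thread; everything else is ordinary action=query usage and may need adjusting (limits, continuation) for real use.

import requests

API = "https://commons.wikimedia.org/w/api.php"
TITLE = "File:African_Dusky_Nightjar_(Caprimulgus_pectoralis)_(W1CDR0000386_BD28).ogg"

# 1) All external links on one known page (prop=extlinks).
r = requests.get(API, params={
    "action": "query",
    "format": "json",
    "titles": TITLE,
    "prop": "extlinks",
    "ellimit": "max",
})
for page in r.json()["query"]["pages"].values():
    for link in page.get("extlinks", []):
        print(link["*"])

# 2) Every File page linking to europeana.eu/portal/record
#    (list=exturlusage, the API counterpart of Special:LinkSearch).
r = requests.get(API, params={
    "action": "query",
    "format": "json",
    "list": "exturlusage",
    "euquery": "*.europeana.eu/portal/record",
    "eunamespace": "6",  # File: namespace
    "eulimit": "50",
})
for hit in r.json()["query"]["exturlusage"]:
    print(hit["title"], hit["url"])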
I don’t think there’s an API to parse the Information template yet. DBpedia tries to do this (e.g. http://commons.dbpedia.org/page/File:These_three_geese.jpg), but I couldn’t find the file you were interested in on their website.
Hope that helps!
cheers, Gaurav
I don't think it's possible. You can query the main fields of the Information template (author, source, etc.) via prop=imageinfo&iiprop=extmetadata (https://commons.wikimedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=imageinfo&titles=File%3AAfrican_Dusky_Nightjar_(Caprimulgus_pectoralis)_(W1CDR0000386_BD28).ogg&iiprop=extmetadata), but the Europeana link is just marked as a miscellaneous info field, so there isn't really any way to pick it out.
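For illustration, a minimal sketch of that extmetadata call in Python (again assuming the requests library); which fields come back depends on how the file page is marked up, and as noted above the Europeana link has no dedicated field.

import requests

API = "https://commons.wikimedia.org/w/api.php"
TITLE = "File:African_Dusky_Nightjar_(Caprimulgus_pectoralis)_(W1CDR0000386_BD28).ogg"

r = requests.get(API, params={
    "action": "query",
    "format": "json",
    "titles": TITLE,
    "prop": "imageinfo",
    "iiprop": "extmetadata",
})
for page in r.json()["query"]["pages"].values():
    # extmetadata holds the fields MediaWiki extracts from the file page
    # (Description, Artist, LicenseShortName, ...), each as a dict with a "value" key.
    meta = page["imageinfo"][0]["extmetadata"]
    for field, info in meta.items():
        print(field, "=", info["value"])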
For the other people who are reading this: I also got this question, and solved it by running a query on the database; see https://quarry.wmflabs.org/query/14350
Parsing wikitext is generally messy. Quite a few identifier templates on Commons (like https://commons.wikimedia.org/wiki/Template:Rijksmonument) set a tracking category and use the identifier as the sort key, which makes it possible to keep track of which identifier is used on which page (see https://www.mediawiki.org/wiki/Manual:Categorylinks_table for the database layout). In this case no tracking category was set, so the externallinks table was used as a fallback (https://www.mediawiki.org/wiki/Manual:Externallinks_table); a rough sketch of that kind of lookup follows below.
Maarten
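For illustration, here is a rough sketch of such an externallinks lookup in Python, not the actual Quarry query linked above. It assumes the pymysql package, the Tool Labs replica conventions of the time (database commonswiki_p, host commonswiki.labsdb, credentials in ~/replica.my.cnf) and the pre-2023 externallinks schema with el_from/el_to/el_index columns.

import pymysql

# el_index stores each URL with the host reversed
# (http://www.europeana.eu/... becomes http://eu.europeana.www./...),
# which allows an indexed prefix match; repeat with 'https://' if needed.
QUERY = """
SELECT page_title, el_to
FROM externallinks
JOIN page ON page_id = el_from
WHERE page_namespace = 6  -- File: namespace
  AND el_index LIKE 'http://eu.europeana.www./portal/record%'
LIMIT 100
"""

conn = pymysql.connect(
    host="commonswiki.labsdb",
    db="commonswiki_p",
    read_default_file="~/replica.my.cnf",
)
try:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for page_title, el_to in cur.fetchall():
            print(page_title.decode("utf-8"), el_to.decode("utf-8"))
finally:
    conn.close()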