Hi everyone,
About ten years ago the Rijksdienst voor het Cultureel Erfgoed (RCE, Cultural Heritage Agency of the Netherlands) made available the data about all rijksmonumenten (national heritage sites in the Netherlands). This was used to create lists of rijksmonumenten on Wikipedia, which was the starting point for Wiki Loves Monuments.
The RCE has now published all this data (and more) as linked open data, see https://linkeddata.cultureelerfgoed.nl/home (unfortunately all in Dutch). That opens up all sorts of exciting new possibilities. That made me wonder: Do any of the other heritage organizations publish linked open data or is the RCE the first to do this?
Maarten
Maarten Dammers, 14/07/19 13:57:
That made me wonder: Do any of the other heritage organizations publish linked open data or is the RCE the first to do this?
Several, I think. The most significant I remember were from Sweden and Finland. Even the Italian ministry published some linked (semi-open) data last year and Wikimedia Italia funded the import of the small part of it which was usable (about 30k items): https://www.wikidata.org/?curid=30576438#Luoghi_della_cultura
Federico
Hi Federico,
On 14-07-19 13:21, Federico Leva (Nemo) wrote:
Maarten Dammers, 14/07/19 13:57:
That made me wonder: Do any of the other heritage organizations publish linked open data or is the RCE the first to do this?
Several, I think. The most significant I remember were from Sweden and Finland.
Any pointers? I don't see anything on https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Federation_repor... that might be the endpoints.
Even the Italian ministry published some linked (semi-open) data last year and Wikimedia Italia funded the import of the small part of it which was usable (about 30k items): https://www.wikidata.org/?curid=30576438#Luoghi_della_cultura
Semi-open? According to https://www.wikidata.org/wiki/Wikidata:SPARQL_federation_input/Archive#dati.... this is cc-by 2.5. I assume they did a separate release?
Anyway, their platform (Lodview) is quite nice. We should also add links to things like http://dati.beniculturali.it/iccd/schede/resource/GeographicalFeature/Comune... and http://dati.beniculturali.it/iccd/schede/resource/uod/S010537 . With the federation in place, it's possible to set up automated reports to find mismatches between the data. See for example the report on https://www.wikidata.org/wiki/Property_talk:P1006/Mismatches . Obvious report for this domain would be monuments in the beniculturali database, but not on Wikidata. Or do you already have something in place?
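A minimal sketch of such an "in their database but not on Wikidata" report, in Python. It assumes the identifiers from both sides have already been fetched (for example with one SPARQL query per endpoint); the function name and all sample identifiers except S010537 are hypothetical.

```python
def missing_on_wikidata(source_ids, wikidata_ids):
    """Return identifiers present in the source database but absent from Wikidata."""

    def normalise(value):
        # Ignore stray whitespace and case so formatting noise is not reported as a mismatch.
        return value.strip().upper()

    known = {normalise(value) for value in wikidata_ids}
    return sorted({normalise(value) for value in source_ids} - known)


# Hypothetical sample data; only S010537 comes from this thread.
source = ["S010537", "S010538 ", "s010539"]
on_wikidata = ["S010537"]
print(missing_on_wikidata(source, on_wikidata))
```

The same comparison run in the other direction would list identifiers that Wikidata has but the source no longer does.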
Maarten
Maarten Dammers, 14/07/19 15:04:
Several, I think. The most significant I remember were from Sweden and Finland.
Any pointers?
Maybe http://www.ksamsok.se/in-english/ , but I know more about http://data.nationallibrary.fi/bib/sparql and related.
Semi-open? According to https://www.wikidata.org/wiki/Wikidata:SPARQL_federation_input/Archive#dati.... this is cc-by 2.5. I assume they did a separate release?
They changed the license a few times, I think. When we imported it, it was CC-BY-3.0-it which is fine because it waives the sui generis rights completely.
Anyway, their platform (Lodview) is quite nice. We should also add links to things like http://dati.beniculturali.it/iccd/schede/resource/GeographicalFeature/Comune...
I guess it wouldn't harm. Matching municipalities is often a major pain. The amount of "open data" which is released with usable references to municipalities is negligible, usually you end up manually matching names or codes in free text form in some CSV.
and http://dati.beniculturali.it/iccd/schede/resource/uod/S010537 .
Importing the entire ontology itself can be trickier. More work on this side has been done by ICCU and ICCD (the ministry): usually it takes them a few years of manual work to connect an ontology.
For our import it was more important to handle the objects which had very little (structured) information. The more detailed descriptions are usually sparsely used (in this case you linked, only by one province which was cataloguing first world war damages? I don't know).
With the federation in place, it's possible to set up automated reports to find mismatches between the data. See for example the report on https://www.wikidata.org/wiki/Property_talk:P1006/Mismatches . Obvious report for this domain would be monuments in the beniculturali database, but not on Wikidata. Or do you already have something in place?
We used the SPARQL queries listed at the end of the page: https://www.wikidata.org/?curid=30576438#Reports_for_cleanup_and_data_improvement. I don't remember if federated queries were fast enough at the time to be usable, I only remember using them for small subsets of the data.
As far as I can see the bot is coded to get all the data at once and then see what needs doing. It doesn't attempt to get incremental updates with federated queries. https://github.com/synapta/wikidata-mibact-luoghi-cultura/blob/master/bot-mibact-to-wikidata/queries.js
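To illustrate the "get everything, then see what needs doing" approach, here is a rough Python sketch. It is not taken from the linked repository (which is JavaScript); the function name, field names and sample records are all hypothetical.

```python
def plan_sync(source_records, existing_items):
    """Compare a full source snapshot with existing items and plan the work.

    Both arguments map an identifier to a dict of field values.
    Returns (identifiers to create, identifiers to update).
    """
    to_create = []
    to_update = []
    for identifier, record in source_records.items():
        current = existing_items.get(identifier)
        if current is None:
            # Nothing exists yet for this identifier.
            to_create.append(identifier)
        elif any(current.get(field) != value for field, value in record.items()):
            # At least one field in the source differs from what exists.
            to_update.append(identifier)
    return sorted(to_create), sorted(to_update)


# Hypothetical snapshot and existing state.
snapshot = {"A": {"name": "x"}, "B": {"name": "y"}, "C": {"name": "z"}}
existing = {"A": {"name": "x"}, "B": {"name": "old"}}
print(plan_sync(snapshot, existing))
```

The trade-off is that every run needs the whole dataset, whereas an incremental approach would only ask for what changed since the last run.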
Federico
Hi Federico,
On 14-07-19 14:49, Federico Leva (Nemo) wrote:
Maarten Dammers, 14/07/19 15:04:
Several, I think. The most significant I remember were from Sweden and Finland.
Any pointers?
Maybe http://www.ksamsok.se/in-english/ , but I know more about http://data.nationallibrary.fi/bib/sparql and related.
Maybe one of the locals has more information.
Anyway, their platform (Lodview) is quite nice. We should also add links to things like http://dati.beniculturali.it/iccd/schede/resource/GeographicalFeature/Comune...
I guess it wouldn't harm. Matching municipalities is often a major pain. The amount of "open data" which is released with usable references to municipalities is negligible, usually you end up manually matching names or codes in free text form in some CSV.
I took https://www.wikidata.org/wiki/Q42327 and http://dati.beniculturali.it/iccd/schede/resource/GeographicalFeature/Comune... to compare them:
* We have ISTAT ID set to 020036, they have owl:sameAs http://dati.isprambiente.it/id/place/20036 which has haCodIstat set to 020036 and links back to Wikidata (and a lot more)
* We have Italian cadastre code F705, they have owl:sameAs http://spcdata.digitpa.gov.it/Comune/F705
All sorts of cross links exist and we should be able to add quite a few missing links.
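A small Python sketch of how the codes could be pulled out of those owl:sameAs URIs for comparison. It assumes, based only on this one example, that the ISPRA URI is the six-digit ISTAT code with the leading zero dropped; the function names are hypothetical.

```python
from urllib.parse import urlparse


def last_path_segment(uri):
    """Return the final path segment of a URI, e.g. 'F705' for .../Comune/F705."""
    return urlparse(uri).path.rstrip("/").rsplit("/", 1)[-1]


def istat_code_from_uri(uri):
    """Assumption: the URI holds the ISTAT code without leading zeros, so pad to six digits."""
    return last_path_segment(uri).zfill(6)


# Values from the comparison above.
print(istat_code_from_uri("http://dati.isprambiente.it/id/place/20036") == "020036")
print(last_path_segment("http://spcdata.digitpa.gov.it/Comune/F705") == "F705")
```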
Looks to me that http://dati.isprambiente.it/sparql/ is under a free license ( http://dati.isprambiente.it/id/place/20036 gives cc-by 4.0 international at the bottom) and I don't see it in the report. I guess this one is good to add to https://www.wikidata.org/wiki/Wikidata:SPARQL_federation_input#Incoming_nomi... ?
and http://dati.beniculturali.it/iccd/schede/resource/uod/S010537 .
Importing the entire ontology itself can be trickier. More work on this side has been done by ICCU and ICCD (the ministry): usually it takes them a few years of manual work to connect an ontology.
For our import it was more important to handle the objects which had very little (structured) information. The more detailed descriptions are usually sparsely used (in this case you linked, only by one province which was cataloguing first world war damages? I don't know).
The record is actually linked to http://dati.beniculturali.it/iccd/schede/resource/Site/Sito_di_S010537_Chies... which links to http://dati.beniculturali.it/lodview/iccd/schede/resource/Address/Indirizzo_... and http://dati.beniculturali.it/lodview/iccd/schede/resource/GeographicalFeatur... giving a lot more context. Here you can also see that the same site seems to be listed twice ( http://dati.beniculturali.it/lodview/iccd/schede/resource/Site/Sito_di_S0105... and http://dati.beniculturali.it/lodview/iccd/schede/resource/Site/Sito_di_S0105... ) with something happening around 1935. Not sure which one we should link to.
With the federation in place, it's possible to set up automated reports to find mismatches between the data. See for example the report on https://www.wikidata.org/wiki/Property_talk:P1006/Mismatches . Obvious report for this domain would be monuments in the beniculturali database, but not on Wikidata. Or do you already have something in place?
We used the SPARQL queries listed at the end of the page: https://www.wikidata.org/?curid=30576438#Reports_for_cleanup_and_data_improvement. I don't remember if federated queries were fast enough at the time to be usable, I only remember using them for small subsets of the data.
These federated queries seem to break a lot. Looks like my example started timing out in April.
As far as I can see the bot is coded to get all the data at once and then see what needs doing. It doesn't attempt to get incremental updates with federated queries. https://github.com/synapta/wikidata-mibact-luoghi-cultura/blob/master/bot-mibact-to-wikidata/queries.js
It probably produced items like https://www.wikidata.org/wiki/Q55162430 ? Based on the link quite a bit more info could be added.
Maybe this (linked open data and Wiki Loves Monuments) is something fun to work on during the pre-conference of Wikimania. We should probably get some of the missing SPARQL endpoints whitelisted before so we won't be slowed down by that.
Maarten
Several, I think. The most significant I remember were from Sweden and Finland.
Any pointers?
Maybe http://www.ksamsok.se/in-english/ , but I know more about http://data.nationallibrary.fi/bib/sparql and related.
Maybe one of the locals has more information.
The link to k-samsök is excellent and should provide you with necessary information. Remember that linked open data doesn't equal SPARQL endpoints.
Best regards, Jan Ainali
Hi Jan,
On 14-07-19 16:36, Jan Ainali wrote:
Several, I think. The most significant I remember were from Sweden and Finland.
Any pointers?
Maybe http://www.ksamsok.se/in-english/ , but I know more about http://data.nationallibrary.fi/bib/sparql and related.
Maybe one of the locals has more information.
The link to k-samsök is excellent and should provide you with necessary information. Remember that linked open data doesn't equal SPARQL endpoints.
Sure, but SPARQL is the de facto standard these days for making linked open data accessible. If you already went through the trouble of making URIs, RDF, etc. you might as well throw it all into a SPARQL endpoint to unlock all the federation magic. I'm sure some (most!) APIs are much better than SPARQL, but with SPARQL I only need to figure out the data model, with an API I also need to figure out what the calls are, get some stupid key, etc.
Seeing that they started in 2009, they are probably subject to the "Wet van de remmende voorsprong" ( https://en.wikipedia.org/wiki/Law_of_the_handicap_of_a_head_start ). Organizations starting now just need to do some data wrangling with something like Poolparty or Lodview and they can do all the pretty stuff. Back in 2009, that wasn't available yet.
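To make the "only the data model" point concrete, here is a minimal Python sketch that builds a request the way the SPARQL 1.1 Protocol describes it (query passed as a URL parameter). The helper name is hypothetical, the query is a generic placeholder, and the endpoint is the ISPRA one mentioned earlier in this thread; the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint, query):
    """Build a GET request that passes the query in the 'query' URL parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(
    "http://dati.isprambiente.it/sparql/",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 5",
)
print(request.full_url)
```

Passing the request to urllib.request.urlopen would execute it. Only the endpoint URL and the query text change from one dataset to the next, with no per-service calls or keys to learn.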
Do you have some kind of system in place to keep Sweden in sync? Or should I ask Alicia and André?
Maarten
Maarten Dammers, 14/07/19 16:52:
All sorts of cross links exist and we should be able to add quite a few missing links.
I can't wait to see the missing links. :)
Looks to me that http://dati.isprambiente.it/sparql/ is under a free license ( http://dati.isprambiente.it/id/place/20036 gives cc-by 4.0 international at the bottom) and I don't see it in the report. I guess this one is good to add to https://www.wikidata.org/wiki/Wikidata:SPARQL_federation_input#Incoming_nomi... ?
It's definitely an interesting dataset to federate with, for instance the "soil consumption" dataset was heavily used on some Italian Wikipedia articles. http://dati.isprambiente.it/dataset/cds/
It probably produced items like https://www.wikidata.org/wiki/Q55162430 ? Based on the link quite a bit more info could be added.
Hm this might be one of the few cases where the basic matches (such as municipality) have failed. A more typical item is https://www.wikidata.org/wiki/Q55677691.
Maybe this (linked open data and Wiki Loves Monuments) is something fun to work on during the pre-conference of Wikimania. We should probably get some of the missing SPARQL endpoints whitelisted before so we won't be slowed down by that.
I'll probably be there!
Federico
By the way I wonder if the WMIT/Synapta import from "Luoghi della cultura" is the reason Italy is so visible in this map: https://meta.wikimedia.org/wiki/File:Map_of_GLAM_institutions_on_Wikidata,_m...
Federico
wikilovesmonuments@lists.wikimedia.org