Hi Federico,
On 14-07-19 14:49, Federico Leva (Nemo) wrote:
Maarten Dammers, 14/07/19 15:04:
Several, I think. The most significant I remember were from Sweden and Finland.
Any pointers?
Maybe http://www.ksamsok.se/in-english/ , but I know more about http://data.nationallibrary.fi/bib/sparql and related.
Maybe one of the locals has more information.
Anyway, their platform (Lodview) is quite nice. We should also add links to things like http://dati.beniculturali.it/iccd/schede/resource/GeographicalFeature/Comune...
I guess it wouldn't harm. Matching municipalities is often a major pain. The amount of "open data" which is released with usable references to municipalities is negligible; usually you end up manually matching names or codes in free text form in some CSV.
I took https://www.wikidata.org/wiki/Q42327 and http://dati.beniculturali.it/iccd/schede/resource/GeographicalFeature/Comune... to compare them:
* We have ISTAT ID set to 020036; they have owl:sameAs http://dati.isprambiente.it/id/place/20036 which has haCodIstat set to 020036 and links back to Wikidata (and a lot more)
* We have Italian cadastre code F705; they have owl:sameAs http://spcdata.digitpa.gov.it/Comune/F705

All sorts of cross links exist and we should be able to add quite a few missing links.
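To make the comparison above concrete, here is a rough sketch in Python of how the owl:sameAs targets could be checked against the identifiers we already hold. The function names are made up for illustration, and two things are assumptions on my side: that the identifier is always the last path segment of the URL, and that ISTAT municipality codes are six digits, so the missing leading zero in the ISPRA URL can be restored by padding.

```python
from urllib.parse import urlparse


def last_segment(url):
    """Return the last path segment of a URL (assumed to hold the identifier)."""
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]


def match_same_as(istat_id, cadastre_code, same_as_urls):
    """Map each owl:sameAs URL to the identifier type it agrees with.

    Hypothetical helper: numeric tails are compared as zero-padded
    six-digit ISTAT codes, other tails as cadastre codes.
    """
    matches = {}
    for url in same_as_urls:
        tail = last_segment(url)
        if tail.isdigit() and tail.zfill(6) == istat_id.zfill(6):
            matches[url] = "istat"
        elif tail.upper() == cadastre_code.upper():
            matches[url] = "cadastre"
    return matches


links = [
    "http://dati.isprambiente.it/id/place/20036",
    "http://spcdata.digitpa.gov.it/Comune/F705",
]
result = match_same_as("020036", "F705", links)
```

URLs that agree with neither identifier are simply left out of the result, so they can be reviewed by hand.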
It looks to me like http://dati.isprambiente.it/sparql/ is under a free license ( http://dati.isprambiente.it/id/place/20036 gives CC BY 4.0 International at the bottom) and I don't see it in the report. I guess this one is good to add to https://www.wikidata.org/wiki/Wikidata:SPARQL_federation_input#Incoming_nomi... ?
and http://dati.beniculturali.it/iccd/schede/resource/uod/S010537 .
Importing the entire ontology itself can be trickier. More work on this side has been done by ICCU and ICCD (the ministry): usually it takes them a few years of manual work to connect an ontology.
For our import it was more important to handle the objects which had very little (structured) information. The more detailed descriptions are usually sparsely used (in the case you linked, only by one province which was cataloguing First World War damage? I don't know).
The record is actually linked to http://dati.beniculturali.it/iccd/schede/resource/Site/Sito_di_S010537_Chies... which links to http://dati.beniculturali.it/lodview/iccd/schede/resource/Address/Indirizzo_... and http://dati.beniculturali.it/lodview/iccd/schede/resource/GeographicalFeatur... giving a lot more context. Here you can also see that the same site seems to be listed twice ( http://dati.beniculturali.it/lodview/iccd/schede/resource/Site/Sito_di_S0105... and http://dati.beniculturali.it/lodview/iccd/schede/resource/Site/Sito_di_S0105... ) with something happening around 1935. Not sure which one we should link to.
With the federation in place, it's possible to set up automated reports to find mismatches between the data. See for example the report on https://www.wikidata.org/wiki/Property_talk:P1006/Mismatches . An obvious report for this domain would be monuments that are in the beniculturali database but not on Wikidata. Or do you already have something in place?
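The core of such a report is a small comparison once both sides have been fetched. A minimal sketch in Python, assuming each side has already been reduced to a mapping from a shared identifier to some value worth comparing (the identifiers and labels below are invented, and the fetching itself is left out):

```python
def mismatch_report(wikidata, external):
    """Compare two mappings of shared identifier -> value.

    Returns the identifiers present only in the external database and
    the identifiers present on both sides with differing values.
    """
    wikidata_ids = set(wikidata)
    external_ids = set(external)
    missing = sorted(external_ids - wikidata_ids)
    conflicting = sorted(
        key for key in external_ids & wikidata_ids
        if wikidata[key] != external[key]
    )
    return {"missing_on_wikidata": missing, "conflicting": conflicting}


# Invented sample data, only to show the shape of the output.
on_wikidata = {"A1": "Chiesa di San Marco", "A2": "Palazzo Vecchio"}
in_source = {"A1": "Chiesa di San Marco", "A2": "Palazzo Nuovo", "A3": "Torre Civica"}
report = mismatch_report(on_wikidata, in_source)
```

Here "A3" would end up under missing_on_wikidata and "A2" under conflicting.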
We used the SPARQL queries listed at the end of the page: https://www.wikidata.org/?curid=30576438#Reports_for_cleanup_and_data_improvement. I don't remember if federated queries were fast enough at the time to be usable; I only remember using them for small subsets of the data.
These federated queries seem to break a lot. It looks like my example started timing out in April.
As far as I can see the bot is coded to get all the data at once and then see what needs doing. It doesn't attempt to get incremental updates with federated queries. https://github.com/synapta/wikidata-mibact-luoghi-cultura/blob/master/bot-mibact-to-wikidata/queries.js
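For illustration, the "fetch everything, then decide" pattern boils down to something like the Python sketch below. This is not taken from the linked repository (which is JavaScript); the function name, the record layout, and the rule of only adding properties that are absent, never overwriting, are all assumptions of mine.

```python
def plan_updates(source_records, existing_items):
    """Decide what to do for each source record after a full fetch.

    Both arguments map a source identifier to a dict of property -> value.
    Returns the identifiers needing a new item, and for existing items
    the properties that are absent and could be added.
    """
    to_create = []
    to_extend = {}
    for record_id, record in source_records.items():
        item = existing_items.get(record_id)
        if item is None:
            to_create.append(record_id)
            continue
        absent = {prop: value for prop, value in record.items() if prop not in item}
        if absent:
            to_extend[record_id] = absent
    return to_create, to_extend


# Invented sample data.
source = {
    "S1": {"label": "Museo Civico", "address": "Via Roma 1"},
    "S2": {"label": "Biblioteca Comunale"},
}
existing = {"S1": {"label": "Museo Civico"}}
create, extend = plan_updates(source, existing)
```

The trade-off is that every run has to pull the complete data set, which is fine for periodic batch runs but does not give incremental updates.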
It probably produced items like https://www.wikidata.org/wiki/Q55162430 ? Based on the link quite a bit more info could be added.
Maybe this (linked open data and Wiki Loves Monuments) is something fun to work on during the pre-conference of Wikimania. We should probably get some of the missing SPARQL endpoints whitelisted beforehand so we won't be slowed down by that.
Maarten