Dear all,
I don't mean to hijack the thread, but for federation purposes, you might be interested in a Triple Pattern Fragments interface [1]. TPF offers lower server cost to reach high availability, at the expense of slower queries and higher bandwidth [2]. This is possible because the client performs most of the query execution.
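To make "the client performs most of the query execution" concrete, here is a toy sketch of the idea, with an in-memory dataset and a fragment() helper standing in for HTTP requests to a real TPF server (all names and data here are illustrative, not the actual TPF API):

```python
# Toy sketch of client-side TPF evaluation: the server only answers
# single triple-pattern requests; the client performs the join itself.

DATASET = [
    ("alice", "bornIn", "Stockholm"),
    ("alice", "wrote", "Book1"),
    ("bob",   "bornIn", "Berlin"),
    ("bob",   "wrote", "Book2"),
]

def fragment(s=None, p=None, o=None):
    """Server side: return all triples matching one pattern (None = wildcard)."""
    return [t for t in DATASET
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Client side: evaluate { ?w bornIn Stockholm . ?w wrote ?b } by
# requesting one fragment per pattern and joining the bindings locally.
writers = {t[0] for t in fragment(p="bornIn", o="Stockholm")}
books = [t[2] for w in sorted(writers) for t in fragment(s=w, p="wrote")]
print(books)  # → ['Book1']
```

The server never sees the join, which is why it stays cheap per request; the cost is the extra round trips the client makes.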
I noticed the Wikidata SPARQL endpoint has had an excellent track record so far (congratulations on this), so the TPF solution might not be necessary for server cost / availability reasons.
However, TPF is an excellent solution for federated queries. In (yet to be published) experiments, we have verified that the TPF client/server solution performs on par with state-of-the-art federation frameworks based on SPARQL endpoints for many simple and complex queries. Furthermore, there are no security problems such as the "open proxy" issue, because all federation is performed by the client.
You can see a couple of example queries here with other datasets:
– Works by writers born in Stockholm (VIAF and DBpedia – http://bit.ly/writers-stockholm)
– Books by Swedish Nobel prize winners that are in the Harvard Library (VIAF, DBpedia, Harvard – http://bit.ly/swedish-nobel-harvard)
It might be a quick win to set up a TPF interface on top of the existing SPARQL endpoint. If you want any info, don't hesitate to ask.
Best,
Ruben
[1] http://linkeddatafragments.org/in-depth/
[2] http://linkeddatafragments.org/publications/iswc2014.pdf
Dear Ruben,
LDF seems a very promising solution for building a reliable Linked Data production environment with high scalability at relatively low cost.
However, I'm not sure if the solution works well on queries like the ones discussed here (see below). It would be very interesting to learn how exactly such a query would be dealt with in an LDF client / server setting.
To me, a crucial point seems to be that I'm trying to look up a large number of distinct entities in two endpoints and join them. In the "real life" case discussed here, about 430,000 "economists" extracted from GND and about 320,000 "persons with a GND id" from Wikidata. The result of the join is about 30,000 Wikidata items, for which the German and English Wikipedia site links are required.
How could an LDF client get this information effectively?
Cheers, Joachim
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX schema: <http://schema.org/>

construct {
  ?gnd schema:about ?sitelink .
} where {
  # the relevant wikidata items have already been
  # identified and loaded to the econ_pers endpoint in a
  # previous step
  service <http://zbw.eu/beta/sparql/econ_pers/query> {
    ?gnd skos:prefLabel [] ;
         skos:exactMatch ?wd .
    filter(contains(str(?wd), 'wikidata'))
  }
  ?sitelink schema:about ?wd ;
            schema:inLanguage ?language .
  filter (contains(str(?sitelink), 'wikipedia'))
  filter (?language in ('en', 'de'))
}
-----Original Message----- From: Ruben Verborgh [mailto:ruben.verborgh@ugent.be] Sent: Thursday, 18 February 2016 14:02 To: wikidata@lists.wikimedia.org Cc: Neubert, Joachim Subject: Re: [Wikidata] Make federated queries possible / was: SPARQL CONSTRUCT results truncated
Hi Joachim,
> To me, a crucial point seems to be that I'm trying to look up a large number of distinct entities in two endpoints and join them. In the "real life" case discussed here, about 430,000 "economists" extracted from GND and about 320,000 "persons with a GND id" from Wikidata. The result of the join is about 30,000 Wikidata items, for which the German and English Wikipedia site links are required.
The query plan a regular TPF client would come up with would probably not differ from that of most SPARQL federation engines, so they would be similarly slow.
However…
You might know that TPF is an interface that allows for auto-discoverable extensions. Recently, we published an extension of TPF that uses Bloom filters to perform faster joins [1]. The trade-off is that the server needs to perform an extra operation (but if this saves thousands of other requests, that might be worthwhile). The public implementation works, but is still preliminary; however, if there is interest in such cases, we might speed things up. Let us know!
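The idea behind the Bloom-filter extension can be sketched in a few lines. This is a toy illustration of the general technique, not the actual AMF interface: the server publishes a compact filter over its join column, and the client tests its own candidates locally, only requesting the probable matches. All names and data below are illustrative.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0  # bit array stored as one big integer

    def _positions(self, item):
        # Derive k positions from k salted SHA-256 hashes.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # No false negatives; false positives possible but rare.
        return all(self.bits >> pos & 1 for pos in self._positions(item))

# The server ships a filter over its join column (e.g. Wikidata IRIs
# that carry a GND id); the client prunes its candidates locally
# instead of sending one request per candidate.
server_side = BloomFilter()
for iri in ["http://www.wikidata.org/entity/Q1",
            "http://www.wikidata.org/entity/Q2"]:
    server_side.add(iri)

candidates = ["http://www.wikidata.org/entity/Q1",
              "http://www.wikidata.org/entity/Q9999"]
# Q9999 is pruned unless a (rare) false positive occurs.
probable = [c for c in candidates if c in server_side]
```

Only the entries surviving the filter need actual requests, which is how a single extra server operation can save thousands of client round trips in a join like the GND/Wikidata one above.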
Best,
Ruben
[1] http://linkeddatafragments.org/publications/iswc2015-amf.pdf