Dear Ruben,
LDF seems a very promising way to build a reliable Linked Data production environment
with high scalability at relatively low cost.
However, I'm not sure whether the solution works well for queries like the one discussed
here (see below). It would be very interesting to learn how exactly such a query would be
handled in an LDF client/server setting.
To me, a crucial point seems to be that I'm trying to look up a large number of
distinct entities in two endpoints and join them. The "real life" case
discussed here involves about 430,000 "economists" extracted from GND and about 320,000
"persons with GND id" from Wikidata. The join yields about 30,000
Wikidata items, for which the German and English Wikipedia site links are required.
How could an LDF client get this information effectively?
Cheers, Joachim
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX schema: <http://schema.org/>

construct {
  ?gnd schema:about ?sitelink .
}
where {
  # the relevant wikidata items have already been
  # identified and loaded to the econ_pers endpoint in a
  # previous step
  service <http://zbw.eu/beta/sparql/econ_pers/query> {
    ?gnd skos:prefLabel [] ;
         skos:exactMatch ?wd .
    filter(contains(str(?wd), 'wikidata'))
  }
  ?sitelink schema:about ?wd ;
            schema:inLanguage ?language .
  filter (contains(str(?sitelink), 'wikipedia'))
  filter (lang(?wdLabel) = ?language && ?language in ('en', 'de'))
}
-----Original Message-----
From: Ruben Verborgh [mailto:ruben.verborgh@ugent.be]
Sent: Thursday, 18 February 2016 14:02
To: wikidata(a)lists.wikimedia.org
Cc: Neubert, Joachim
Subject: Re: [Wikidata] Make federated queries possible / was: SPARQL CONSTRUCT results truncated
Dear all,
I don't mean to hijack the thread, but for federation purposes, you might be
interested in a Triple Pattern Fragments (TPF) interface [1]. TPF achieves high
availability at lower server cost, at the expense of slower queries and higher bandwidth [2]. This
is possible because the client performs most of the query execution.
I noticed the Wikidata SPARQL endpoint has had an excellent track record so far
(congratulations on this), so the TPF solution might not be necessary for server cost /
availability reasons.
However, TPF is an excellent solution for federated queries. In (as yet unpublished)
experiments, we have verified that the TPF client/server solution performs on par with
state-of-the-art federation frameworks based on SPARQL endpoints for many simple and
complex queries. Furthermore, there are no security problems such as an "open
proxy", because all federation is performed by the client.
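As a rough illustration of what "federation performed by the client" can look like, here is a minimal Python sketch. It is not the actual TPF client code: the `FragmentSource` class, the `federated_sitelinks` function, and all sample identifiers and URLs are hypothetical, and each source is an in-memory stand-in that, like a TPF server, only answers single triple patterns. The client binds the results of one pattern into the next, across two sources.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
# A triple pattern; None acts as a wildcard (an unbound variable).
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"
ABOUT = "http://schema.org/about"


class FragmentSource:
    """Hypothetical in-memory stand-in for one server that answers
    single triple patterns only (no joins, no filters)."""

    def __init__(self, name: str, triples: List[Triple]) -> None:
        self.name = name
        self.triples = triples
        self.requests = 0  # number of pattern lookups issued by the client

    def match(self, pattern: Pattern) -> List[Triple]:
        self.requests += 1
        return [
            triple
            for triple in self.triples
            if all(p is None or p == t for p, t in zip(pattern, triple))
        ]


def federated_sitelinks(gnd_source: FragmentSource,
                        wd_source: FragmentSource) -> List[Triple]:
    """Join two sources on the client: one lookup per bound entity."""
    results: List[Triple] = []
    # Step 1: fetch all (?gnd skos:exactMatch ?wd) links from the first source.
    for gnd, _, wd in gnd_source.match((None, EXACT_MATCH, None)):
        # Step 2: bind ?wd and ask the second source for its site links.
        for sitelink, _, _ in wd_source.match((None, ABOUT, wd)):
            results.append((gnd, ABOUT, sitelink))
    return results


# Made-up sample data, for illustration only.
gnd_source = FragmentSource("econ_pers", [
    ("gnd:A", EXACT_MATCH, "wd:Q1"),
    ("gnd:B", EXACT_MATCH, "wd:Q2"),
])
wd_source = FragmentSource("wikidata", [
    ("https://en.wikipedia.example/A", ABOUT, "wd:Q1"),
    ("https://de.wikipedia.example/A", ABOUT, "wd:Q1"),
    ("https://en.wikipedia.example/C", ABOUT, "wd:Q3"),
])

links = federated_sitelinks(gnd_source, wd_source)
for link in links:
    print(link)
print(gnd_source.requests, wd_source.requests)
```

In this simplified sketch the number of lookups against the second source grows with the number of entities bound from the first, which is one way to picture the trade-off between lower server cost and slower queries with higher bandwidth. A real client would also have to follow paged responses and could order the patterns differently.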
You can see a couple of example queries here with other datasets:
- Works by writers born in Stockholm (VIAF and DBpedia): http://bit.ly/writers-stockholm
- Books by Swedish Nobel prize winners that are in the Harvard Library (VIAF, DBpedia,
  Harvard): http://bit.ly/swedish-nobel-harvard
It might be a quick win to set up a TPF interface on top of the existing SPARQL endpoint.
If you want any info, don't hesitate to ask.
Best,
Ruben
[1] http://linkeddatafragments.org/in-depth/
[2] http://linkeddatafragments.org/publications/iswc2014.pdf