Re: [Wikidata] Linked data fragment enabled on the Query Service

21 Dec 2016

Hi Markus,

Answering this as the LDF lead developer.

...
  (1) The results do not seem to be correct. The example
query related to films returns 55 results, while on the official endpoint it returns 128.
It seems that this is not because of missing data, but because of wrong multiplicities
(the correct result has several rows repeated multiple times). Is there an implicit
DISTINCT applied in this service somewhere? 
No, there isn't. Let me investigate what goes wrong there.
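
To make the multiplicity point concrete, here is a minimal sketch with made-up rows (not the actual film results). Under SPARQL's default bag semantics the same solution can occur more than once; an implicit DISTINCT would collapse those repeats and shrink the row count.

```python
from collections import Counter

# Made-up solution rows (film, actor); purely illustrative, not Wikidata output.
rows = [
    ("film A", "actor X"),
    ("film A", "actor X"),  # same binding produced a second time
    ("film B", "actor Y"),
]

bag = Counter(rows)    # bag semantics: duplicates are kept and counted
distinct = set(rows)   # what SELECT DISTINCT would return

print(sum(bag.values()))  # number of rows without DISTINCT
print(len(distinct))      # number of rows with DISTINCT
```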

...
  Are there any other changes from the normal SPARQL
semantics? 
There should be none; any you find are bugs.

...
  (2) It is really slow. 
Depending on your definition of "really", yes.

For this, I'd like to point to the overall aim of the LDF project,
as documented on our website and in our papers.
In summary: when it comes to query execution, the SemWeb community
has so far cared almost exclusively about speed.
This has resulted in super-fast but super-expensive services,
which simply don't work on the public Web.
More than half of all public SPARQL endpoints
are down for more than 1.5 days each month [1].

I started LDF with the idea:
what if other things are important as well?
What if it is acceptable to trade speed for lower server cost
and higher cacheability and scalability?

The alternatives are:
– downloading a data dump and querying it yourself (would take >= 20 mins)
– executing traversal-based querying (incomplete and likely >= 10 mins)
So in that sense, the Triple Pattern Fragments (TPF) interface is slow,
but not as slow as the alternatives.
Furthermore, these alternatives have other problems,
such as bandwidth use, staying up to date, and completeness.

...
  The sample query took 55s on my machine (producing
only half of the results), while it takes 0.2s on WDQS. 
Exactly, that's the trade-off we offer.
And in many cases, it isn't as bad as in this example.
And in many others, it's even worse.
But that's something we accept;
Web scalability is our main goal.
Everything is possible, some things just very slowly.

WDQS is exceptional, in the sense that
it has an uptime unlike any public SPARQL endpoint.
But for the average public SPARQL endpoint, my answer would be:
“Yes, that's 0.2s, if you're lucky.
 Otherwise, it can take 1h for the endpoint to come up again.
 The TPF interface takes 55s, but consistently so.”

We document the speed/cost trade-off extensively in our research [2],
especially in our ISWC paper [3] and JWS article [4].

The JWS article also shows that TPF performs well on federated queries,
in some cases even better than state-of-the-art federation over SPARQL endpoints.

...
  I am afraid that hard queries which would timeout on
WDQS might take too long to be used at all. 
True, but they would cost less on the server.
And that's what we optimize for, not speed.

...
  Can I use the service from a program 
Yes: https://github.com/LinkedDataFragments/Client.js

...
  Ideally, I would like to use it like a SPARQL service
that I send a request to. Is this possible? 
Yes, the above software package also has a SPARQL endpoint.
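
As a minimal sketch of what "sending a request" could look like from a program, assuming that endpoint is running locally: the address http://localhost:3000/sparql is a placeholder and does not come from this thread, while the `query` parameter and the Accept header follow the SPARQL 1.1 Protocol.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical address: wherever the local SPARQL endpoint was started.
ENDPOINT = "http://localhost:3000/sparql"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol "query via GET" request (not sent here)."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would actually send it; that only works once an endpoint is listening at the assumed address.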

...
  I understand that it is a demo, and what the
motivation is on paper 
So for clarity: the motivation is low server cost,
and easy federation.

...
  if it returns incorrect results, then it is of little
use. 
Of course; I'll look into the bug.

...
  Or are federated queries the main goal here?
(that's still useful, but I hope that WDQS will also support a whitelisted set of
external endpoint at some time) 
It's not just about the whitelist:
TPF has been shown to handle several federation cases faster
and with higher completeness.

Best,

Ruben

[1] https://aran.library.nuigalway.ie/handle/10379/4545
[2] http://linkeddatafragments.org/publications/
[3] http://linkeddatafragments.org/publications/iswc2014.pdf
[4] http://linkeddatafragments.org/publications/jws2016.pdf
