On 13.02.2016 23:50, Kingsley Idehen wrote:
...
Markus and others interested in this matter,
What about using OFFSET and LIMIT to address this problem? That's what
we advise users of the DBpedia endpoint (and the others we publish) to
do. We have to educate people about query implications and options.
Beyond that, there is the issue of timeouts (which aren't part of the
SPARQL spec), which can be used to produce partial results (signalled
via HTTP headers), but that comes after the basic scrolling
functionality of OFFSET and LIMIT is understood.
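For concreteness, the scrolling pattern looks like this (a sketch only;
the predicate and page size are illustrative, not from the thread):

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Page 1: first 10,000 solutions, in a stable order
SELECT ?s ?label
WHERE { ?s rdfs:label ?label }
ORDER BY ?s
LIMIT 10000
OFFSET 0

# Page 2: the next 10,000 solutions -- only OFFSET changes:
#   ... same query ...
#   LIMIT 10000
#   OFFSET 10000
```

The ORDER BY is what makes the cursor positions repeatable across
requests.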
I don't think this helps here. If I ask for only part of the data
(see my previous email), I can get all 300K results in 9.3 sec, so the
size of the result does not seem to be the issue. If I add further
joins to the query, the time needed goes above the 10 sec timeout even
with a LIMIT. Note that you need to order the results to use LIMIT
reliably, since the data changes by the minute and the "natural" order
of results would change with it. I guess that with a blocking operator
like ORDER BY in the equation, the use of LIMIT does not really save
much time (other than for final result serialisation and transfer,
which seems pretty quick).
Markus
Markus,
LIMIT isn't the key element in my example, since all it does is set the
cursor size. It's the use of OFFSET to move the cursor through positions
in the solution that's key here.
Fundamentally, this is about using HTTP GET requests to page through the
data if a single query solution is either too large or its preparation
exceeds underlying DBMS timeout settings.
Ultimately, developers have to understand these time-tested techniques
for working with data.
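To make the HTTP GET paging concrete, here is a minimal sketch that
builds the sequence of request URLs for successive pages. The endpoint
URL, query, helper name, and page size are illustrative assumptions,
not something from this thread:

```python
from urllib.parse import urlencode

# Hypothetical helper: yield the HTTP GET URLs that page through a
# query solution via OFFSET/LIMIT. ORDER BY keeps the cursor positions
# stable between requests while the underlying data changes.
def paged_query_urls(endpoint, base_query, page_size, pages):
    for page in range(pages):
        query = (
            f"{base_query}\n"
            f"ORDER BY ?s\n"
            f"LIMIT {page_size}\n"
            f"OFFSET {page * page_size}"
        )
        # Each page is an ordinary GET request with the query and
        # desired result format as URL parameters.
        params = {"query": query,
                  "format": "application/sparql-results+json"}
        yield endpoint + "?" + urlencode(params)

urls = list(paged_query_urls(
    "http://dbpedia.org/sparql",
    "SELECT ?s WHERE { ?s a ?type }",
    10000,
    3))
```

Each URL can then be fetched with any HTTP client; only the OFFSET
value differs from page to page.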
Kingsley
--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: