On 2/13/16 6:26 PM, Markus Kroetzsch wrote:
On 13.02.2016 23:50, Kingsley Idehen wrote: ...
Markus and others interested in this matter,
What about using OFFSET and LIMIT to address this problem? That's what we advise users of the DBpedia endpoint (and the other endpoints we publish) to do.
We have to educate people about query implications and options. Even then, there is the issue of timeouts (which aren't part of the SPARQL spec) that can be used to produce partial results (reported via HTTP headers), but that comes after the basic scrolling functionality of OFFSET and LIMIT is understood.
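For example, here is a sketch of the kind of paged query I mean (the class, label filter, and page size are illustrative placeholders, not the exact query from the threads linked below):

    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    # Page 1: fetch the first 10000 companies; ORDER BY keeps the pages stable.
    SELECT ?company ?label
    WHERE {
      ?company a dbo:Company ;
               rdfs:label ?label .
      FILTER (lang(?label) = "en")
    }
    ORDER BY ?company
    LIMIT 10000
    OFFSET 0
    # Subsequent pages re-run the same query with OFFSET 10000, 20000, ...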
I think this does not help here. If I only ask for part of the data (see my previous email), I can get all 300K results in 9.3 sec, so the size of the result does not seem to be the issue. If I add further joins to the query, the time needed goes above 10 sec (the timeout) even with a LIMIT. Note that you need to order results to use LIMIT reliably, since the data changes by the minute and the "natural" order of results would change with it. With a blocking operator like ORDER BY in the mix, LIMIT does not really save much time (other than for final result serialisation and transfer, which seems pretty quick).
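Schematically, the shape of the problem looks like this (placeholder properties, not my actual query):

    PREFIX wd:  <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>

    SELECT ?item ?date
    WHERE {
      ?item wdt:P31  wd:Q5 ;     # first pattern
            wdt:P569 ?date .     # the extra join that pushes the query over the timeout
    }
    ORDER BY ?item               # blocking: the full result must be computed and sorted
    LIMIT 10000                  # before the first 10000 rows can be returned
    OFFSET 0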
Markus
Markus,
LIMIT isn't the key element in my example, since all it does is set the cursor (page) size. What matters here is using OFFSET to move that cursor through the solution sequence.
Fundamentally, this is about using HTTP GET requests to page through the data when a single query solution is either too large or its preparation exceeds the underlying DBMS timeout settings.
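In other words, the client walks the solution sequence by re-issuing the same ordered query over HTTP GET, advancing the cursor each time. A generic sketch:

    # LIMIT fixes the cursor (page) size; OFFSET is the cursor position.
    # Each request below goes out as a separate HTTP GET to the SPARQL endpoint.
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    ORDER BY ?s ?p ?o
    LIMIT 10000
    OFFSET 0    # next requests: OFFSET 10000, 20000, ... until a page comes back short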
Ultimately, developers have to understand these time-tested techniques for working with data.
Kingsley
[1] http://stackoverflow.com/questions/20937556/how-to-get-all-companies-from-db...
[2] https://sourceforge.net/p/dbpedia/mailman/message/29172307/