On 2/13/16 4:56 PM, Markus Kroetzsch wrote:
And here is another comment on this interesting topic
:-)
I just realised how close the service is to answering the query. It
turns out that you can in fact get the whole set of result items
(currently >324000) together with their GND identifiers as a download
*within the timeout* (I tried several times without any errors). This
is a 63 MB JSON result file with >640K individual values, and it
downloads in no time on my home network. The query I use is simply this:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
select ?item ?gndId
where {
?item wdt:P227 ?gndId ; # get gnd ID
wdt:P31 wd:Q5 . # instance of human
} ORDER BY ASC(?gndId) LIMIT 10
(Don't run this needlessly: even with the limit, the ORDER BY clause
requires the service to compute all results every time someone runs
this. Also be careful when removing the limit: your browser may hang
on an HTML page that large, so it is better to use the SPARQL endpoint
directly to download the complete result file.)
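For the direct download mentioned above, a minimal Python sketch might
look like the following. The endpoint URL, the User-Agent string, and
the helper names are assumptions and are not taken from this thread; the
query is the one above with the LIMIT removed.

```python
import shutil
import urllib.request
from urllib.parse import urlencode

# Assumption: the public Wikidata SPARQL endpoint; the thread does not name it.
ENDPOINT = "https://query.wikidata.org/sparql"

# The query from the message above, with the LIMIT removed.
QUERY = """PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item ?gndId
WHERE {
  ?item wdt:P227 ?gndId ;   # get GND ID
        wdt:P31 wd:Q5 .     # instance of human
} ORDER BY ASC(?gndId)"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # Hypothetical client name; use one that identifies your own script.
        "User-Agent": "gnd-download-sketch/0.1",
    }
    return urllib.request.Request(url, headers=headers)


def download(query, path, endpoint=ENDPOINT):
    """Stream the result to a file instead of rendering it in a browser."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
```

Calling download(QUERY, "gnd_humans.json") would then write the result
to a local file (the filename is only an example).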
It seems that the timeout is only hit when adding more information
(labels and wiki URLs) to the result.
So it seems that we are not actually very far away from being able to
answer the original query even within the timeout. Certainly not as
far away as I first thought. It might not be necessary at all to
switch to a different approach (though it would be interesting to know
how long LDF takes to answer the above; our current service takes
less than 10 seconds).
Cheers,
Markus
For a page size of 10 (set by LIMIT), you can move through the rest of
the results in offset steps of 10, continuing after the 10 rows that the
query above returns:
First call:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
select ?item ?gndId
where {
?item wdt:P227 ?gndId ; # get gnd ID
wdt:P31 wd:Q5 . # instance of human
} ORDER BY ASC(?gndId) OFFSET 10 LIMIT 10
Next call:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
select ?item ?gndId
where {
?item wdt:P227 ?gndId ; # get gnd ID
wdt:P31 wd:Q5 . # instance of human
} ORDER BY ASC(?gndId) OFFSET 20 LIMIT 10
Subsequent Calls:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
select ?item ?gndId
where {
?item wdt:P227 ?gndId ; # get gnd ID
wdt:P31 wd:Q5 . # instance of human
} ORDER BY ASC(?gndId) OFFSET {last-offset-plus-10} LIMIT 10
Remember, you simply change the OFFSET value in the SPARQL HTTP URL.
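
The paging scheme above can be scripted by generating one URL per page.
The following is a rough sketch rather than a tested client: the
endpoint URL and the helper names page_url and first_pages are
assumptions, and it relies only on the standard "query" parameter of the
SPARQL protocol.

```python
from urllib.parse import urlencode

# Assumption: the public Wikidata SPARQL endpoint; the thread does not name it.
ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as above; doubled braces are literal braces for str.format().
QUERY_TEMPLATE = """PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item ?gndId
WHERE {{
  ?item wdt:P227 ?gndId ;   # get GND ID
        wdt:P31 wd:Q5 .     # instance of human
}} ORDER BY ASC(?gndId) OFFSET {offset} LIMIT {limit}"""


def page_url(page, page_size=10, endpoint=ENDPOINT):
    """Return the SPARQL HTTP URL for a zero-based page number."""
    query = QUERY_TEMPLATE.format(offset=page * page_size, limit=page_size)
    return endpoint + "?" + urlencode({"query": query})


def first_pages(count, page_size=10):
    """Return the URLs for the first `count` pages, in order."""
    return [page_url(page, page_size) for page in range(count)]
```

Here page_url(0) corresponds to the query without an OFFSET, page_url(1)
to OFFSET 10, and so on. As noted earlier in the thread, the ORDER BY
clause still makes the service compute the full result set on every call.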
--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web:
http://www.openlinksw.com
Personal Weblog 1:
http://kidehen.blogspot.com
Personal Weblog 2:
http://www.openlinksw.com/blog/~kidehen
Twitter Profile:
https://twitter.com/kidehen
Google+ Profile:
https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile:
http://www.linkedin.com/in/kidehen
Personal WebID:
http://kingsley.idehen.net/dataspace/person/kidehen#this