On 13.02.2016 22:56, Markus Kroetzsch wrote:
And here is another comment on this interesting topic
:-)
I just realised how close the service is to answering the query. It
turns out that you can in fact get the whole set of (currently >324,000)
result items together with their GND identifiers as a download *within
the timeout* (I tried several times without any errors). This is a 63 MB
JSON result file with >640K individual values, and it downloads in no
time on my home network. The query I use is simply this:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item ?gndId
WHERE {
  ?item wdt:P227 ?gndId ;  # get GND ID
        wdt:P31 wd:Q5 .    # instance of human
} ORDER BY ASC(?gndId) LIMIT 10
(please don't run this needlessly: even with the limit, the ORDER BY
clause forces the service to compute the full result set every time
someone runs it. Also be careful when removing the limit; your browser
may hang rendering an HTML page that large; it is better to query the
SPARQL endpoint directly to download the complete result file.)
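For instance, downloading the result directly from the endpoint could look like the following minimal sketch in Python (standard library only; the endpoint URL and format=json parameter match the link below, while the helper function name is my own invention):

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint (as used in the link below).
ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """\
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item ?gndId
WHERE {
  ?item wdt:P227 ?gndId ;  # get GND ID
        wdt:P31 wd:Q5 .    # instance of human
}
"""

def build_query_url(query: str) -> str:
    """Build a GET URL asking the endpoint for JSON-formatted results.

    Hypothetical helper, not part of any Wikidata client library.
    """
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})

url = build_query_url(QUERY)
print(url)

# The complete result file could then be fetched with, e.g.:
#   import urllib.request
#   urllib.request.urlretrieve(url, "gnd_ids.json")
# (Not executed here, since the full result is tens of megabytes.)
```

Note the sketch deliberately omits the ORDER BY clause, since, as mentioned above, ordering forces the service to recompute the full sorted result on every request.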
P.S. For those who are interested, here is the direct link to the
complete result [1]:
https://query.wikidata.org/sparql?query=PREFIX+wd%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0D%0APREFIX+wdt%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0D%0Aselect+%3Fitem+%3FgndId+where+{+%3Fitem+wdt%3AP227+%3FgndId+%3B+wdt%3AP31++wd%3AQ5+.+}+ORDER+BY+ASC%28%3FgndId%29&format=json
Markus
[1] Is the service protected against internet crawlers that find such
links in the online logs of this email list? It would be a pity if we
had to answer this query tens of thousands of times for many years to
come just to please some spiders that have no use for the result.
It seems that the timeout is only hit when adding more information
(labels and wiki URLs) to the result.
So it seems that we are not actually very far from being able to answer
the original query within the timeout. Certainly not as far as I first
thought. It might not be necessary at all to switch to a different
approach (though it would be interesting to know how long LDF takes to
answer the above; our current service takes less than 10 seconds).
Cheers,
Markus
On 13.02.2016 11:40, Peter Haase wrote:
Hi,
you may want to check out the Linked Data Fragment server in Blazegraph:
https://github.com/blazegraph/BlazegraphBasedTPFServer
Cheers,
Peter
On 13.02.2016, at 01:33, Stas Malyshev
<smalyshev(a)wikimedia.org> wrote:
Hi!
The Linked data fragments approach Osma mentioned
is very interesting
(particularly the bit about setting it up on top of a regularly
updated existing endpoint), and could provide another alternative,
but I have not yet experimented with it.
There is apparently this:
https://github.com/CristianCantoro/wikidataldf
though I am not sure what its status is - I just found it.
In general, yes, I think checking out LDF may be a good idea. I'll put
it on my todo list.
--
Stas Malyshev
smalyshev(a)wikimedia.org
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/