Hi Stas,
Thanks for your answer. You asked how long the query runs: 8.21 sec (having processed 6443
triples), in an example invocation. If roughly linear, that could mean 800-1500 sec for
the whole set. However, I would expect a clearly shorter runtime: I routinely use queries
of similar complexity and result sizes on ZBW's public endpoints. One arbitrarily
selected query, which extracts data from GND, runs for less than two minutes and produces
1.2m triples.
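As a rough check of the "roughly linear" estimate, here is a minimal sketch that scales the one
measured run (8.21 sec for 6443 triples) to the expected 500,000 to 1m triples. The function
name is made up for illustration. A strict linear extrapolation from this single run comes out
somewhat below the 800-1500 sec ballpark given above, but in the same order of magnitude.

```python
def extrapolate_runtime(sample_seconds, sample_triples, target_triples):
    """Scale a measured runtime linearly to a larger result size."""
    seconds_per_triple = sample_seconds / sample_triples
    return seconds_per_triple * target_triples


# Figures from this thread: one run took 8.21 s for 6443 triples,
# and the full result is expected to hold 500,000 to 1,000,000 triples.
low = extrapolate_runtime(8.21, 6443, 500_000)
high = extrapolate_runtime(8.21, 6443, 1_000_000)
print(f"estimated runtime: {low:.0f} s to {high:.0f} s")
```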
Given the size of Wikidata, I wouldn't consider such a use abusive. Of course, if you
have lots of competing queries and resources are limited, it is completely legitimate to
implement some policy which formulates limits and enforces them technically (throttle down
long-running queries, or limit the number of produced triples, or the execution time, or
whatever seems reasonable and can be implemented).
Anyway, in this case (truncation in the middle of a statement), it looks much more like
some technical bug (or an obscure timeout somewhere along the way). The execution time and
the result size vary widely:
5.44s empty result
8.60s 2090 triples
5.44s empty result
22.70s 27352 triples
Can you reproduce results like these with the given query, or with other supposedly
longer-running queries?
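To make such runs easier to compare, a response could be checked for truncation
automatically. The sketch below assumes the CONSTRUCT result is serialized as N-Triples (one
statement per line, each ending in a dot); the thread does not say which serialization was
used, and the function name is hypothetical. It is only a heuristic: a line cut off right after
a dot inside a literal would slip through.

```python
def looks_truncated(ntriples_text):
    """Heuristically detect a result cut off in the middle of a statement.

    Assumes N-Triples: every non-empty, non-comment line is one statement
    that ends with a dot.
    """
    lines = [line.strip() for line in ntriples_text.splitlines()]
    statements = [line for line in lines if line and not line.startswith("#")]
    if not statements:
        return False  # an empty result is empty, not truncated
    return not statements[-1].endswith(".")


complete = '<http://example.org/s> <http://example.org/p> "o" .\n'
cut_off = complete + "<http://example.org/s> <http://example.org/p"
print(looks_truncated(complete), looks_truncated(cut_off))
```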
Thanks again for looking into this.
Cheers, Joachim
PS. I plan to set up my own Wikidata SPARQL endpoint to do more complex things, but that
depends on a new machine which will be available in a few months. For now, I'd just like
to know which of "our" persons (economists and the like) have Wikipedia
pages.
PPS. From my side, I would much rather have built a query which asks for exactly
the GND IDs I'm interested in (about 430,000 out of millions of GNDs). This would have
led to a much smaller result - but I cannot squeeze that query into a GET request ...
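One possible way around the GET length limit, sketched under assumptions: the SPARQL 1.1
Protocol also allows sending the query in a URL-encoded POST body, and the ID list can be
split into batches with a VALUES clause. Whether the Wikidata endpoint accepts POST and how
large a batch it tolerates are not confirmed in this thread; the endpoint URL, the batch size
and all helper names are illustrative, and the query is a simplified stand-in (it maps items to
GND IDs via P227 and leaves out the Wikipedia pages).

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"  # assumed endpoint URL


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_query(gnd_ids):
    """Build a CONSTRUCT query restricted to the given GND IDs (P227)."""
    literals = " ".join(
        '"%s"' % gnd_id.replace("\\", "\\\\").replace('"', '\\"')
        for gnd_id in gnd_ids
    )
    return (
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "CONSTRUCT { ?item wdt:P227 ?gnd }\n"
        "WHERE { VALUES ?gnd { %s } ?item wdt:P227 ?gnd }" % literals
    )


def build_post_request(query, endpoint=ENDPOINT):
    """Wrap the query in a URL-encoded POST request (not sent here)."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return urllib.request.Request(endpoint, data=body, headers=headers)


# Example with placeholder IDs; a batch size of 1000 is an arbitrary guess.
placeholder_ids = [str(number) for number in range(2500)]
requests_to_send = [
    build_post_request(build_query(batch))
    for batch in chunked(placeholder_ids, 1000)
]
print(len(requests_to_send), requests_to_send[0].get_method())
```

Each prepared request could then be passed to `urllib.request.urlopen`, ideally with a pause
between batches so as not to compete with other users of the service.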
-----Original Message-----
From: Wikidata [mailto:wikidata-bounces@lists.wikimedia.org] On Behalf Of Stas Malyshev
Sent: Thursday, 11 February 2016 01:35
To: Discussion list for the Wikidata project.
Subject: Re: [Wikidata] SPARQL CONSTRUCT results truncated
Hi!
I try to extract all mappings from wikidata to the GND authority file, along with the
according wikipedia pages, expecting roughly 500,000 to 1m triples as result.
As a starting note, I don't think extracting 1M triples is the best way to use the
query service. If you need to do processing that returns such big result sets - in the
millions - maybe processing the dump - e.g. with Wikidata Toolkit at
https://github.com/Wikidata/Wikidata-Toolkit - would be a better idea?
However, with various calls, I get much less triples (about 2,000 to 10,000). The output
seems to be truncated in the middle of a statement, e.g.
It may be some kind of timeout because of the quantity of the data being sent. How long
does such a request take?
--
Stas Malyshev
smalyshev(a)wikimedia.org
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata