Hi Joachim,
I think the problem is not answering your query in 5 min or so (Wikidata
Toolkit on my laptop takes 27 min without a database, simply by parsing
the whole data file, so any database that already has the data should be
much faster). The bigger issue is that you would have to configure the
site to allow queries to run for 5 min before timing out. This would mean
that other queries that never terminate (because they are really hard)
could also run for at least this long. It seems that this could easily
cause the service to break down.
Maybe one could have an "unstable" service on a separate machine that
does the same as WDQS but with a much more liberal timeout and less
availability (if it's overloaded a lot, it will just be down more often,
but you would know when you use it that this is the deal).
Cheers,
Markus
On 11.02.2016 15:54, Neubert, Joachim wrote:
Hi Stas,
Thanks for your answer. You asked how long the query runs: 8.21 sec (having processed
6443 triples) in an example invocation. If roughly linear, that could mean 800-1500 sec
for the whole set. However, I would expect a clearly shorter runtime: I routinely use
queries of similar complexity and result sizes on ZBW's public endpoints. One
arbitrarily selected query which extracts data from GND runs for less than two minutes to
produce 1.2m triples.
Given the size of Wikidata, I wouldn't consider such a use abusive. Of course, if
you have lots of competing queries and resources are limited, it is completely legitimate
to implement some policy which formulates limits and enforces them technically (throttle
long-running queries, or limit the number of produced triples, or the execution time,
or whatever seems reasonable and can be implemented).
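A result-size cap, at least, can also be expressed in the query itself rather than in server policy; a minimal sketch (P227, Wikidata's GND ID property, used for illustration; the wdt: prefix is predefined on WDQS):

```sparql
# Cap the number of constructed triples client-side via LIMIT
CONSTRUCT { ?item wdt:P227 ?gnd . }
WHERE { ?item wdt:P227 ?gnd . }
LIMIT 10000
```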
Anyway, in this case (truncation in the middle of a statement), it looks much more like
a technical bug (or an obscure timeout somewhere along the way). The execution time and
the result size vary widely:
5.44s empty result
8.60s 2090 triples
5.44s empty result
22.70s 27352 triples
Can you reproduce this kind of results with the given query, or with other supposedly
longer-running queries?
Thanks again for looking into this.
Cheers, Joachim
PS. I plan to set up my own Wikidata SPARQL endpoint to do more complex things, but that
depends on a new machine which will be available in a few months. For now, I'd just like
to know which of "our" persons (economists and the like) have Wikipedia pages.
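The query under discussion is not quoted in full in this thread; a sketch of the kind of CONSTRUCT it describes, assuming P227 (GND ID) and the schema:about sitelink model that WDQS exposes, restricted here to English Wikipedia for illustration:

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>

CONSTRUCT {
  ?item wdt:P227 ?gnd .            # Wikidata-to-GND mapping
  ?article schema:about ?item .    # Wikipedia page for the item
}
WHERE {
  ?item wdt:P227 ?gnd .
  ?article schema:about ?item ;
           schema:isPartOf <https://en.wikipedia.org/> .
}
```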
PPS. From my side, I would much rather have built a query which asks for exactly
the GND IDs I'm interested in (about 430,000 out of millions of GNDs). This would have
led to a much smaller result - but I cannot squeeze that query into a GET request ...
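One way around the GET length limit, sketched under the assumption that the endpoint accepts SPARQL 1.1 Protocol POST requests (where the query travels in the request body, with no URL length constraint): batch the wanted IDs through a VALUES clause, a few thousand per request. The GND ID strings below are placeholders:

```sparql
CONSTRUCT {
  ?item wdt:P227 ?gnd .
  ?article schema:about ?item .
}
WHERE {
  # One batch of the wanted GND IDs; repeat the request per batch
  VALUES ?gnd { "118540238" "118505173" "11850553X" }
  ?item wdt:P227 ?gnd .
  OPTIONAL { ?article schema:about ?item . }
}
```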
-----Original Message-----
From: Wikidata [mailto:wikidata-bounces@lists.wikimedia.org] On Behalf Of Stas Malyshev
Sent: Thursday, February 11, 2016 01:35
To: Discussion list for the Wikidata project.
Subject: Re: [Wikidata] SPARQL CONSTRUCT results truncated
Hi!
I try to extract all mappings from wikidata to
the GND authority file,
along with the according wikipedia pages, expecting roughly 500,000 to
1m triples as result.
As a starting note, I don't think extracting 1M triples is the best way to use the
query service. If you need to do processing that returns such big result sets - in the
millions - maybe processing the dump - e.g. with Wikidata Toolkit at
https://github.com/Wikidata/Wikidata-Toolkit - would be a better idea?
However, with various calls, I get much less
triples (about 2,000 to
10,000). The output seems to be truncated in the middle of a statement, e.g.
It may be some kind of timeout because of the quantity of the data being sent. How long
does such request take?
--
Stas Malyshev
smalyshev(a)wikimedia.org
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata