Why do these two SPARQL queries take such different times to run?


James Heald

9 Sep 2015

3:05 p.m.

Prompted by this thread at Project Chat, https://www.wikidata.org/wiki/Wikidata:Project_chat#Identical_data_sets

Here's a query to find multiple humans with nationality:Greece who have the same date of birth and date of death (http://tinyurl.com/ow6lpen). It produces one pair and executes in about 0.6 seconds.
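
The tinyurl links hide the actual query text, so what follows is only a guess at the shape of the first query. It is written in current Wikidata Query Service terms:

- wdt:P31 wd:Q5 is "instance of: human".
- wdt:P27 is "country of citizenship", assumed to be what "nationality" means here.
- wd:Q41 is Greece.
- wdt:P569 and wdt:P570 are date of birth and date of death.

The query groups people by their (birth, death) pair and keeps only pairs shared by more than one person. The endpoint URL, the User-Agent string and both function names are illustrative assumptions, not James's code.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: today's public Wikidata Query Service (prefixes wd:/wdt: are predeclared there).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def duplicate_dates_query(country_qid: str) -> str:
    """Hypothetical reconstruction: humans of one country grouped by (birth, death) date pair."""
    return f"""SELECT ?dob ?dod (COUNT(DISTINCT ?person) AS ?n) WHERE {{
  ?person wdt:P31 wd:Q5 ;
          wdt:P27 wd:{country_qid} ;
          wdt:P569 ?dob ;
          wdt:P570 ?dod .
}}
GROUP BY ?dob ?dod
HAVING (COUNT(DISTINCT ?person) > 1)"""


def run_query(query: str, timeout: float = 60.0) -> list:
    """Send a query to WDQS and return the JSON result bindings (requires network access)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Wikimedia asks clients to identify themselves; this value is a placeholder.
            "User-Agent": "duplicate-dates-example/0.1 (hypothetical contact)",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

For example, run_query(duplicate_dates_query("Q41")) would send the Greece variant. It needs network access and is subject to the service's own timeout.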

Here's a query to try to add item numbers and labels to the previous search: http://tinyurl.com/ovjwzc9

It *just* completes, taking just over 60 seconds to execute.
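
Here is a similarly hypothetical sketch of the second, labeled query. A sub-query finds the shared date pairs. The outer pattern then looks up the items carrying exactly those dates and attaches labels via rdfs:label.

A good plan evaluates the sub-query first and feeds its handful of rows into the outer lookups. A plan that instead starts from all P569/P570 triples and joins afterwards would be far slower. That is one plausible reading of the 100x gap, though the thread itself only narrows the cause down to the optimizer. The function name and the label-language parameter are assumptions.

```python
def labeled_duplicates_query(country_qid: str, lang: str = "en") -> str:
    """Hypothetical reconstruction: re-join the duplicate date pairs to items and their labels."""
    return f"""SELECT ?person ?personLabel ?dob ?dod WHERE {{
  {{
    SELECT ?dob ?dod WHERE {{
      ?p wdt:P31 wd:Q5 ;
         wdt:P27 wd:{country_qid} ;
         wdt:P569 ?dob ;
         wdt:P570 ?dod .
    }}
    GROUP BY ?dob ?dod
    HAVING (COUNT(DISTINCT ?p) > 1)
  }}
  # Outer lookup: items with exactly the (birth, death) pair produced by the sub-query.
  ?person wdt:P569 ?dob ;
          wdt:P570 ?dod ;
          wdt:P27 wd:{country_qid} .
  OPTIONAL {{
    ?person rdfs:label ?personLabel .
    FILTER(LANG(?personLabel) = "{lang}")
  }}
}}
ORDER BY ?dob ?person"""
```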

(Please don't merge the two items yet, because that will destroy the example).

Analogous queries with the same lookups for France (71 apparent sets of duplicates), the UK (32) and Italy (14) fail to complete.

Two questions, therefore: (1) Why do the two queries take such different times to run? (2) Is there a good way to rewrite the second to make it faster?

Obviously the second query, as currently written, involves a sub-query, which must inevitably make it somewhat slower. But given that the sub-query's solution set has only two rows, and that an exact date for a given property ought to be a fairly quick key to look up, why is the second query taking 100 times longer than the first?

And is there a better way I should be doing this? The query does appear to be producing useful, real matches.

Thanks,

James.

Show replies by date

Magnus Manske

9 Sep

5:06 p.m.


Your "labeled" example just ran for me in 121ms.

Maybe the server gets overloaded at times and starts swapping to disk? Nothing to do with the query itself?
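
Here is a small harness for checking Magnus's load hypothesis. It assumes you already have some zero-argument function that sends one query. The harness runs each query several times and reports min, median and max. A query that is intrinsically expensive should be slow on every run, while one hit by transient server load tends to show a wide spread. The helper name is hypothetical.

```python
import statistics
import time
from typing import Callable, Dict


def time_runs(run: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Call `run` `repeats` times and summarise wall-clock latency in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()  # e.g. a wrapper that sends one SPARQL query and reads the full response
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```

Compare time_runs(send_fast) with time_runs(send_labeled), where both callables are your own hypothetical wrappers. This gives numbers that are easier to compare between people than a single 121 ms or 60 s observation.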


James Heald

8:44 p.m.


Are you sure? 121 ms??

It consistently fails for me, whereas the first query consistently works (though even that just took 457 ms).

However, I have found Reasonator's date lookup feature a very, very useful way to work around this, e.g. https://tools.wmflabs.org/reasonator/?date=1889-11-10

All best,

James.


Stas Malyshev

10:07 p.m.


Hi!

> Here's a query to find multiple humans with nationality:Greece who have the same date of birth and date of death (http://tinyurl.com/ow6lpen). It produces one pair and executes in about 0.6 seconds.
>
> Here's a query to try to add item numbers and labels to the previous search: http://tinyurl.com/ovjwzc9
>
> It *just* completes, taking just over 60 seconds to execute.

It looks like an issue with nested queries in Blazegraph. I've sent a report to them and will see what they say.

> Obviously the second query, as currently written, involves a sub-query, which must inevitably make it somewhat slower. But given that the sub-query's solution set has only two rows, and that an exact date for a given property ought to be a fairly quick key to look up, why is the second query taking 100 times longer than the first?

Yes, in theory it should be fast, so I suspect some kind of bug.

-- Stas Malyshev smalyshev@wikimedia.org

Kingsley Idehen

10 Sep

3:31 a.m.


On 9/9/15 4:07 PM, Stas Malyshev wrote:

> It looks like an issue with nested queries in Blazegraph. I've sent a report to them and will see what they say.

What's the URL of the dataset loaded into your Blazegraph DBMS? Ideally, we should have this data available from a variety of SPARQL Query Services [1].

I know we loaded some Wikidata data into the latest DBpedia 2015-4 release, but I am not 100% sure it includes all the datasets that are currently available.

[1] https://www.wikidata.org/wiki/Wikidata:Data_access#SPARQL_endpoints

-- 
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this

Stas Malyshev

12 Sep

2 a.m.


Hi!

> What's the URL of the dataset loaded into your Blazegraph DBMS? Ideally, we should have this data available from a variety of SPARQL Query Services [1].

The data comes from https://dumps.wikimedia.org/wikidatawiki/entities/ plus streaming updates from Wikidata. A description of the RDF format is at https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format

-- Stas Malyshev smalyshev@wikimedia.org

Stas Malyshev

1:30 a.m.


Hi!

>> Obviously the second query, as currently written, involves a sub-query, which must inevitably make it somewhat slower. But given that the sub-query's solution set has only two rows, and that an exact date for a given property ought to be a fairly quick key to look up, why is the second query taking 100 times longer than the first?
>
> Yes, in theory it should be fast, so I suspect some kind of bug.

It looks like it is indeed a problem with the query optimizer. If you add:

hint:Query hint:optimizer "None" .

to the query, it performs much better. In general, Blazegraph's optimizer is not perfect, so if a query (especially a complex one) behaves strangely, it is worth trying the "no optimizer" hint to see whether that helps. Sometimes it does.
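
Stas's workaround is one extra triple inside the query's WHERE clause. Below is a minimal sketch of adding it programmatically, under these assumptions:

- The query has an explicit WHERE {. The regex picks the first one, which for a query with a sub-query is the outer group.
- The hint: prefix is predeclared, as it is on the Wikidata Query Service. Elsewhere you would declare PREFIX hint: <http://www.bigdata.com/queryHints#>.

The function name is hypothetical.

```python
import re

# The hint exactly as given in the thread.
OPTIMIZER_OFF = 'hint:Query hint:optimizer "None" .'


def disable_optimizer(query: str) -> str:
    """Return a copy of `query` with the no-optimizer hint inserted after the first WHERE {."""
    match = re.search(r"WHERE\s*\{", query, flags=re.IGNORECASE)
    if match is None:
        # SPARQL allows omitting WHERE; this sketch only handles the explicit form.
        raise ValueError("expected an explicit 'WHERE {' in the query")
    cut = match.end()
    return query[:cut] + "\n  " + OPTIMIZER_OFF + query[cut:]
```

As far as I understand Blazegraph's behaviour, with the optimizer off the joins run in the order they are written. A sub-query should therefore appear before the patterns that consume its bindings.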

-- Stas Malyshev smalyshev@wikimedia.org

Mailing list: wikidata@lists.wikimedia.org
6 comments, 4 participants: James Heald, Kingsley Idehen, Magnus Manske, Stas Malyshev