Hi,
There is a performance issue with the labelling service. Using labels makes even simple queries time out. For example, this one:
SELECT $p $pLabel WHERE {
  $p wdt:P31 _:bnode .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
} LIMIT 11
The workaround is to use subqueries. For example, the following query returns immediately:
SELECT $p $pLabel WHERE {
  { SELECT $p WHERE { $p wdt:P31 _:bnode . } LIMIT 11 }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
I strongly suspect that almost every use of the labelling service could be rewritten like this (the only exception is when you apply further query conditions to the label). Blazegraph should recognize this pattern and optimize it by itself.
Meanwhile, everybody who uses queries with labels in an application should rewrite them as above to get the best performance (and reduce load on the query service ;-).
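For applications that build their queries as strings, the rewrite could be wrapped in a small helper. The following is only a sketch: the function name with_labels and its parameters are made up for illustration, and it assumes the inner SELECT already carries its own LIMIT and that no further conditions are applied to the labels.

```python
def with_labels(inner_select, variables, language="en"):
    """Wrap an already-limited SELECT in an outer query that adds labels.

    inner_select: the SELECT that picks the entities (including its LIMIT).
    variables:    names of the variables to label, without the leading "$".
    language:     language code passed to the label service.
    """
    # Project each variable together with its "<name>Label" companion.
    projection = " ".join(f"${v} ${v}Label" for v in variables)
    return (
        f"SELECT {projection} WHERE {{\n"
        f"  {{ {inner_select.strip()} }}\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}\n'
        f"}}"
    )


query = with_labels("SELECT $p WHERE { $p wdt:P31 _:bnode . } LIMIT 11", ["p"])
print(query)
```

The helper does plain string concatenation and does not parse or validate the SPARQL it is given, so it is only safe for queries the application writes itself.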
Cheers,
Markus
Hi!
I suspect the issue here is that it tries to compute the full set of values before applying the service. That may make sense if the service is external, but if it is internal and the result set is huge, it obviously does not work.
Another alternative, since you are only looking for English labels, is to query the labels directly:
SELECT $p $pLabel WHERE {
  $p wdt:P31 _:bnode .
  OPTIONAL { $p rdfs:label $pLabel . FILTER(lang($pLabel) = "en") }
} LIMIT 11
This seems to work just fine. You lose a bit of the service's added value (a nicer fallback when an item has no label) but you gain a lot of speed.
In any case, I'll raise this issue with Blazegraph, and it may also be worth filing a Phabricator issue about it.
Hi Stas,
Thanks for the really quick reply. I agree with your analysis: it seems the service is implemented as a blocking operator, whereas it could really be streaming for local services (and maybe even for remote ones).
My version with the subquery seems really fast now, but I have not done any profiling. I would have expected the service to be generally faster than the OPTIONAL-FILTER-LANG combination. It would be interesting to know which one is better (I will use it in /many/ queries).
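A rough way to compare the two variants would be to time them from a script. The sketch below is not a proper benchmark: the helper names run_on_wdqs and time_queries and the User-Agent string are made up for illustration, and it assumes the public endpoint at https://query.wikidata.org/sparql. Server-side caching and current load can distort the numbers, so they are indicative at best.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def run_on_wdqs(query, timeout=60.0):
    """Send one SPARQL query to the endpoint and return the parsed JSON."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url, headers={"User-Agent": "label-timing-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def time_queries(queries, run=run_on_wdqs, repeats=3):
    """Return the best wall-clock time in seconds for each named query."""
    timings = {}
    for name, query in queries.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(query)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings
```

Calling time_queries({"subquery": ..., "optional": ...}) with the two query strings from this thread would return a dictionary of timings. Passing a different run function makes it possible to try the timing loop without touching the network.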
Regards,
Markus
On 06.03.2016 22:46, Stas Malyshev wrote: