Thank you so much David!
This was such a great example that I had to add it to our SPARQL Examples page in a new section, "Mediawiki API": https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Mediawiki_API
The community thanks you sincerely!
Thad https://www.linkedin.com/in/thadguidry/
On Mon, Jul 13, 2020 at 2:26 AM David Causse dcausse@wikimedia.org wrote:
On Sat, Jul 11, 2020 at 7:12 PM Thad Guidry thadguidry@gmail.com wrote:
This query times out:
SELECT ?item ?label WHERE {
  ?item wdt:P31 ?instance ;
        rdfs:label ?label ;
        rdfs:label ?enLabel .
  FILTER(CONTAINS(lcase(?label), "Soriano")).
  FILTER(?instance != wd:Q5).
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
I have this feeling that it's not actually using an index or even asking the right question and so is slow and times out?
Indeed, none of the criteria in your query allows the triple store to pick an index to follow to extract the results in a timely manner. The sole non-negative criterion is FILTER(CONTAINS(lcase(?label), "Soriano")), but because it sits in a FILTER and, moreover, is a function call, it cannot be used to select an index. The only way to speed up your query would be to introduce a discriminating "matching" criterion.
However the MediaWiki wbsearchentities API does seem to use an index and
is performant for label searching:
https://www.wikidata.org/w/api.php?action=wbsearchentities&search=sorian...
wbsearchentities is backed by Elasticsearch, which is optimized for such lookups.
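For readers who want to try the label search directly, here is a minimal Python sketch that only builds a wbsearchentities request URL and sends nothing. The helper name build_search_url and the choice of limit are illustrative assumptions; the action, search, language, limit and format parameters are standard for this API module.

```python
from urllib.parse import urlencode

# MediaWiki API endpoint for Wikidata, as used in the URL quoted above.
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_search_url(term, language="en", limit=10):
    """Return a wbsearchentities URL for a label search (no request is sent).

    build_search_url is a hypothetical helper name used for illustration.
    """
    params = {
        "action": "wbsearchentities",
        "search": term,        # text matched against labels and aliases
        "language": language,  # language to search in
        "limit": limit,        # maximum number of results requested
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(build_search_url("soriano"))
```

The resulting URL can be opened in a browser or fetched with any HTTP client.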
How can I get my SPARQL query to be more performant, or to ask the right
question?
Unfortunately, I don't see an obvious way to adapt your SPARQL query while keeping exactly the same semantics, but to illustrate the problem:
SELECT ?item ?label WHERE {
  ?item wdt:P31 ?instance ;
        rdfs:label "Soriano"@en .
  FILTER(?instance != wd:Q5).
}
LIMIT 100
will return results in a timely manner, only because we helped the graph traversal with an initial path on ?item rdfs:label "Soriano"@en.
But by combining the query service and the Wikidata API[0] backed by Elasticsearch, I think you can extract what you want:
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 ?instance .
  FILTER(?instance != wd:Q5).
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "EntitySearch";
                    mwapi:search "soriano";
                    mwapi:language "en".
    ?item wikibase:apiOutputItem mwapi:item.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
This query first contacts EntitySearch (an alias for wbsearchentities), which passes the items it finds to the triple store, which in turn can now query the graph in a timely manner. Obviously, this solution only works if the number of items returned by wbsearchentities remains reasonable.
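To reuse the same pattern with other search terms, the query can be generated from a template. The Python sketch below is an illustration under stated assumptions, not part of the query service itself: the helper names (escape_literal, build_mwapi_query, build_request_url) are hypothetical, and the public endpoint https://query.wikidata.org/sparql with its query and format parameters is assumed here rather than taken from this thread. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumed public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as above; doubled braces are literal braces for str.format.
QUERY_TEMPLATE = """SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31 ?instance .
  FILTER(?instance != wd:Q5).
  SERVICE wikibase:mwapi {{
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "EntitySearch";
                    mwapi:search "{term}";
                    mwapi:language "{language}".
    ?item wikibase:apiOutputItem mwapi:item.
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}". }}
}} LIMIT {limit}"""


def escape_literal(text):
    """Escape backslashes, double quotes and line breaks for a SPARQL string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_mwapi_query(term, language="en", limit=100):
    """Fill the template with an escaped search term (hypothetical helper)."""
    return QUERY_TEMPLATE.format(
        term=escape_literal(term),
        language=escape_literal(language),
        limit=int(limit),
    )


def build_request_url(query):
    """Return a GET URL for the query service; no request is sent."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


query = build_mwapi_query("soriano")
print(query)
print(build_request_url(query))
```

Keeping LIMIT explicit in the template reflects the caveat above: the approach stays fast only while the search returns a reasonable number of items.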
0: https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI
David C.