Re: [Wikidata] Differences in label searching with SPARQL and MediaWiki API

18 Jul 2020


Thank you so much David!
This was such a great example that I had to add this to our SPARQL Examples
page in a new section "Mediawiki API":
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Mediawiki_API
The community thanks you sincerely!
Thad
https://www.linkedin.com/in/thadguidry/
On Mon, Jul 13, 2020 at 2:26 AM David Causse dcausse@wikimedia.org wrote:
...
On Sat, Jul 11, 2020 at 7:12 PM Thad Guidry thadguidry@gmail.com wrote:
...
This query times out:
SELECT ?item ?label
WHERE
{
  ?item wdt:P31 ?instance ;
    rdfs:label ?label ;
    rdfs:label ?enLabel .
  FILTER(CONTAINS(lcase(?label), "Soriano")).
  FILTER(?instance != wd:Q5).
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en".}
}
LIMIT 100
I have this feeling that it's not actually using an index or even asking
the right question and so is slow and times out?
Indeed, none of the criteria in your query allow the triple store to
pick an index from which it could extract the results in a timely manner.
The sole non-negative criterion is FILTER(CONTAINS(lcase(?label),
"Soriano")), but because it sits in a FILTER and is moreover a function
call, it cannot be used to select an index to work on.
The only way to speed up your query would be to introduce a discriminating
"matching" criterion.
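To make the index point concrete, here is a toy sketch in Python. It is only an analogy, not how the triple store is actually implemented; the item IDs and labels are made up for illustration. An exact-match criterion can be answered with one keyed lookup, whereas a function applied to every label forces a visit to every row:

```python
from collections import defaultdict

# Toy (item, label) pairs standing in for rdfs:label triples.
# These IDs and labels are invented for illustration only.
TRIPLES = [
    ("Q1", "Soriano"),
    ("Q2", "Alfonso Soriano"),
    ("Q3", "Madrid"),
    ("Q4", "Soriano"),
]


def build_label_index(triples):
    """Map each exact label to the items carrying it (an index keyed on the label)."""
    index = defaultdict(list)
    for item, label in triples:
        index[label].append(item)
    return index


def exact_lookup(index, label):
    """One keyed lookup; no other rows are touched."""
    return index.get(label, [])


def substring_scan(triples, needle):
    """A per-row function test: every triple has to be visited and checked."""
    needle = needle.lower()
    return [item for item, label in triples if needle in label.lower()]


index = build_label_index(TRIPLES)
print(exact_lookup(index, "Soriano"))      # ['Q1', 'Q4']
print(substring_scan(TRIPLES, "soriano"))  # ['Q1', 'Q2', 'Q4']
```

With four rows the difference is invisible; with every label in Wikidata, the scan is what runs into the timeout.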
However, the MediaWiki wbsearchentities API does seem to use an index and
is performant for label searching:
https://www.wikidata.org/w/api.php?action=wbsearchentities&search=sorian...
wbsearchentities is backed by Elasticsearch, which is optimized for such
lookups.
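For reference, a minimal Python sketch of how such a wbsearchentities request URL could be assembled. The helper name build_search_url and the default values are assumptions made for this example; action, search, language, limit and format are standard parameters of that API module. The sketch only builds the URL and does not send anything:

```python
from urllib.parse import urlencode

# Base URL of the Wikidata action API, as in the link above.
API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(term, language="en", limit=10):
    """Return a wbsearchentities URL for `term` (hypothetical helper)."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,  # language the search is performed in
        "limit": limit,        # maximum number of hits to return
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


print(build_search_url("soriano"))
```

The resulting URL can be opened in a browser or fetched with any HTTP client.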
How can I get my SPARQL query to be more performant, or to ask the right
question?
Unfortunately, I don't see an obvious way to adapt your SPARQL query and keep
exactly the same semantics, but to illustrate the problem:
SELECT ?item ?label WHERE {
  ?item wdt:P31 ?instance ;
        rdfs:label "Soriano"@en .
  FILTER(?instance != wd:Q5).
}
LIMIT 100
will return results in a timely manner, only because we helped the graph
traversal with an initial path on ?item rdfs:label "Soriano"@en.
But by combining the query service and the Wikidata API[0] backed by
Elasticsearch, I think you can extract what you want:
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 ?instance .
  FILTER(?instance != wd:Q5).
  SERVICE wikibase:mwapi {
      bd:serviceParam wikibase:endpoint "www.wikidata.org";
        wikibase:api "EntitySearch";
        mwapi:search "soriano";
        mwapi:language "en".
      ?item wikibase:apiOutputItem mwapi:item.
  }
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en".}
}
LIMIT 100
This query will first contact EntitySearch (an alias of wbsearchentities),
which will pass the items it found to the triple store, which in turn can
now query the graph in a timely manner. Obviously this solution only works
if the number of items returned by wbsearchentities remains reasonable.
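If you want to run the combined query from a script, something along these lines might work. This is a rough sketch under assumptions: the endpoint https://query.wikidata.org/sparql is assumed to be the public query service URL (it does not appear in this thread), and build_query, run_query and the User-Agent string are hypothetical names chosen for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as above, with the search term, language and limit as placeholders.
QUERY_TEMPLATE = """\
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31 ?instance .
  FILTER(?instance != wd:Q5).
  SERVICE wikibase:mwapi {{
      bd:serviceParam wikibase:endpoint "www.wikidata.org";
        wikibase:api "EntitySearch";
        mwapi:search "{term}";
        mwapi:language "{language}".
      ?item wikibase:apiOutputItem mwapi:item.
  }}
  SERVICE wikibase:label {{bd:serviceParam wikibase:language "{language}".}}
}}
LIMIT {limit}
"""


def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(term, language="en", limit=100):
    """Fill the template; hypothetical helper, not part of any library."""
    return QUERY_TEMPLATE.format(
        term=escape_literal(term),
        language=escape_literal(language),
        limit=int(limit),
    )


def run_query(query, user_agent="example-script/0.1 (hypothetical)"):
    """Send the query over HTTP GET and return the JSON result bindings."""
    url = WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]


print(build_query("soriano"))
```

Calling run_query(build_query("soriano")) would perform the actual request; it is left out of the sketch so that nothing is sent over the network unless you ask for it.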
0: https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI
David C.
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
