Some answers inline,
On Fri, Aug 7, 2020 at 6:07 PM Thad Guidry thadguidry@gmail.com wrote:
Very nice David!
- Does the MINUS actually utilize ElasticSearch indexes or just
Blazegraph?
No, Elasticsearch is used only during the call to the wikibase:mwapi SERVICE. Everything outside that call is handled by Blazegraph.
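To make the split concrete, here is a minimal sketch. It is not the query discussed earlier in this thread: the search string and the excluded property are arbitrary choices for illustration, and it assumes the standard WDQS prefixes. Only the SERVICE block goes to the search backend; the BIND and the MINUS are evaluated by Blazegraph.

```sparql
# Illustrative only: the search string and the MINUS pattern are assumptions.
SELECT ?item WHERE {
  # Answered through the MediaWiki search API (the Elasticsearch side).
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" ;
                    wikibase:api "Search" ;
                    mwapi:srsearch "haswbstatement:P31=Q3305213" .
    ?title wikibase:apiOutput mwapi:title .
  }
  # From here on, everything is evaluated by Blazegraph.
  BIND(IRI(CONCAT(STR(wd:), ?title)) AS ?item)
  MINUS { ?item wdt:P170 [] }   # drop items that have a creator statement
}
```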
I'd like to help the community by writing up better documentation on our SPARQL pages covering FILTER() versus MINUS(), unless someone already has this written up somewhere. The only footnote I saw was: "MINUS lets you select results that *don’t* fit some graph pattern. FILTER NOT EXISTS is mostly equivalent (see the SPARQL spec for an example where they differ), but – at least on WDQS – usually slower by quite a bit." at the bottom of the SPARQL tutorial
https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial and the SPARQL query service wiki page https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries#Excluding_subsets has:
Excluding subsets
SPARQL has three different idioms for excluding subsets:
- OPTIONAL { ... ?x ... } FILTER(!bound(?x)),
- FILTER NOT EXISTS { ... }
- MINUS { ... }
Currently, in almost all circumstances, Blazegraph resolves all of these to the same query plan.
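For reference, the three idioms can be written out as complete queries. This is only a sketch: the class (wd:Q146, house cat) and the property (wdt:P18, image) are picked for illustration. Because all three share ?item with the outer pattern, they should return the same items; whether they also get the same query plan is a separate question.

```sparql
# Three ways to ask for cats (wd:Q146) without an image (wdt:P18).
# Run each query separately; the class and property are illustrative choices.

# 1. OPTIONAL + FILTER(!bound(...))
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q146 .
  OPTIONAL { ?item wdt:P18 ?image }
  FILTER(!bound(?image))
}

# 2. FILTER NOT EXISTS
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q146 .
  FILTER NOT EXISTS { ?item wdt:P18 ?image }
}

# 3. MINUS
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q146 .
  MINUS { ?item wdt:P18 ?image }
}
```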
- Is it still true that the three idioms above currently resolve to the
same query plan?
I think they do serve the same purpose but can differ in subtle ways. For MINUS vs FILTER NOT EXISTS, the SPARQL spec states that they can produce different solutions: https://www.w3.org/TR/sparql11-query/#neg-notexists-minus. As to which approach is better, I can't give a clear answer. I tend to prefer MINUS because I find it easier to read and understand. I also tend to avoid a plain FILTER(constraint on ?x) when possible, as these tend to be rather slow (here, though, the FILTER(!bound(?x)) should be pretty fast).
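The case where the two differ, taken from the example in the spec section linked above, is when the excluded pattern shares no variables with the rest of the query. The queries below are meant to be read against the one-triple dataset shown in the comment, not run on WDQS.

```sparql
# Data, from the example in the SPARQL 1.1 spec:
#   :a :b :c .

# FILTER NOT EXISTS: the inner pattern matches the data, so the filter
# rejects every solution and the result is empty.
SELECT * WHERE { ?s ?p ?o FILTER NOT EXISTS { ?x ?y ?z } }

# MINUS: the two sides share no variables, so nothing is removed and the
# single solution (?s = :a, ?p = :b, ?o = :c) is returned.
SELECT * WHERE { ?s ?p ?o MINUS { ?x ?y ?z } }
```

As long as the excluded pattern shares at least one variable with the outer pattern, which is the usual situation in Wikidata queries, the two forms select the same results.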
David.