Some answers inline,
On Fri, Aug 7, 2020 at 6:07 PM Thad Guidry <thadguidry(a)gmail.com> wrote:
Very nice David!
1. Does the MINUS actually utilize ElasticSearch indexes or just
Blazegraph?
No, Elasticsearch is used only during the call to the wikibase:mwapi
SERVICE. Everything outside this call is handled by Blazegraph.
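
As a rough sketch of where that boundary falls (the search term and the excluded class are arbitrary picks for illustration, not the query discussed earlier in the thread), only the SERVICE block below goes to the search backend, while the MINUS is evaluated by Blazegraph on the items that come back:

```sparql
SELECT ?item WHERE {
  # Answered through the MediaWiki API, i.e. the Elasticsearch-backed search
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" ;
                    wikibase:api "EntitySearch" ;
                    mwapi:search "cheese" ;
                    mwapi:language "en" .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  # Evaluated by Blazegraph: drop items that are instances of human (Q5)
  MINUS { ?item wdt:P31 wd:Q5 }
}
```
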
I'd like to help the community by writing up a bit better documentation on
our SPARQL pages that talks about FILTER() versus MINUS() if no one has
this info floating around?
The only footnote I saw was:
"MINUS lets you select results that *don’t* fit some graph pattern. FILTER
NOT EXISTS is mostly equivalent (see the SPARQL spec for an example where
they differ), but – at least on WDQS – usually slower by quite a bit."
at the bottom of the SPARQL tutorial
<https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial>
and the wiki page SPARQL query service
<https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries#Excluding_subsets>
has:
Excluding subsets
SPARQL has three different idioms for excluding subsets:
- OPTIONAL { ... ?x ... } FILTER(!bound(?x)),
- FILTER NOT EXISTS { ... }
- MINUS { ... }
Currently, in almost all circumstances, Blazegraph resolves all of these
to the same query plan.
2. Is that still a true statement that those 3 above use the same query
plan currently?
I think they indeed serve the same purpose but might vary in subtle ways.
For MINUS vs FILTER NOT EXISTS, the SPARQL spec states that they can
produce different solutions
<https://www.w3.org/TR/sparql11-query/#neg-notexists-minus>.
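
For reference, the case the spec uses to show the difference is roughly the following: the excluded pattern shares no variable with the rest of the query. The data and queries are the spec's own example, reproduced here from memory, so check the linked section for the exact wording:

```sparql
# Data (Turtle):  @prefix : <http://example/> .   :a :b :c .

# Query 1: no solutions. { ?x ?y ?z } matches for every ?s ?p ?o,
# so NOT EXISTS is false each time and the filter removes everything.
SELECT * { ?s ?p ?o FILTER NOT EXISTS { ?x ?y ?z } }

# Query 2: one solution (?s = :a, ?p = :b, ?o = :c). MINUS only removes
# solutions that share a variable binding with the right-hand side, and
# here the two sides have no variable in common.
SELECT * { ?s ?p ?o MINUS { ?x ?y ?z } }
```

As long as the excluded pattern reuses a variable bound outside it (the usual case), the two forms should return the same rows.
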
As to which approach is better, I can't give a clear answer. I tend to
prefer MINUS as I find it easier to read and understand. I also tend to
avoid plain FILTER(constraint on ?x) when possible, as such filters tend to
be rather slow (here the FILTER(!bound(?x)) should be pretty fast, though).
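
To make the comparison concrete, here are the three idioms written out for one illustrative case (my choice of example, not one from the wiki page): house cats (Q146) that have no image (P18). They are three separate queries, and on WDQS the wd:/wdt: prefixes are predefined:

```sparql
# 1. OPTIONAL + FILTER(!bound(...))
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q146 .
  OPTIONAL { ?item wdt:P18 ?image }
  FILTER(!bound(?image))
}

# 2. FILTER NOT EXISTS
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q146 .
  FILTER NOT EXISTS { ?item wdt:P18 ?image }
}

# 3. MINUS
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q146 .
  MINUS { ?item wdt:P18 ?image }
}
```

All three should return the same items here, since ?item is shared between the main pattern and the excluded one; whether they also get the same query plan is something I have not verified.
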
David.