David,

Thank you so much.
This is very helpful and I've improved the wiki docs in a few places with this new information.



On Fri, Aug 7, 2020 at 12:31 PM David Causse <dcausse@wikimedia.org> wrote:
Some answers inline,

On Fri, Aug 7, 2020 at 6:07 PM Thad Guidry <thadguidry@gmail.com> wrote:
Very nice David!

1. Does MINUS actually use the Elasticsearch indexes, or just Blazegraph?


No, Elasticsearch is used only during the call to the wikibase:mwapi SERVICE. Everything outside this call is handled by Blazegraph.
 
I'd like to help the community by writing better documentation on our SPARQL pages about FILTER NOT EXISTS versus MINUS, unless someone already has this information written up.
The only footnote I saw, at the bottom of the SPARQL tutorial, was:
"MINUS lets you select results that don’t fit some graph pattern. FILTER NOT EXISTS is mostly equivalent (see the SPARQL spec for an example where they differ), but – at least on WDQS – usually slower by quite a bit."

and the wiki page SPARQL query service has:

Excluding subsets

SPARQL has three different idioms for excluding subsets:

  • OPTIONAL { ... ?x ... } FILTER(!bound(?x)),
  • FILTER NOT EXISTS { ... }
  • MINUS { ... }

Currently, in almost all circumstances, Blazegraph resolves all of these to the same query plan.


2. Is it still true that the three idioms above currently resolve to the same query plan?

I think they do serve the same purpose but may differ in subtle ways: for MINUS vs FILTER NOT EXISTS, the SPARQL spec states that they can produce different solutions.
As to which approach is better, I can't give a clear answer. I tend to prefer MINUS because I find it easier to read and understand. I also tend to avoid plain FILTER(constraint on ?x) when possible, as such filters tend to be rather slow (here, though, FILTER(!bound(?x)) should be pretty fast).
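
To make the MINUS vs FILTER NOT EXISTS difference concrete, here is a minimal Python sketch of the case described in the SPARQL spec: a dataset with a single triple, where the excluded pattern shares no variables with the main pattern. It is only a toy model of the semantics, not how Blazegraph evaluates queries. The helper names (match, filter_not_exists, minus) are made up for illustration and handle a single triple pattern only.

```python
# Toy dataset: one triple, mirroring the example in the SPARQL 1.1 spec.
TRIPLES = [(":a", ":b", ":c")]


def match(pattern, triples, binding=None):
    """Return all solutions (dicts) for one triple pattern.

    Terms starting with '?' are variables; `binding` holds values
    that are already fixed before matching (hypothetical helper).
    """
    solutions = []
    for triple in triples:
        candidate = dict(binding or {})
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its existing value.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            solutions.append(candidate)
    return solutions


def filter_not_exists(left, pattern, triples):
    """Keep a solution only if the pattern, with that solution's
    bindings substituted in, has no match at all."""
    return [mu for mu in left if not match(pattern, triples, mu)]


def minus(left, right):
    """Drop a left solution only if some right solution shares at least
    one variable with it and agrees on every shared variable."""
    def excluded(mu):
        for nu in right:
            shared = mu.keys() & nu.keys()
            if shared and all(mu[v] == nu[v] for v in shared):
                return True
        return False
    return [mu for mu in left if not excluded(mu)]


# SELECT * { ?s ?p ?o  FILTER NOT EXISTS { ?x ?y ?z } }
left = match(("?s", "?p", "?o"), TRIPLES)
not_exists_result = filter_not_exists(left, ("?x", "?y", "?z"), TRIPLES)

# SELECT * { ?s ?p ?o  MINUS { ?x ?y ?z } }
minus_result = minus(left, match(("?x", "?y", "?z"), TRIPLES))

print(not_exists_result)  # []
print(minus_result)       # [{'?s': ':a', '?p': ':b', '?o': ':c'}]
```

With no shared variables, FILTER NOT EXISTS removes everything (the inner pattern always matches), while MINUS removes nothing. When the two patterns do share a variable, as in most exclusion queries, both give the same result.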

David.
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata