On Thu, Oct 18, 2018 at 4:48 PM Trey Jones <tjones(a)wikimedia.org> wrote:
Hi Everyone,
I'm at WikiConference NA today, and I was chatting with someone from OCLC, and he
mentioned that BlazeGraph can be configured to call out to a full-text search engine. It
looks like it only works with Solr out of the box, but the documentation mentions that
Elasticsearch is a candidate search endpoint.
Obviously it wouldn't be worth doing any real work on investigating this until the
BlazeGraph/Amazon situation is clearer, and maybe Stas or others have looked at it in the
past and already know why it isn't worth the added complexity, but there are some
interesting use cases where combining full text and SPARQL would be useful—for example if
you are looking for a person, you know part of their name, and some facts about them. In
general, any full-text search with additional structured data constraints.
Anyone already know anything about the capacity of BlazeGraph?
It all depends on what you mean by "capacity" and by "blazegraph". If
by capacity you mean whether we have enough hardware, the answer is
not entirely straightforward.
The cluster servicing the public wdqs endpoint (which is probably what
"blazegraph" means in this context) has widely varying load patterns, is
sometimes overloaded, and is overall difficult to size correctly
(especially since we don't have a good definition of what a good SLO
would be, see [1]).
The internal wdqs endpoint is in a much better situation, with a more
controlled load and a reasonable amount of headroom. I don't have
good visibility into the projects that might start using this internal
cluster more, so that headroom might be consumed fairly quickly,
depending on what load we add to the cluster.
Last point: I have no idea what that blazegraph / elasticsearch
integration looks like, but it sounds like it might be possible to
generate arbitrary elasticsearch queries from SPARQL. If that's the
case, we don't want to expose such functionality on the public wdqs
endpoint, or at least not with our current production elasticsearch
backend as the target. That being said, it sounds like a very
interesting idea!
Have fun!
Guillaume
[1] https://phabricator.wikimedia.org/T199228
Thanks,
—Trey
Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation
_______________________________________________
Discovery mailing list
Discovery(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery
--
Guillaume Lederrey
Operations Engineer, Search Platform
Wikimedia Foundation
UTC+2 / CEST