Instead of "the capacity" I meant "this capacity", but should have said
"this feature", referring to Elasticsearch integration, though the
information on system capacity was still interesting.

On Fri, Oct 19, 2018 at 3:57 AM, Guillaume Lederrey <glederrey(a)wikimedia.org> wrote:
> On Thu, Oct 18, 2018 at 4:48 PM Trey Jones <tjones(a)wikimedia.org> wrote:
> >
> > Hi Everyone,
> >
> > I'm at WikiConference NA today, and I was chatting with someone from
> > OCLC, and he mentioned that BlazeGraph can be configured to call out to a
> > full-text search engine. It looks like it only works with Solr out of the
> > box, but the documentation mentions that Elasticsearch is a candidate
> > search endpoint.
> >
> > Obviously it wouldn't be worth doing any real work on investigating this
> > until the BlazeGraph/Amazon situation is clearer, and maybe Stas or others
> > have looked at it in the past and already know why it isn't worth the added
> > complexity, but there are some interesting use cases where combining full
> > text and SPARQL would be useful. For example, if you are looking for a
> > person, you know part of their name and some facts about them. In general,
> > any full-text search with additional structured data constraints.
> >
> > Anyone already know anything about the capacity of BlazeGraph?
>
> It all depends on what you mean by "capacity" and by "blazegraph". If
> by capacity you mean "do we have enough hardware", the answer is not
> straightforward.
>
> The cluster servicing the public wdqs endpoint (which is probably what
> "blazegraph" means in this context) has widely varying load patterns, is
> sometimes overloaded, and is overall difficult to size correctly
> (especially since we don't have a good definition of what a good SLO
> would be, see [1]).
>
> The internal wdqs endpoint is in a much better situation, with a more
> controlled load and a reasonable amount of headroom. I don't have
> good visibility into the projects that might start using this internal
> cluster more, so that headroom might be consumed fairly quickly
> depending on what load we add to the cluster.
>
> Last point: I have no idea what that blazegraph / elasticsearch
> integration looks like, but it sounds like it might be possible to
> generate arbitrary elasticsearch queries from SPARQL. If that's the
> case, we don't want to expose such functionality on the public wdqs
> endpoint, or at least not with our current production elasticsearch
> backend as the target. That being said, it sounds like a very
> interesting idea!
>
> Have fun!
>
> Guillaume
>
>
> [1] https://phabricator.wikimedia.org/T199228
>
> > Thanks,
> > —Trey
> >
> > Trey Jones
> > Sr. Software Engineer, Search Platform
> > Wikimedia Foundation
> > _______________________________________________
> > Discovery mailing list
> > Discovery(a)lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/discovery
>
>
>
> --
> Guillaume Lederrey
> Operations Engineer, Search Platform
> Wikimedia Foundation
> UTC+2 / CEST