Re: [discovery] BlazeGraph + Elasticsearch?

19 Oct 2018


      On Fri, Oct 19, 2018 at 3:40 PM Trey Jones tjones@wikimedia.org wrote:
...
Instead of "the capacity" I meant "this capacity", but should have said "this feature", referring to Elasticsearch integration—though the information on system capacity was still interesting.
Isn't that "capability" more than "capacity"? (I'm trying to improve my
English here.) Though I knew that it sounded ambiguous!
...
On Fri, Oct 19, 2018 at 3:57 AM, Guillaume Lederrey glederrey@wikimedia.org wrote:
...
On Thu, Oct 18, 2018 at 4:48 PM Trey Jones tjones@wikimedia.org wrote:
...
Hi Everyone,
I'm at WikiConference NA today, and I was chatting with someone from OCLC, and he mentioned that BlazeGraph can be configured to call out to a full-text search engine. It looks like it only works with SOLR out of the box, but the documentation mentions that Elasticsearch is a candidate search endpoint.
Obviously it wouldn't be worth doing any real work on investigating this until the BlazeGraph/Amazon situation is clearer, and maybe Stas or others have looked at it in the past and already know why it isn't worth the added complexity, but there are some interesting use cases where combining full text and SPARQL would be useful—for example if you are looking for a person, you know part of their name, and some facts about them. In general, any full-text search with additional structured data constraints.
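A minimal sketch of the person-lookup case described above: a small Python helper that builds a SPARQL query combining a full-text match on a name fragment with ordinary structured constraints. The `fts:` prefix and predicates are assumptions based on Blazegraph's external full-text search documentation (which targets Solr) and should be checked against it; the helper name `build_person_query`, the endpoint URL, and the choice of Wikidata properties are hypothetical, and how search hits bind to entities depends on how the search index is set up.

```python
import re


def build_person_query(name_fragment: str, occupation_qid: str, endpoint: str) -> str:
    """Build a SPARQL query mixing a full-text match with structured constraints."""
    # Only accept plain Wikidata item IDs so nothing else can be spliced into the query.
    if not re.fullmatch(r"Q\d+", occupation_qid):
        raise ValueError(f"not a Wikidata item ID: {occupation_qid!r}")

    def escape(text: str) -> str:
        # Escape backslashes and double quotes so the value stays inside its literal.
        return text.replace("\\", "\\\\").replace('"', '\\"')

    return f"""
PREFIX fts: <http://www.bigdata.com/rdf/fts#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?person ?score WHERE {{
  # Full-text part: delegated to the external search service (assumed vocabulary).
  SERVICE <http://www.bigdata.com/rdf/fts#search> {{
    ?person fts:search "{escape(name_fragment)}" .
    ?person fts:endpoint "{escape(endpoint)}" .
    ?person fts:score ?score .
  }}
  # Structured part: the person is a human with the given occupation.
  ?person wdt:P31 wd:Q5 ;
          wdt:P106 wd:{occupation_qid} .
}}
ORDER BY DESC(?score)
LIMIT 20
"""
```

Calling `build_person_query("Curie", "Q169470", "http://search.example/solr/select")` returns the query text only; nothing is sent to any server. The endpoint is passed in as configuration rather than taken from user input, since letting callers pick the search target is exactly the kind of exposure a public endpoint would need to avoid.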
Anyone already know anything about the capacity of BlazeGraph?
It all depends on what you mean by "capacity" and by "blazegraph". If
by capacity you mean do we have enough hardware, the answer is not
entirely easy.
The cluster servicing the public wdqs endpoint (which probably means
"blazegraph" in this context) has widely varying load patterns, is
sometimes overloaded and is overall difficult to size correctly
(especially since we don't have a good definition of what a good SLO
would be, see [1]).
The internal wdqs endpoint is in a much better situation, with a more
controlled load and a reasonable amount of headroom. I don't have
good visibility into the projects that might start using this internal
cluster more, so that headroom might be consumed fairly quickly
depending on what load we add to the cluster.
Last point: I have no idea what that blazegraph / elasticsearch
integration looks like, but it sounds like it might be possible to
generate arbitrary elasticsearch queries from SPARQL. If that's the
case, we don't want to expose such functionality on the public wdqs
endpoint, or at least not with our current production elasticsearch
backend as the target. That being said, it sounds like a very
interesting idea!
Have fun!
Guillaume
[1] https://phabricator.wikimedia.org/T199228
...
Thanks,
—Trey
Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation
_______________________________________________
Discovery mailing list
Discovery@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery
--
Guillaume Lederrey
Operations Engineer, Search Platform
Wikimedia Foundation
UTC+2 / CEST

