On Fri, Oct 19, 2018 at 3:40 PM Trey Jones tjones@wikimedia.org wrote:
Instead of "the capacity" I meant "this capacity", but should have said "this feature", referring to Elasticsearch integration—though the information on system capacity was still interesting.
Isn't that "capability" more than "capacity"? (I'm trying to improve my English here.) Though I knew that it sounded ambiguous!
On Fri, Oct 19, 2018 at 3:57 AM, Guillaume Lederrey glederrey@wikimedia.org wrote:
On Thu, Oct 18, 2018 at 4:48 PM Trey Jones tjones@wikimedia.org wrote:
Hi Everyone,
I'm at WikiConference NA today, and I was chatting with someone from OCLC, and he mentioned that BlazeGraph can be configured to call out to a full-text search engine. It looks like it only works with SOLR out of the box, but the documentation mentions that Elasticsearch is a candidate search endpoint.
Obviously it wouldn't be worth doing any real work on investigating this until the BlazeGraph/Amazon situation is clearer, and maybe Stas or others have looked at it in the past and already know why it isn't worth the added complexity, but there are some interesting use cases where combining full text and SPARQL would be useful—for example if you are looking for a person, you know part of their name, and some facts about them. In general, any full-text search with additional structured data constraints.
Anyone already know anything about the capacity of BlazeGraph?
It all depends on what you mean by "capacity" and by "blazegraph". If by capacity you mean do we have enough hardware, the answer is not entirely easy.
The cluster servicing the public wdqs endpoint (which is probably what "blazegraph" means in this context) has widely varying load patterns, is sometimes overloaded, and is overall difficult to size correctly (especially since we don't have a good definition of what a good SLO would be, see [1]).
The internal wdqs endpoint is in a much better situation, with a more controlled load and a reasonable amount of headroom. I don't have good visibility into the projects that might start using this internal cluster more, so that headroom might be consumed fairly quickly depending on what load we add to the cluster.
Last point: I have no idea what that blazegraph / elasticsearch integration looks like, but it sounds like it might be possible to generate arbitrary elasticsearch queries from SPARQL. If that's the case, we don't want to expose such functionality on the public wdqs endpoint, or at least not with our current production elasticsearch backend as the target. That being said, it sounds like a very interesting idea!
Have fun!
Guillaume
[1] https://phabricator.wikimedia.org/T199228
Thanks, —Trey
Trey Jones Sr. Software Engineer, Search Platform Wikimedia Foundation _______________________________________________ Discovery mailing list Discovery@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/discovery
-- Guillaume Lederrey Operations Engineer, Search Platform Wikimedia Foundation UTC+2 / CEST