BlazeGraph + Elasticsearch?


Trey Jones

18 Oct 2018

2:47 p.m.

Hi Everyone,

I'm at WikiConference NA today, and I was chatting with someone from OCLC <https://en.wikipedia.org/wiki/OCLC>, and he mentioned that BlazeGraph can be configured to call out to a full-text search engine. It looks like it only works with Solr out of the box, but the documentation <https://wiki.blazegraph.com/wiki/index.php/ExternalFullTextSearch> mentions that Elasticsearch is a candidate search endpoint.

Obviously it wouldn't be worth doing any real work on investigating this until the BlazeGraph/Amazon situation is clearer, and maybe Stas or others have looked at it in the past and already know why it isn't worth the added complexity, but there are some interesting use cases where combining full text and SPARQL would be useful—for example if you are looking for a person, you know part of their name, and some facts about them. In general, any full-text search with additional structured data constraints.

Anyone already know anything about the capacity of BlazeGraph?

Thanks,
—Trey

Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation


Guillaume Lederrey

19 Oct 2018

7:57 a.m.

On Thu, Oct 18, 2018 at 4:48 PM Trey Jones <tjones(a)wikimedia.org> wrote:

> Anyone already know anything about the capacity of BlazeGraph?

It all depends on what you mean by "capacity" and by "blazegraph". If by capacity you mean do we have enough hardware, the answer is not entirely easy.

The cluster servicing the public wdqs endpoint (which probably means "blazegraph" in this context) has widely varying load patterns, is sometimes overloaded and is overall difficult to size correctly (especially since we don't have a good definition of what a good SLO would be, see [1]).

The internal wdqs endpoint is in a much better situation, with a more controlled load and a reasonable amount of headroom. I don't have good visibility into the projects that might start using this internal cluster more, so that headroom might be consumed fairly quickly depending on what load we add to the cluster.

Last point: I have no idea what that blazegraph / elasticsearch integration looks like, but it sounds like it might be possible to generate arbitrary elasticsearch queries from SPARQL. If that's the case, we don't want to expose such functionality on the public wdqs endpoint, or at least not with our current production elasticsearch backend as the target. That being said, it sounds like a very interesting idea!

Have fun!

Guillaume

[1] https://phabricator.wikimedia.org/T199228

_______________________________________________
Discovery mailing list
Discovery(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery

-- Guillaume Lederrey Operations Engineer, Search Platform Wikimedia Foundation UTC+2 / CEST

Trey Jones

1:39 p.m.

Instead of "the capacity" I meant "this capacity", but should have said "this feature", referring to Elasticsearch integration—though the information on system capacity was still interesting.

Guillaume Lederrey

3:25 p.m.

On Fri, Oct 19, 2018 at 3:40 PM Trey Jones <tjones(a)wikimedia.org> wrote:

> Instead of "the capacity" I meant "this capacity", but should have said "this feature", referring to Elasticsearch integration—though the information on system capacity was still interesting.

Isn't that "capability" more than "capacity"? (I'm trying to improve my English here.) Though I knew that it sounded ambiguous!


Trey Jones

5:01 p.m.

On Fri, Oct 19, 2018 at 11:25 AM, Guillaume Lederrey <glederrey(a)wikimedia.org> wrote:

> Isn't that "capability" more than "capacity"? (I'm trying to improve my English here.) Though I knew that it sounded ambiguous!

English isn't content with having too many words, it also has to give many of them too many meanings, especially related shades of meanings that have to be inferred from context and/or reading the mind of the speaker. So, "capacity" can also mean "capability" or "role", and I think I was going for something of a blend of those two—so it was both the perfect word and a poor choice. ;)

David Causse

3:08 p.m.

Hi, I remember Stas playing with it a bit; see https://phabricator.wikimedia.org/T141813

Stas Malyshev

5:18 p.m.

Hi!

> I'm at WikiConference NA today, and I was chatting with someone from OCLC <https://en.wikipedia.org/wiki/OCLC>, and he mentioned that BlazeGraph can be configured to call out to a full-text search engine. It looks like it only works with Solr out of the box, but the documentation <https://wiki.blazegraph.com/wiki/index.php/ExternalFullTextSearch> mentions that Elasticsearch is a candidate search endpoint.

Technically it is possible, and I looked into it, but given that we have a gateway to the MediaWiki API (which can do essentially the same search) I decided not to pursue this for now. We'd basically have to duplicate the work we've done in MediaWiki to compose proper Elastic queries, parse results, etc., and the best we'd have is the same thing we already have with MediaWiki API search. So I decided not to duplicate efforts for now.

-- Stas Malyshev smalyshev(a)wikimedia.org

Trey Jones

6:02 p.m.

On Fri, Oct 19, 2018 at 1:18 PM, Stas Malyshev <smalyshev(a)wikimedia.org> wrote:

> the best we'd have is the same thing we already have with Mediawiki API search.

Ah, so there isn't a way to combine full-text results and SPARQL results? That was the point of my original discussion with the fellow from OCLC, so if that's not possible, then, yeah, there's no point.

Stas Malyshev

6:49 p.m.

Hi!

On 10/19/18 11:02 AM, Trey Jones wrote:

> Ah, so there isn't a way to combine full-text results and SPARQL results? That was the point of my original discussion with the fellow from OCLC, so if that's not possible, then, yeah, there's no point.

Ah yes, you can combine! Just call the MediaWiki API from inside a SPARQL query and combine it with other clauses: https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI

-- Stas Malyshev smalyshev(a)wikimedia.org
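
A minimal sketch of the approach described in this message, written as a Python helper that only builds the query text and does not send it anywhere. The service and parameter names (`wikibase:mwapi`, `EntitySearch`, `mwapi:search`, `wikibase:apiOutputItem`) follow the MWAPI user manual linked above. Everything else is an assumption made for illustration and is not from the thread: the helper name `build_mwapi_query`, the choice of `wdt:P31` / `wd:Q5` (instance of: human) as the structured constraint, and the expectation that the query runs on WDQS, where these prefixes are predefined.

```python
import re


def build_mwapi_query(search_text: str, class_qid: str = "Q5",
                      language: str = "en", limit: int = 10) -> str:
    """Build SPARQL text that combines an MWAPI text search with a P31 filter.

    Hypothetical helper for illustration; it returns a string and runs nothing.
    """
    if not re.fullmatch(r"Q[1-9][0-9]*", class_qid):
        raise ValueError(f"not a Wikidata item ID: {class_qid!r}")
    if not re.fullmatch(r"[A-Za-z-]+", language):
        raise ValueError(f"unexpected language code: {language!r}")

    # Escape characters that would break out of a SPARQL "..." literal.
    escaped = (search_text.replace("\\", "\\\\")
                          .replace('"', '\\"')
                          .replace("\n", "\\n")
                          .replace("\r", "\\r"))

    lines = [
        "SELECT ?item ?itemLabel WHERE {",
        "  # Full-text part: entity search through the MediaWiki API gateway.",
        "  SERVICE wikibase:mwapi {",
        '    bd:serviceParam wikibase:endpoint "www.wikidata.org" ;',
        '                    wikibase:api "EntitySearch" ;',
        f'                    mwapi:search "{escaped}" ;',
        f'                    mwapi:language "{language}" .',
        "    ?item wikibase:apiOutputItem mwapi:item .",
        "  }",
        "  # Structured part: keep only matches that are instances of the class.",
        f"  ?item wdt:P31 wd:{class_qid} .",
        "  SERVICE wikibase:label {",
        f'    bd:serviceParam wikibase:language "{language}" .',
        "  }",
        "}",
        f"LIMIT {int(limit)}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Part of a name plus one known fact (the item is a human).
    print(build_mwapi_query("Ada Love", "Q5"))
```

The printed text can be pasted into the query service UI. Further triple patterns (occupation, date of birth, and so on) would go next to the `wdt:P31` line to narrow the text matches with more structured constraints.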

Kunal Mehta

25 Oct 2018

3:45 a.m.

Hi Trey,

On 10/18/18 7:47 AM, Trey Jones wrote:

> Obviously it wouldn't be worth doing any real work on investigating this until the BlazeGraph/Amazon situation is clearer...

I might have missed something, but what is this situation?

-- Legoktm

Guillaume Lederrey

2:24 p.m.

On Thu, Oct 25, 2018 at 5:45 AM Kunal Mehta <legoktm(a)member.fsf.org> wrote:

> I might have missed something, but what is this situation?

Amazon has acquired Blazegraph. It looks like they don't want to kill it, and the team itself is willing to continue to support Blazegraph. That being said, there has not been much activity in the last 2 years on their GitHub repo [1].

[1] https://github.com/blazegraph/database


Erika Bjune

3:59 p.m.

Kunal, see also https://phabricator.wikimedia.org/T206560

------------------------------------
Erika Bjune
Director of Engineering - Search Platform & Fundraising Tech
Wikimedia Foundation
