That's very cool! Have you stress-tested it at all? Like, what happens if you search 10 wikipedias at once? (Because you know I want to search 10 wikis at once. <g>)
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
On Mon, Sep 21, 2015 at 11:15 AM, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
Just to follow up here: I've updated the `async` branch of my Elastica fork; it now passes the test suite completely, so it might be ready for further CirrusSearch testing.
On Wed, Sep 9, 2015 at 12:23 PM, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
This would allow them to run in parallel, yes. Because the wikis are in separate databases, the extra last-minute checks for existence or security (a safety net in case deletes get missed) are skipped as the interwiki links are generated, but overall that's not a big deal and is an expected trade-off of interwiki search.
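Just to sketch the shape of it (illustrative only, with a hypothetical searchWiki() helper rather than real CirrusSearch code, and assuming the \HH\Asio\m() keyed-awaitable helper from HHVM's asio utilities is available):

<?hh
// Sketch only: searchWiki() is a hypothetical stand-in for an async
// per-wiki search; it returns an empty result so the example is
// self-contained.
async function searchWiki(string $wikiId, string $term): Awaitable<array> {
  return array(); // placeholder for a real per-wiki elasticsearch query
}

async function searchWikis(array $wikiIds, string $term): Awaitable<Map<string, array>> {
  $pending = Map {};
  foreach ($wikiIds as $wikiId) {
    // Each call is kicked off here; nothing blocks yet.
    $pending[$wikiId] = searchWiki($wikiId, $term);
  }
  // \HH\Asio\m() awaits a keyed collection of awaitables, so the
  // per-wiki round trips overlap instead of running one after another.
  return await \HH\Asio\m($pending);
}

The results come back keyed by wiki id, which is roughly the shape you'd want for stitching interwiki results together.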
On Wed, Sep 9, 2015 at 10:11 AM, Kevin Smith <ksmith@wikimedia.org> wrote:
Would this help if we wanted to simultaneously search multiple wikis, or are those in separate databases so it would have no effect?
Kevin Smith
Agile Coach, Wikimedia Foundation
On Wed, Sep 9, 2015 at 5:18 AM, David Causse <dcausse@wikimedia.org> wrote:
Thanks Erik!
This is very promising and opens up a lot of new possibilities. Estimating the gain is pretty hard, but I think we run many small requests where the network overhead is high compared to the actual work done by Elasticsearch, so this would definitely help.
On 08/09/2015 21:01, Erik Bernhardson wrote:
The PHP engine the WMF uses in production, HHVM, has built-in support for cooperative (non-preemptive) concurrency via the async/await keywords[1][2]. Over the weekend I spent some time converting the Elastica client library we use to work asynchronously, which would essentially let us continue performing other calculations in the web request while network requests are in flight. I've only ported the client library[3], not the CirrusSearch code, and it's not a complete port: a couple of code paths work, but most of the test suite still fails.
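For a rough picture of what that means in practice, here is a sketch (not the actual code in the fork) of an async HTTP call built on HHVM's curl_multi_await() builtin, which suspends the current async function while curl waits on the socket:

<?hh
// Sketch only: drive a curl transfer while yielding to hhvm's async
// scheduler instead of blocking the request thread.
async function asyncHttpGet(string $url): Awaitable<string> {
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  $mh = curl_multi_init();
  curl_multi_add_handle($mh, $ch);
  $active = 1;
  do {
    // Let curl make whatever progress it can without blocking.
    do {
      $status = curl_multi_exec($mh, $active);
    } while ($status === CURLM_CALL_MULTI_PERFORM);
    if (!$active) {
      break;
    }
    // Suspend here until curl's sockets are ready; other awaitables
    // in the web request get to run in the meantime.
    await curl_multi_await($mh);
  } while (true);
  $body = (string)curl_multi_getcontent($ch);
  curl_multi_remove_handle($mh, $ch);
  curl_multi_close($mh);
  curl_close($ch);
  return $body;
}

Anything that awaits asyncHttpGet() gives control back to the scheduler while the transfer is in flight, which is where the overlap with other work in the request comes from.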
The most obvious place we could see a benefit is when multiple queries are issued to Elasticsearch from a single web request. If the second query doesn't depend on the results of the first, it can be issued in parallel. This is actually a somewhat common use case, for example doing a full-text and a title search in the same request. I'm wary of guessing at the actual latency reduction we could expect, but it's maybe on the order of 50 to 100 ms in cases where we currently perform requests serially and have enough other work to process. Really, it's hard to say at this point.
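As a sketch of that case (hypothetical helper names, not the fork's actual API, and again assuming the \HH\Asio\v() helper from HHVM's asio utilities):

<?hh
// Sketch only: two independent queries issued concurrently. The helpers
// just return empty results so the example is self-contained.
async function searchFullText(string $term): Awaitable<array> {
  return array(); // placeholder for a real full-text query
}

async function searchTitles(string $term): Awaitable<array> {
  return array(); // placeholder for a real title query
}

async function searchBoth(string $term): Awaitable<array> {
  // Calling each async function kicks it off; awaiting the combined
  // handle with \HH\Asio\v() lets the two network waits overlap
  // instead of running back to back.
  $results = await \HH\Asio\v(array(
    searchFullText($term),
    searchTitles($term),
  ));
  return array(
    'fulltext' => $results[0],
    'titles'   => $results[1],
  );
}

Overlapping the two round trips like this is the kind of case the rough 50 to 100 ms estimate above is about.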
In addition to making some existing code faster, being able to do multiple network operations asynchronously opens up other possibilities for things we implement in the future. In closing, this isn't currently going anywhere; it was just something interesting to toy with, but I think it could be quite interesting to investigate further.
[1] http://docs.hhvm.com/manual/en/hack.async.php
[2] https://phabricator.wikimedia.org/T99755
[3] https://github.com/ebernhardson/Elastica/tree/async
Wikimedia-search mailing list Wikimedia-search@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimedia-search