A few options:
1) We recently made it possible to load the configuration of another
wiki and perform search queries as if you were on that wiki. This is
used in the language-detection code we are testing this month. The problem
is that PHP has no async support, so you make one request to Elasticsearch
for eswiki, then one for eswiktionary, and on and on. To address this I
ported the library we use for talking to Elasticsearch to Hacklang, which
does support async, but it ended up being a ridiculously large patch[1]
that amounts to a fork we don't really want to maintain, even if 95% of
the patch is just adding type signatures to existing functions.
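To make the cost of the missing async support concrete, here is a minimal sketch (in Python, purely for illustration; the wiki names are real but the query function and 50 ms latency are made up) of sequential per-wiki requests versus firing them all at once the way an async runtime allows:

```python
import asyncio
import time

# Simulated per-wiki search call; the 0.05 s sleep stands in for an
# Elasticsearch round trip (hypothetical latency, hypothetical function).
async def query_wiki(wiki: str) -> str:
    await asyncio.sleep(0.05)
    return f"results for {wiki}"

WIKIS = ["eswiki", "eswiktionary", "eswikibooks", "eswikiquote"]

async def sequential() -> list[str]:
    # What blocking PHP effectively does: request, wait, next request.
    return [await query_wiki(w) for w in WIKIS]

async def concurrent() -> list[str]:
    # What async buys you: all requests in flight at the same time.
    return await asyncio.gather(*(query_wiki(w) for w in WIKIS))

if __name__ == "__main__":
    t0 = time.perf_counter()
    asyncio.run(sequential())
    print(f"sequential: {time.perf_counter() - t0:.2f}s")

    t0 = time.perf_counter()
    asyncio.run(concurrent())
    print(f"concurrent: {time.perf_counter() - t0:.2f}s")
```

With N wikis the sequential version pays N round trips end to end; the concurrent version pays roughly one.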
2) Continue with the experimental interwiki search on itwiki[2].
This has exactly the same problem as above with making requests one after
the other, but it also knows nothing about the configuration of the other
wiki's search, so it has to dumb the query down to the lowest common
denominator. It needs a *lot* of love in the UI department and frankly
has very poor performance characteristics.
3) Refactor CirrusSearch so it can take advantage of Elasticsearch's
ability to query multiple indices in a single request. This is possible,
but a ton of work because the code wasn't written to support it.
Additionally we are back to the lowest common denominator: any
configuration that makes a particular wiki unique is lost, since
Elasticsearch runs the exact same query against every index.
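For anyone unfamiliar with the Elasticsearch feature option 3 relies on: the search API accepts a comma-separated list of indices, so one HTTP request can cover several wikis, but the query body is shared across all of them. A tiny sketch (index names and query are illustrative, no network involved) of what such a request looks like:

```python
import json

# One request, many indices -- but a single shared query body, so any
# per-wiki tuning (analyzers, boosts, rescore profiles) collapses to
# whatever works for all of them. Index names here are made up.
indices = ["eswiki_content", "eswiktionary_content"]
query = {"query": {"match": {"text": "playa"}}}

path = "/{}/_search".format(",".join(indices))
body = json.dumps(query)
print(path)  # the URL the request would be sent to
```

That shared body is exactly where the "lowest common denominator" problem comes from.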
Everything above still fails to address our desire to include content other
than standard wiki pages, such as natural language search via WDQS. So then
we also have:
4) Write a service in a language that naturally supports parallelism and
async (Hacklang, node.js, Go, etc.) that would be a frontend to multiple
backend search services (Elasticsearch, WDQS, etc.). This would strictly
be an API-level service; the code in MediaWiki would be reduced to calling
that API and formatting the output. With this approach, updates to the
search indices would be ported over to use the EventBus MVP ottomata and
gabriel have been hashing out lately. For the long-term goal of
integrating not only standard wiki content but also maps, natural language
search via WDQS, and so on, this is probably going to end up on the
roadmap eventually.
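The shape of that frontend service can be sketched in a few lines (again Python for illustration; the backend functions, result shapes, and merge strategy are all hypothetical, not a design proposal):

```python
import asyncio

# Hypothetical fan-out frontend: each backend (full-text search, WDQS,
# maps, ...) is an async callable; the service queries all of them in
# parallel and returns the merged, source-tagged results.
async def fulltext(q: str) -> list[dict]:
    await asyncio.sleep(0)  # stands in for an Elasticsearch request
    return [{"source": "elasticsearch", "title": f"article about {q}"}]

async def wdqs(q: str) -> list[dict]:
    await asyncio.sleep(0)  # stands in for a SPARQL query to WDQS
    return [{"source": "wdqs", "entity": f"item matching {q}"}]

BACKENDS = [fulltext, wdqs]

async def search(q: str) -> list[dict]:
    # Fan out to every backend concurrently, then flatten the answers.
    groups = await asyncio.gather(*(b(q) for b in BACKENDS))
    return [hit for group in groups for hit in group]
```

Adding a new content type then means adding one backend callable, with no change to the MediaWiki side beyond rendering a new result shape.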
In my head, though, all of this is useless if we can't provide good
ranking of search results in the first place. Our current user
satisfaction metric estimates that only around 15% of our users are happy
with the search results they get. Most of the time, IMO, this isn't
because the information is missing from the search indices of the
particular wiki, but because we do a frankly pretty poor job of scoring
results and surfacing the best-matching articles to the top.
TL;DR: several quarters' worth of quarterly goals.
[1]
https://github.com/ebernhardson/Elastica/commit/e856616a4ea480f4a6aa1c07f5c…
[2]
https://it.wikipedia.org/w/index.php?title=Speciale%3ARicerca&profile=d…
On Tue, Nov 3, 2015 at 3:57 PM, Pine W <wiki.pine(a)gmail.com> wrote:
Hi Discovery folks,
I'd love to make it easier for readers to discover related materials
across projects and formats (Wikipedia, Wiktionary, Wikivoyage, Commons,
Wikisource, maps, weather, etc). Any ideas about how to make this happen?
Thanks,
Pine
_______________________________________________
discovery mailing list
discovery(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery