A few options:

1) We recently made it possible to load the configuration of another wiki and perform search queries as if you were on that wiki. This is used in the language detection code we are testing this month. The problem is that PHP has no async support, so you make one request to Elasticsearch for eswiki, then one for eswiktionary, and so on. To address this I ported the library we use for talking to Elasticsearch to Hack, which does support async, but it ended up as a ridiculously large patch[1] that amounts to a fork we don't really want to maintain, even though 95% of the patch is just adding type signatures to existing functions.

2) Continue along with the experimental interwiki search on itwiki[2]. This has the same sequential-request problem as above, but additionally knows nothing about the other wikis' search configuration, so it has to dumb the query down to the lowest common denominator. It also needs a *lot* of love in the UI department and has, frankly, very poor performance characteristics.

3) Refactor CirrusSearch so it can take advantage of Elasticsearch's ability to query multiple indices in the same request. This is possible, but a ton of work because the code wasn't written to support it. We are also back to the lowest common denominator: any configuration that makes a particular wiki unique is lost, since Elasticsearch runs the exact same query against all the indices.
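For what it's worth, option 3 hinges on Elasticsearch accepting a comma-separated list of indices in the search URL, so one HTTP round trip covers every wiki. A minimal sketch in Python (the index names and trivial query body here are made up for illustration; CirrusSearch's real queries are far more involved):

```python
import json

def build_multi_index_search(indices, text):
    """Build a single Elasticsearch search request that fans out over
    several indices server-side, instead of one HTTP round trip per wiki.

    Because the same query body is applied to every index, any per-wiki
    configuration (analyzers, boosts, rescores) is lost -- the
    lowest-common-denominator problem described above."""
    path = "/%s/_search" % ",".join(indices)
    body = {
        "query": {"match": {"text": text}},
        "size": 10,
    }
    return path, json.dumps(body)

path, body = build_multi_index_search(
    ["eswiki_content", "eswiktionary_content"], "ricerca")
```

The one-request-per-query shape is the appeal; the single shared query body is the cost.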

Everything above still fails to address our desire to include content other than standard wiki pages, such as natural language search via WDQS. So then we also have:

4) Write a service in a language that naturally supports parallelism and async (Hack, node.js, Go, etc.) to act as a frontend to multiple backend search services (Elasticsearch, WDQS, etc.). This would strictly be an API-level service; the code in MediaWiki would be reduced to calling that API and formatting the output. Updates to the search indices would be ported over to use the EventBus MVP that ottomata and gabriel have been hashing out lately. For the long-term goal of integrating not only standard wiki content but also maps, natural language search via WDQS, and so on, this is probably going to end up on the roadmap eventually.
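To illustrate why async matters for option 4, here is a toy fan-out in Python's asyncio (the backends are simulated and the names are made up): sequential requests pay the sum of the backend latencies, while a concurrent frontend pays only the slowest one.

```python
import asyncio

async def query_backend(name, latency):
    # Stand-in for a real HTTP call to Elasticsearch, WDQS, etc.
    await asyncio.sleep(latency)
    return {"backend": name, "hits": []}

async def federated_search(term):
    # Fan out to all backends concurrently; total latency is roughly
    # max(latencies) rather than sum(latencies), which is exactly what
    # the sequential PHP request loop in options 1-3 cannot achieve.
    backends = [("eswiki", 0.05), ("eswiktionary", 0.05), ("wdqs", 0.05)]
    return await asyncio.gather(
        *(query_backend(name, lat) for name, lat in backends))

results = asyncio.run(federated_search("ricerca"))
```

Each backend could keep its own wiki-specific query shape here, which also sidesteps the lowest-common-denominator problem.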


In my head, though, all of this is useless if we can't provide good ranking of search results in the first place. Our current user satisfaction metric estimates that only around 15% of our users are happy with the search results they are getting. Most of the time, IMO, this isn't because the information is missing from that wiki's search indices, but because we do a frankly pretty poor job of scoring results and surfacing the best-matching articles to the top.

TL;DR: Several quarters' worth of quarterly goals.

[1] https://github.com/ebernhardson/Elastica/commit/e856616a4ea480f4a6aa1c07f5ce7d77ace12145
[2] https://it.wikipedia.org/w/index.php?title=Speciale%3ARicerca&profile=default&search=ricerca&fulltext=Search


On Tue, Nov 3, 2015 at 3:57 PM, Pine W <wiki.pine@gmail.com> wrote:
Hi Discovery folks,

I'd love to make it easier for readers to discover related materials across projects and formats (Wikipedia, Wiktionary, Wikivoyage, Commons, Wikisource, maps, weather, etc). Any ideas about how to make this happen?

Thanks,
Pine

_______________________________________________
discovery mailing list
discovery@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery