A few options:
1) We recently made it possible to load the configuration of another
wiki and perform search queries as if you were on that wiki. This is
used in the language-detection code we are testing this month. The problem
is that PHP has no async support, so you make one request to Elasticsearch
for eswiki, then one for eswiktionary, and on and on. To address this I
ported the library we use for talking to Elasticsearch to Hacklang, which
does support async, but it ended up being a ridiculously large patch[1]
that amounts to a fork we don't really want to maintain, even if 95% of
the patch is just adding type signatures to existing functions.
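To make the cost of the missing async support concrete, here is a minimal sketch (in Python, purely for illustration; the wiki names are real but the query function and 50 ms latency are made up) of sequential per-wiki requests versus firing them all at once the way an async runtime allows:

```python
import asyncio
import time

# Simulated per-wiki search call; the 0.05 s sleep stands in for an
# Elasticsearch round trip (hypothetical latency, hypothetical function).
async def query_wiki(wiki: str) -> str:
    await asyncio.sleep(0.05)
    return f"results for {wiki}"

WIKIS = ["eswiki", "eswiktionary", "eswikibooks", "eswikiquote"]

async def sequential() -> list[str]:
    # What blocking PHP effectively does: request, wait, next request.
    return [await query_wiki(w) for w in WIKIS]

async def concurrent() -> list[str]:
    # What async buys you: all requests in flight at the same time.
    return await asyncio.gather(*(query_wiki(w) for w in WIKIS))

if __name__ == "__main__":
    t0 = time.perf_counter()
    asyncio.run(sequential())
    print(f"sequential: {time.perf_counter() - t0:.2f}s")

    t0 = time.perf_counter()
    asyncio.run(concurrent())
    print(f"concurrent: {time.perf_counter() - t0:.2f}s")
```

With N wikis the sequential version pays N round trips end to end; the concurrent version pays roughly one.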
2) Continue with the experimental interwiki search on itwiki[2].
This has exactly the same problem as above with making requests one after
the other, but it also knows nothing about the configuration of the other
wiki's search, so it has to dumb the query down to the lowest common
denominator. It needs a *lot* of love in the UI department and frankly
has very poor performance characteristics.
3) Refactor CirrusSearch so it can take advantage of Elasticsearch's
ability to query multiple indices in a single request. This is possible,
but a ton of work because the code wasn't written to support it.
Additionally we are back to the lowest common denominator: any
configuration that makes a particular wiki unique is lost, since
Elasticsearch runs the exact same query against every index.
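For anyone unfamiliar with the Elasticsearch feature option 3 relies on: the search API accepts a comma-separated list of indices, so one HTTP request can cover several wikis, but the query body is shared across all of them. A tiny sketch (index names and query are illustrative, no network involved) of what such a request looks like:

```python
import json

# One request, many indices -- but a single shared query body, so any
# per-wiki tuning (analyzers, boosts, rescore profiles) collapses to
# whatever works for all of them. Index names here are made up.
indices = ["eswiki_content", "eswiktionary_content"]
query = {"query": {"match": {"text": "playa"}}}

path = "/{}/_search".format(",".join(indices))
body = json.dumps(query)
print(path)  # the URL the request would be sent to
```

That shared body is exactly where the "lowest common denominator" problem comes from.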
Everything above still fails to address our desire to include content other
than standard wiki pages, such as natural language search via WDQS. So then
we also have:
4) Write a service in a language that naturally supports parallelism and
async (Hacklang, node.js, Go, etc.) that would be a frontend to multiple
backend search services (Elasticsearch, WDQS, etc.). This would strictly
be an API-level service; the code in MediaWiki would be reduced to calling
that API and formatting the output. With this approach, updates to the
search indices would be ported over to use the EventBus MVP ottomata and
gabriel have been hashing out lately. For the long-term goal of
integrating not only standard wiki content but also maps, natural language
search via WDQS, and so on, this is probably going to end up on the
roadmap eventually.
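The shape of that frontend service can be sketched in a few lines (again Python for illustration; the backend functions, result shapes, and merge strategy are all hypothetical, not a design proposal):

```python
import asyncio

# Hypothetical fan-out frontend: each backend (full-text search, WDQS,
# maps, ...) is an async callable; the service queries all of them in
# parallel and returns the merged, source-tagged results.
async def fulltext(q: str) -> list[dict]:
    await asyncio.sleep(0)  # stands in for an Elasticsearch request
    return [{"source": "elasticsearch", "title": f"article about {q}"}]

async def wdqs(q: str) -> list[dict]:
    await asyncio.sleep(0)  # stands in for a SPARQL query to WDQS
    return [{"source": "wdqs", "entity": f"item matching {q}"}]

BACKENDS = [fulltext, wdqs]

async def search(q: str) -> list[dict]:
    # Fan out to every backend concurrently, then flatten the answers.
    groups = await asyncio.gather(*(b(q) for b in BACKENDS))
    return [hit for group in groups for hit in group]
```

Adding a new content type then means adding one backend callable, with no change to the MediaWiki side beyond rendering a new result shape.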
In my head, though, all of this is useless if we can't provide good
ranking of search results in the first place. Our current user
satisfaction metric estimates that only around 15% of our users are happy
with the search results they get. Most of the time, IMO, this isn't
because the information is missing from the search indices of the
particular wiki, but because we do a frankly pretty poor job of scoring
results and surfacing the best-matching articles to the top.
TL;DR: several quarters' worth of quarterly goals.
[1]
https://github.com/ebernhardson/Elastica/commit/e856616a4ea480f4a6aa1c07f5c…
[2]
https://it.wikipedia.org/w/index.php?title=Speciale%3ARicerca&profile=d…
On Tue, Nov 3, 2015 at 3:57 PM, Pine W <wiki.pine(a)gmail.com> wrote:
Hi Discovery folks,
I'd love to make it easier for readers to discover related materials
across projects and formats (Wikipedia, Wiktionary, Wikivoyage, Commons,
Wikisource, maps, weather, etc). Any ideas about how to make this happen?
Thanks,
Pine
_______________________________________________
discovery mailing list
discovery(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery