
As part of our goals for Q3 FY 2016-17 (Jan - Mar 2017), the Search Team will be researching, testing, and deploying new language analysers.

Language analysers are features in Elasticsearch that analyse and alter queries to give users better results. Language analysers perform important functions such as tokenisation, and can also alter queries with language-specific features, such as:
These alterations to users' queries improve the relevance of the results compared to not analysing the queries, because they can bring additional relevant documents into the results. Elastic has a bunch of documentation if you want to read more about what the language analysers do.
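
To make the idea concrete, here is a toy sketch in Python of what an analyser does to a query and a document before they are matched. This is not how Elasticsearch implements its analysers. The stop-word list, the suffix rules, and the function names are all made up for illustration, and real analysers use much richer language-specific resources.

```python
import re

# Hypothetical, deliberately tiny resources for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "in"}
SUFFIXES = ("ing", "s")


def tokenise(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyse(text):
    """Tokenise, drop stop words, and stem what remains."""
    return [stem(t) for t in tokenise(text) if t not in STOP_WORDS]


def matches(query, document):
    """A document matches if it contains every analysed query term."""
    return set(analyse(query)) <= set(analyse(document))
```

With only tokenisation, the query "editing wikis" shares no terms with the document "She edits the wiki". After analysis both reduce to the terms "edit" and "wiki", so matches("editing wikis", "She edits the wiki") returns True. That is the sense in which analysis can pull extra relevant documents into the results.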

Some of the criteria we'll be using to evaluate the new analysers are:
We'll be testing using our standard search metrics, such as zero results rate, PaulScore, and others.
We'll be starting with Polish, since we already have some ideas for possible new plugins, and that'll allow us to more precisely figure out what criteria we want to use when evaluating the plugin.
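
For anyone unfamiliar with the zero results rate mentioned above, here is a minimal sketch of how such a metric can be computed, assuming it is defined as the share of searches that return no results at all. The function name and the input format (one result count per search) are assumptions for illustration, not our actual analytics code.

```python
def zero_results_rate(result_counts):
    """Return the fraction of searches that returned zero results.

    result_counts is assumed to hold one integer per search:
    the number of results that search returned.
    """
    if not result_counts:
        raise ValueError("need at least one search to compute a rate")
    zero = sum(1 for count in result_counts if count == 0)
    return zero / len(result_counts)
```

For example, zero_results_rate([0, 5, 0, 12]) gives 0.5, since two of the four made-up searches returned nothing. Comparing this figure for the same queries with and without a new analyser is one way to see whether the analyser helps.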

As always, if there are any questions, please let me know!


Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation