Hello!
As part of our goals for Q3 FY 2016-17 (Jan - Mar 2017), the Search Team will be researching, testing, and deploying new language analysers.
Language analysers are features in Elasticsearch that analyse and alter queries to give users better results. They perform important functions such as tokenisation, and can also alter queries in language-specific ways. For example:
- The English analyser would make the query "john's" also search for "john".
- The German analyser would make the query "äußerst" also search for "ausserst".
These alterations to users' queries improve the relevance of the results compared to not analysing the queries, because they can bring extra documents that may be relevant into the results. Elastic has a bunch of documentation if you want to read more about what the language analysers do.
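To make the two examples above concrete, here is a toy sketch in Python. It is not how Elasticsearch implements its analysers; the function names and the small character table are made up for illustration, and real analysers do considerably more (stemming, stop words, and so on).

```python
# Toy illustration only: these helpers are hypothetical and are NOT the
# Elasticsearch implementation of the English or German analysers.

# Simplified German folding table (assumption: only these four characters).
GERMAN_FOLDS = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def strip_english_possessive(token: str) -> str:
    """Drop a trailing 's (straight or curly apostrophe) from a token."""
    for suffix in ("'s", "\u2019s"):
        if token.endswith(suffix):
            return token[: -len(suffix)]
    return token


def fold_german(token: str) -> str:
    """Replace umlauts and eszett with plain ASCII equivalents."""
    return token.translate(GERMAN_FOLDS)


def expand_query(query: str, language: str) -> set:
    """Return the original tokens plus their analysed forms."""
    analysers = {"english": strip_english_possessive, "german": fold_german}
    analyse = analysers[language]
    terms = set()
    for token in query.lower().split():
        terms.add(token)           # keep what the user typed
        terms.add(analyse(token))  # also search for the analysed form
    return terms


print(expand_query("john's", "english"))
print(expand_query("äußerst", "german"))
```

The point is just that the analysed form is searched alongside what the user typed, which is how extra relevant documents end up in the results.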
Some of the criteria we'll be using to evaluate the new analysers are:
- how much better we expect the analyser to be than the one we have
- the maturity and maintainability of the code of the analyser
- flexibility of customisation of the plugin
We'll be testing using our standard search metrics, such as zero results rate, PaulScore, and others.
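For anyone unfamiliar with the first of those metrics, zero results rate is the share of queries that return no results at all. A minimal sketch, assuming we already have a list with the number of results each query returned (the function name and input shape are hypothetical, not our actual pipeline):

```python
def zero_results_rate(result_counts):
    """Fraction of queries that returned zero results.

    result_counts: iterable of ints, one per query (assumed input shape).
    """
    counts = list(result_counts)
    if not counts:
        raise ValueError("no queries to measure")
    zero_hits = sum(1 for n in counts if n == 0)
    return zero_hits / len(counts)


# Example: two of these four queries found nothing.
print(zero_results_rate([0, 3, 0, 12]))
```

A better analyser should push this number down, since queries that previously matched nothing may now match the analysed form.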
We'll be starting with Polish, since we already have some ideas for possible new plugins there. That will let us work out more precisely which criteria we want to use when evaluating plugins for other languages.
As always, if there are any questions, please let me know!
Thanks,
Dan
--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation