SUMMARY: The Search Platform team (formerly part of Discovery) plans to fix a long-standing search bug on many wiki projects. The fix disables the code in CirrusSearch that reuses the "fallback" languages specified for user interface and system messages as fallbacks for the language analysis modules, which are used to index words for search. Deployment is planned to start the week of October 9, 2017.

Messaging fallbacks specify what language to show a message in when there is no message available in the language of a given wiki. A language analysis module is language-specific software that processes text to improve searching—so that, for example, searching for a given word will find related forms of that word, like "hope, hopes, hoping, hoped" or "resume, resumé, résumé" on English-language wikis.

Fallback languages for system messages make sense for historical and cultural reasons (a reader of the Chechen Wikipedia is more likely to understand a user interface or system message in Russian than in French, Greek, Hindi, Italian, or Japanese), but the fallbacks don't necessarily make any linguistic sense. Chechen and Russian, for example, belong to unrelated language families; while the languages have undoubtedly influenced one another, their grammars are completely different.

We will deploy the software change that stops using messaging fallbacks as language analysis fallbacks in about two weeks (targeting the week of October 9, 2017). Any cross-language analysis exceptions will instead be configured explicitly, in a new way. Changes will not take effect immediately on all affected wikis, because each wiki in each language will need to be re-indexed, which is a separate process that takes time. There may also be other delays caused by Elasticsearch upgrades or other changes that need immediate attention.
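To make the difference concrete, here is a minimal sketch of the before-and-after selection logic. It is not CirrusSearch's actual code or configuration: the names MESSAGE_FALLBACKS, ANALYZERS, EXPLICIT_ANALYSIS_FALLBACKS, analyzer_old, analyzer_new, and the "default" analyzer label are all hypothetical, and the data is a tiny illustrative subset built around the Chechen/Russian example above.

```python
# Hypothetical data for illustration only; not CirrusSearch's real configuration.
MESSAGE_FALLBACKS = {"ce": ["ru"]}  # Chechen UI messages fall back to Russian
ANALYZERS = {"en", "ru"}            # languages with a dedicated analysis module (illustrative subset)
EXPLICIT_ANALYSIS_FALLBACKS = {}    # hypothetical: cross-language exceptions, configured deliberately

def analyzer_old(lang):
    """Old behavior (sketch): walk the messaging fallback chain to pick an analyzer."""
    for candidate in [lang] + MESSAGE_FALLBACKS.get(lang, []):
        if candidate in ANALYZERS:
            return candidate
    return "default"  # hypothetical generic, language-independent analysis

def analyzer_new(lang):
    """New behavior (sketch): ignore messaging fallbacks; honor only explicit exceptions."""
    if lang in ANALYZERS:
        return lang
    return EXPLICIT_ANALYSIS_FALLBACKS.get(lang, "default")

print(analyzer_old("ce"))  # ru: Chechen text analyzed with Russian rules
print(analyzer_new("ce"))  # default: no unrelated-language stemming
print(analyzer_new("en"))  # en: languages with their own analyzer are unaffected
```

Under these assumptions, Chechen text would no longer be processed with Russian-specific rules unless someone deliberately adds an entry to the explicit exceptions.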

You can track the progress of these tasks on Phabricator,[1] or read more, see examples, and get the full list of affected languages on MediaWiki.[2]



Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation