SUMMARY: The Search Platform team (formerly part of Discovery) is planning to fix a long-standing search bug on many wiki projects by disabling the code in CirrusSearch that re-uses the “fallback” languages (which are specified for user interface or system messages) for the language analysis modules (which are used to index words in search). Deployment is planned to start the week of October 9, 2017.
Messaging fallbacks specify what language to show a message in when there is no message available in the language of a given wiki. A language analysis module is language-specific software that processes text to improve searching—so that, for example, searching for a given word will find related forms of that word, like "hope, hopes, hoping, hoped" or "resume, resumé, résumé" on English-language wikis.
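To make the idea of a language analysis module concrete, here is a toy sketch in Python. It is not the analyzer CirrusSearch or Elasticsearch actually uses; the function names and the suffix rules are invented for illustration, and they only handle the example words from the paragraph above.

```python
import unicodedata

# Toy English suffix rules, invented for this example (longest first).
SUFFIXES = ("ing", "ed", "es", "s")


def fold(word: str) -> str:
    """Lowercase the word and drop combining accents (résumé -> resume)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_stem(word: str) -> str:
    """Reduce a word to a crude index form using the toy rules above."""
    word = fold(word)
    for suffix in SUFFIXES:
        # Only strip a suffix if at least three letters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Drop a trailing "e" so that "hope" and "hoping" meet at "hop".
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word


for group in (["hope", "hopes", "hoping", "hoped"],
              ["resume", "resumé", "résumé"]):
    print({w: toy_stem(w) for w in group})
```

Each group of words ends up with one shared index form, so a search for any one of them can match the others. The rules are specific to English, which is why applying one language's analysis to a different language tends to give poor results.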
Fallback languages for system messages make sense for historical and cultural reasons: a reader of the Chechen Wikipedia is more likely to understand a user interface or system message in Russian than in French, Greek, Hindi, Italian, or Japanese. But the fallbacks don't necessarily make any linguistic sense. Chechen and Russian, for example, are from unrelated language families; while the languages have undoubtedly influenced one another, their grammars are completely different.
We will deploy the software change that stops messaging fallbacks from being used as language analysis fallbacks in about two weeks (targeting the week of October 9, 2017). Any cross-language analysis exceptions will be configured explicitly in a new manner. The change will not take effect on all affected wikis immediately, because each wiki in each language will need to be re-indexed, which is a separate process that takes time. There may also be other delays caused by Elasticsearch upgrades or other changes that need immediate attention.
You can also track progress of the tasks on Phabricator[1] or read more, see examples, and get the full list of languages affected on MediaWiki.[2]
[1] https://phabricator.wikimedia.org/T147959
[2] https://www.mediawiki.org/wiki/Wikimedia_Discovery/Disabling_Messaging_Fallb...
Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation
These changes were completed the week of October 9th and deployed the following week. The re-indexing of the affected wikis finished a few hours ago, so the changes should now be live everywhere.
The list of affected languages is on Phabricator ticket T177871,[1] and a list by wiki is in a comment on that ticket.[2]
For more details, see the write-up on MediaWiki.[3]
[1] https://phabricator.wikimedia.org/T177871
[2] https://phabricator.wikimedia.org/T177871#3702836
[3] https://www.mediawiki.org/wiki/Wikimedia_Discovery/Disabling_Messaging_Fallbacks_for_Language_Analysis
Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation
On Tue, Sep 26, 2017 at 12:12 PM, Trey Jones tjones@wikimedia.org wrote:
wikitech-l@lists.wikimedia.org