Did we ever check whether we now cover everything the custom Lucene code used to do, especially for Russian? https://wikitech.wikimedia.org/wiki/Search/2013#Search_details_.28Java.29
While we're at it, perhaps Hebrew tokenization could be improved too: https://phabricator.wikimedia.org/T154348#2912086
Starting with Polish makes sense, however.
Nemo