After extensive testing over the last several months of a new search
query scoring method called BM25 (Best Matching), we recently completed
its release to the following top languages: English, German, Spanish,
Russian, Portuguese, French, Italian, Polish, Dutch and Arabic. This
release replaces the older scoring method, tf-idf (term frequency-inverse
document frequency).
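
For readers curious how the two scoring methods differ, here is a rough
sketch in Python. It is not the code running on our servers: the function
names are made up for illustration, the whitespace tokenizer is a
simplification, and k1=1.2 and b=0.75 are just commonly used BM25 defaults
that we assume here. To isolate the difference, both scorers share the
same idf; BM25 additionally caps the effect of repeated terms and
normalizes for document length.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplification: lowercase and split on whitespace.
    return text.lower().split()

def idf(term, docs):
    # Smoothed inverse document frequency; docs is a list of token lists.
    n = sum(1 for d in docs if term in d)
    return math.log(1 + (len(docs) - n + 0.5) / (n + 0.5))

def tfidf_score(query, doc, docs):
    # Plain tf-idf: the score grows linearly with the term count.
    tf = Counter(doc)
    return sum(tf[t] * idf(t, docs) for t in query)

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    # BM25: term counts saturate (controlled by k1) and long documents
    # are penalized relative to the average length (controlled by b).
    tf = Counter(doc)
    avgdl = sum(len(d) for d in docs) / len(docs)
    norm = k1 * (1 - b + b * len(doc) / avgdl)
    return sum(idf(t, docs) * tf[t] * (k1 + 1) / (tf[t] + norm) for t in query)

docs = [tokenize("cat"), tokenize("cat " * 10), tokenize("dog")]
query = tokenize("cat")
for d in docs:
    print(len(d), tfidf_score(query, d, docs), bm25_score(query, d, docs))
```

In this toy corpus the document that repeats "cat" ten times gets ten
times the tf-idf score of the single-word document, while its BM25 score
can never exceed (k1 + 1) times the idf of the term.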
We still have testing to do [3,4] to figure out whether BM25 will work in
languages that don’t use spaces between their words, e.g. Japanese,
Chinese, etc.
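
To illustrate why such languages need separate testing, here is a small
Python sketch. It is only an assumption-laden example, not the approach
under test: character bigrams are one generic fallback for unsegmented
text, and the function name is hypothetical. Term-based scoring depends
on how text is cut into terms, and splitting on whitespace yields a
single giant "word" when there are no spaces.

```python
def char_bigrams(text):
    # Naive fallback: overlapping two-character "terms".
    return [text[i:i + 2] for i in range(len(text) - 1)]

phrase = "東京都に住む"

# Whitespace splitting finds no word boundaries at all.
print(phrase.split())

# Bigrams give the scorer terms to count, but some are misleading:
# the bigram "京都" (Kyoto) appears inside "東京都" (Tokyo Metropolis).
print(char_bigrams(phrase))
```

Either tokenization changes the term counts and document lengths that a
scoring method sees, which is why results from space-delimited languages
cannot simply be assumed to carry over.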
The Discovery team announces much of its completed work in weekly status
updates [5], but some of the work isn’t actually obvious to anyone who
uses our search. That is because it isn’t actually ‘live’ until a
complete re-index of the servers occurs. We’ve created a recurring ticket
in Phabricator [ ] to keep track of the work that goes live after a
re-index, such as the one we’ve also just completed. A few examples are
implementing ASCII-folding for the French language and handling of French
ÿ, and Russian ‘Е’ and ‘Ё’, when those characters are entered in a search
query.
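
As a rough idea of what folding means, here is a minimal Python sketch.
It is an illustration under our own assumptions, not the analyzer
configuration deployed on the servers: the function name is hypothetical,
and it only strips combining marks via Unicode decomposition, whereas a
full ASCII-folding filter also maps characters that have no such
decomposition.

```python
import unicodedata

def fold_diacritics(text):
    # Decompose each character into a base letter plus combining marks,
    # drop the marks, then recompose what is left.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

print(fold_diacritics("élève"))  # French accents removed
print(fold_diacritics("ÿ"))      # French ÿ folds to y
print(fold_diacritics("Ёлка"))   # Russian Ё folds to Е
```

Folding both the indexed text and the query this way lets a query typed
without the diacritic match text written with it. Note that blanket
stripping like this would also turn Russian й into и, so in practice
folding rules need to be chosen per language.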
Cheers from the Discovery Search Team!
Product Manager, Discovery