Wikipedia search sltr model - Wiki-research-l

18 May 2022

Hi everyone!

I have a question about relevance search on Wikipedia articles, and Robert West
from EPFL pointed me to this mailing list as the best place to ask it. I have been
examining the Elasticsearch query that the Wikipedia API performs when it runs a basic
search on the articles. More precisely, I am referring to the following API call:

https://en.wikipedia.org/w/api.php?action=query&list=search&format=…

The actual Elasticsearch query can be retrieved by adding the cirrusDumpQuery parameter:

https://en.wikipedia.org/w/api.php?action=query&list=search&format=…
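For readers who want to reproduce the two calls, here is a minimal Python sketch of how such URLs could be assembled. It does not reconstruct the truncated links above: `format=json`, the `srsearch` parameter, and passing `cirrusDumpQuery` with an empty value are assumptions on my side, and `build_search_url` is a hypothetical helper name. The function only builds the URL and performs no request.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(search_terms, dump_query=False):
    """Build a full-text search URL for the MediaWiki API.

    With dump_query=True the cirrusDumpQuery flag is appended, which is
    the parameter mentioned above for seeing the Elasticsearch query.
    """
    params = {
        "action": "query",
        "list": "search",
        "format": "json",          # assumption: the original URL is truncated
        "srsearch": search_terms,  # assumption: search-term parameter of list=search
    }
    if dump_query:
        # Assumption: the presence of the parameter is enough, so the value is empty.
        params["cirrusDumpQuery"] = ""
    return f"{API_ENDPOINT}?{urlencode(params)}"


plain_url = build_search_url("architecture mathematics")
dump_url = build_search_url("architecture mathematics", dump_query=True)
```

The resulting `dump_url` can be opened in a browser or fetched with any HTTP client to inspect the dumped query.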

There are many things going on in that query, but my question relates to the
rescoring of the results, which produces the final score. In particular, it concerns the clause

{
    "sltr": {
        "model": "enwiki-20220421-20180215-query_explorer",
        "params": {
            "query_string": "architecture mathematics"
        }
    }
}
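To locate this clause programmatically in a dumped query, one option is to walk the JSON recursively and collect every object stored under an "sltr" key. The sketch below is only an illustration: `find_sltr_clauses` is a hypothetical helper, and the `rescore` / `rescore_query` wrapper in the sample is an assumed shape based on the standard Elasticsearch rescore syntax, not a copy of the real dump. Only the inner clause is taken from the query above.

```python
import json


def find_sltr_clauses(node):
    """Recursively collect every dict stored under an "sltr" key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "sltr" and isinstance(value, dict):
                found.append(value)
            # Keep descending: clauses may be nested at any depth.
            found.extend(find_sltr_clauses(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_sltr_clauses(item))
    return found


# Assumed wrapper structure; only the "sltr" clause comes from the real dump.
sample_dump = json.loads("""
{
  "rescore": [
    {
      "query": {
        "rescore_query": {
          "sltr": {
            "model": "enwiki-20220421-20180215-query_explorer",
            "params": {"query_string": "architecture mathematics"}
          }
        }
      }
    }
  ]
}
""")

clauses = find_sltr_clauses(sample_dump)
model_names = [clause.get("model") for clause in clauses]
```

Applied to a real dump, `model_names` would list the stored model referenced by each rescoring step.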

I understand that the results are passed, together with the keywords, to a stored machine
learning model named enwiki-20220421-20180215-query_explorer. This, as far as I
understand, is done using the LTR plugin for Elasticsearch
(https://github.com/o19s/elasticsearch-learning-to-rank). My question is the following: Is
this model openly available anywhere? If so, could you point me to it? If not, do you know
why it is not openly available even though Wikipedia uses it?

I posted this as part of a question on Stack Overflow some days ago. Please see
https://stackoverflow.com/questions/72213203/elasticsearch-query-for-wikipe… for
more context and some further related questions.

Thank you all in advance, and have a nice day!

Aitor Pérez
Machine Learning Engineer
EPFL Graph - CEDE - EPFL
aitor.perez(a)epfl.ch