Hi everyone!
I have a question about relevance search on Wikipedia articles, and Robert West
from EPFL pointed me to this mailing list as the best place to ask it. I have been
looking at the Elasticsearch query that the Wikipedia API performs when it runs a basic
search on the articles. More precisely, I am talking about the following API call:
https://en.wikipedia.org/w/api.php?action=query&list=search&format=…
The actual Elasticsearch query can be obtained by adding the cirrusDumpQuery parameter:
https://en.wikipedia.org/w/api.php?action=query&list=search&format=…
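For anyone who wants to reproduce the two calls above, here is a minimal sketch of how such URLs can be built. The links in this message are truncated, so the srsearch parameter, format=json, and the empty value given to cirrusDumpQuery are my assumptions rather than the exact parameters of the original links; build_search_url is just an illustrative helper name.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(terms: str, dump_query: bool = False) -> str:
    """Build a basic search URL; optionally ask CirrusSearch to dump its query."""
    params = {
        "action": "query",
        "list": "search",
        "format": "json",   # assumption: the original links elide the format value
        "srsearch": terms,  # assumption: standard search-terms parameter
    }
    if dump_query:
        # Assumption: the parameter is sent with an empty value.
        params["cirrusDumpQuery"] = ""
    return f"{API_ENDPOINT}?{urlencode(params)}"


search_url = build_search_url("architecture mathematics")
dump_url = build_search_url("architecture mathematics", dump_query=True)
print(search_url)
print(dump_url)
```

The first URL corresponds to the plain search and the second to the same search with the query dump requested.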
There are many things going on in that query, but my question is about the
rescoring of the results, which produces the final score. In particular, it concerns the clause
{
  "sltr": {
    "model": "enwiki-20220421-20180215-query_explorer",
    "params": {
      "query_string": "architecture mathematics"
    }
  }
}
I understand that the results are passed, together with the keywords, to a stored machine
learning model named enwiki-20220421-20180215-query_explorer. As far as I understand, this
is done using the LTR plugin for Elasticsearch
(https://github.com/o19s/elasticsearch-learning-to-rank). My question is the following: Is
this model openly available anywhere? If so, could you point me to it? If not, do you know
why it is not openly available even though Wikipedia uses it?
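To make the context concrete, this is roughly how I understand an sltr clause to sit inside the rescore part of a query, following the general shape shown in the LTR plugin documentation. It is only a sketch: build_rescore_clause is a hypothetical helper, and the window_size value is a placeholder, not the one Wikipedia uses.

```python
import json


def build_rescore_clause(model_name: str, query_string: str, window_size: int) -> dict:
    """Wrap an sltr query in an Elasticsearch rescore clause."""
    return {
        "window_size": window_size,  # number of top hits to rescore (placeholder value below)
        "query": {
            "rescore_query": {
                "sltr": {
                    "model": model_name,
                    "params": {"query_string": query_string},
                }
            }
        },
    }


rescore = build_rescore_clause(
    "enwiki-20220421-20180215-query_explorer",
    "architecture mathematics",
    window_size=100,  # placeholder, not taken from the dumped query
)
print(json.dumps(rescore, indent=2))
```

The stored model only receives the parameters in "params"; the features themselves are computed on the Elasticsearch side, which is why I am asking about the model and not only about the query.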
I posted this as part of a question on Stack Overflow a few days ago. Please see
https://stackoverflow.com/questions/72213203/elasticsearch-query-for-wikipe… for
more context and some further related questions.
Thank you all in advance, and have a nice day!
Aitor Pérez
Machine Learning Engineer
EPFL Graph - CEDE - EPFL
aitor.perez(a)epfl.ch