On 2023-05-09 22:09, Isaac Johnson wrote:
> +1 to the suggestion to connect with the Search team. Also a few more
> thoughts about vector / natural-language search and its relevance to
> Wikimedia from my perspective in Research:
>
> * The common critique of lexical / keyword-based search and why
>   folks point to vector / embedding-based search is handling more
>   natural-language queries (e.g., "What are the different objectives
>   of the United Nations Sustainable Development Goals?" vs. "UN
>   SDG"). The former has a lot of words in it that lead to keyword
>   overlap with less-relevant pages so keyword-based search doesn't
>   do as well. The latter is much more direct and even matches an
>   existing redirect on Wikipedia to the article on UN Sustainable
>   Development Goals, so our existing keyword-based search handles it
>   very well.
> * Most existing users of Wikimedia's search are probably doing
>   something closer to the latter above -- i.e. using pretty exact
>   keywords to navigate to a specific page (or find it exists).

I disagree. The benefit we should expect from vector search is not the
ability to write questions with fuzzy grammar while still using exact
terminology, but the ability to use fuzzy terminology. Today most users
search with exact terms because that's the only thing our search
function can handle: you can only search for the terms that are used
in the articles. That's no stranger than the observation that owners
of a Fortran compiler tend to write programs in Fortran, since those are
the only programs it will compile into running code. Most users would not
search for "sustainable development goals" because they are not familiar
with this exact UN terminology. Instead they might wonder how the UN
envisions the future for humanity. And if those exact words are not in
the relevant article, the current text-based search will not find it.
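To make the exact-term limitation concrete, here is a deliberately naive
sketch of pure keyword matching. It is not how Wikimedia's search works
(real engines add stemming, redirects and ranking); the stop-word list,
the function names and the one-sentence article text are all hypothetical,
chosen only to illustrate the point above.

```python
import re

# Hypothetical, tiny stop-word list; real engines use larger, language-specific lists.
STOP_WORDS = {"how", "does", "the", "a", "an", "of", "for", "are", "is", "in", "by", "to", "what"}

def terms(text: str) -> set[str]:
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS}

def keyword_overlap(query: str, document: str) -> set[str]:
    """Return the query terms that literally occur in the document."""
    return terms(query) & terms(document)

# Hypothetical paraphrase of an article lead, not the actual Wikipedia text.
article = ("The Sustainable Development Goals are a collection of 17 "
           "interlinked global goals adopted by the United Nations in 2015.")

exact = keyword_overlap("sustainable development goals", article)
fuzzy = keyword_overlap("how does the UN envision the future for humanity", article)
print(sorted(exact))  # ['development', 'goals', 'sustainable']
print(sorted(fuzzy))  # []
```

The query phrased in the article's own terminology matches three terms,
while the everyday phrasing matches none, not even "UN", because the
text says "United Nations". Closing that vocabulary gap is what vector
search would be expected to help with.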
On Meta there's a list of mailing lists that mentions "wikimedia-search",
but that list seems to be dead and its archive is full of spam.
Another list, called "discovery", exists but is not listed on Meta:
https://lists.wikimedia.org/hyperkitty/list/discovery@lists.wikimedia.org/
--
Lars Aronsson (lars(a)aronsson.se, user:LA2)
Linköping, Sweden