I'm curious what the actual question is. The basic concepts have been
studied for about 60 years, and have been in use for 20 to 30 years.
One particular detail the industry apparently needs to re-learn every
time is how easily such vector spaces encode and reproduce any
existing bias, racism, phobia, and so on, and how hard it is even to
raise awareness of this, let alone do something about it.
That said, Elasticsearch, the search technology we currently run on
Wikimedia infrastructure in version 7.10.x, is already responding to
the current machine learning hype cycle:
https://www.elastic.co/de/blog/introducing-approximate-nearest-neighbor-sea…
https://en.wikipedia.org/wiki/Special:Version
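For context, the linked feature is about finding the stored embedding
vectors closest to a query vector, approximately, to stay fast at scale.
The underlying idea can be sketched exactly (brute force) in a few lines;
the toy vectors below are made up for illustration and have nothing to do
with any real Wikimedia data:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query, corpus):
    """Index of the corpus vector most similar to the query (exact search).

    ANN engines like the one in newer Elasticsearch versions answer the
    same question, but trade a little accuracy for much better speed.
    """
    return max(range(len(corpus)), key=lambda i: cosine_similarity(query, corpus[i]))

corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(nearest([0.9, 0.1], corpus))  # the first vector points in almost the same direction
```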
We certainly need to update some day, but as far as I know nobody is
actively working on this at the moment. However, the topic appears in
the annual plan currently under discussion. The responsible Search
Platform team is also quite active and monitors a good selection of
communication channels, including a separate mailing list.
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/…
https://wikitech.wikimedia.org/wiki/Search_Platform/Contact#Office_Hours
Kind regards
Thiemo