I'm curious what the actual question is. The basic concepts have been
studied for about 60 years, and have been in use for 20 to 30 years.
One particular detail the industry apparently needs to re-learn every
time is how easily such vector spaces encode and reproduce any
existing bias, racism, phobia, and so on, and how hard it is even to
raise awareness of this, let alone do something about it.
That said, Elasticsearch, the search technology we currently run on
Wikimedia infrastructure in version 7.10.x, is already responding to
the current machine learning hype cycle:
https://www.elastic.co/de/blog/introducing-approximate-nearest-neighbor-sea…
https://en.wikipedia.org/wiki/Special:Version
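For context, the linked feature is about finding the stored embedding
vectors closest to a query vector, approximately, to stay fast at scale.
The underlying idea can be sketched exactly (brute force) in a few lines;
the toy vectors below are made up for illustration and have nothing to do
with any real Wikimedia data:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query, corpus):
    """Index of the corpus vector most similar to the query (exact search).

    ANN engines like the one in newer Elasticsearch versions answer the
    same question, but trade a little accuracy for much better speed.
    """
    return max(range(len(corpus)), key=lambda i: cosine_similarity(query, corpus[i]))

corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(nearest([0.9, 0.1], corpus))  # the first vector points in almost the same direction
```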
We certainly need to update some day, but as far as I know nobody is
actively working on this at the moment. However, the topic appears in
the annual plan currently under discussion. The responsible Search
Platform team is also quite active and monitors a good selection of
communication channels, including a separate mailing list.
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/…
https://wikitech.wikimedia.org/wiki/Search_Platform/Contact#Office_Hours
Kind regards
Thiemo