On 1/8/09 7:47 AM, Uwe Baumbach wrote:
Hi,
is there a comprehensive, reliable, more in-depth description of the logical steps the internal search engine (or the parser in front of the engine) takes to determine:
- what is recognized as a single word in an entered search string
(blanks are clear, but what about slash, backslash, hyphen, period?)
Check MySQL's documentation on full-text search; also try reading through SearchMySQL.php to see how it breaks up the input when rendering its output. Also check Language.php for the horrid search tweaking code.
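As a rough illustration of the MySQL side only: its documentation describes the built-in full-text parser as treating runs of letters, digits, and underscores as words, allowing single apostrophes inside a word, and ignoring words shorter than ft_min_word_len (4 by default for MyISAM). The Python sketch below approximates that rule. The names split_words and WORD_RE are made up for the example, stopwords are not handled, and it does not model whatever preprocessing SearchMySQL.php or Language.php apply before the query reaches MySQL.

```python
import re

# Approximation of MySQL's documented default word rule: letters, digits and
# underscores form a word; a single apostrophe may appear inside a word.
# (Hypothetical helper, not MediaWiki or MySQL code.)
WORD_RE = re.compile(r"\w+(?:'\w+)*")


def split_words(query, min_len=4):
    """Split a search string into candidate words.

    min_len mimics ft_min_word_len; 4 is assumed here as the MyISAM default,
    but a given server may be configured differently.
    """
    words = WORD_RE.findall(query)
    return [w for w in words if len(w) >= min_len]


if __name__ == "__main__":
    sample = "foo/bar-baz path\\name file.txt don't"
    # Slash, backslash, hyphen and period all act as separators here.
    print(split_words(sample, min_len=1))
    # With the assumed default minimum length, short words drop out.
    print(split_words(sample))
```

Under this approximation, slash, backslash, hyphen, and period all separate words, and three-letter fragments such as "foo" or "txt" would not be indexed at the default minimum length.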
- what counts as "similar words" (closeness of words)?
No such metric exists afaik.
Different sources (www.mediawiki.org, xy.wikipedia.org/wiki/Help:Search, ...) give more or less detail, and sometimes say different things.
Note that Wikimedia's sites use a different search engine (MWSearch extension plus our Lucene-based backend), so descriptions of their behavior would not necessarily be what you want if you're looking for descriptions of the default MySQL backend. Note also that the PostgreSQL backend is different.
-- brion