On 1/8/09 7:47 AM, Uwe Baumbach wrote:
Hi,
is there a comprehensive, reliable, more profound description of the
logical steps the internal search engine (or parser before the engine)
undertakes to define:
- what is recognized as a single word in an entered search string
(blanks - OK, but what about slash, backslash, hyphen, period)?
Check MySQL's full-text search documentation for the word-splitting
rules; also try diving through SearchMySQL.php to see how it breaks up
the input when rendering its output. Also check Language.php for the
horrid search tweaking code.
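
As a rough illustration only: MySQL's documentation describes its built-in
full-text parser as treating any run of letters, digits, and underscores
(optionally joined by single apostrophes) as a word, and ignoring words
shorter than a minimum length (4 by default for MyISAM). The sketch below
approximates just that rule. The names split_words and min_len are
hypothetical, it is ASCII-only, it does not model stopwords, and it is not
what SearchMySQL.php or Language.php actually do to the input.

```python
import re

# Assumption: approximates the MySQL built-in full-text parser rule that a
# word is a run of letters, digits, and underscores, optionally joined by
# single apostrophes. ASCII-only; this is not MediaWiki's own code.
WORD_RE = re.compile(r"[A-Za-z0-9_]+(?:'[A-Za-z0-9_]+)*")


def split_words(query, min_len=4):
    """Split a search string into candidate words, dropping short ones.

    min_len mirrors MySQL's default minimum word length for MyISAM
    full-text indexes; stopword filtering is deliberately not modeled.
    """
    words = WORD_RE.findall(query)
    return [w for w in words if len(w) >= min_len]


if __name__ == "__main__":
    # Slash, backslash, hyphen, and period all act as separators here.
    print(split_words("main/page back\\slash hyphen-ated mediawiki.org"))
```

Under this rule, slash, backslash, hyphen, and period all separate words,
and short fragments such as "org" are dropped entirely. The real backend
may behave differently, so treat this as a starting point for reading the
source, not as a description of it.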
- what are "similar words" (closeness of words)?
No such metric exists afaik.
Different sources (www.mediawiki.org, xy.wikipedia.org/wiki/Help:Search,
...) say more or less, and sometimes say different things too.
Note that Wikimedia's sites use a different search engine (MWSearch
extension plus our Lucene-based backend), so descriptions of their
behavior would not necessarily be what you want if you're looking for
descriptions of the default MySQL backend. Note also that the PostgreSQL
backend is different.
-- brion