Steve Bennett wrote:
On 10/26/07, Rolf Lampa <rolf.lampa(a)rilnet.com> wrote:
Soundex.
The problem of title variants, which very often arise from differences in
spelling, is an old one that was solved long ago, well before computers
came about. The (old) solution was based on the fact that the sound of a
word bridges differences in spelling, hence "Soundex":
Heh. No. Soundex is awful. There might be something better by now,
Probably.
but not Soundex. Anything but that.
In a previous job I briefly flirted with it for name matching, but it
(or at least the SQL Server implementation) is useless: it collapses any
name down to a four-character code, making Steve and Stove identical, for
instance.
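To make the Steve/Stove objection concrete, here is a minimal Python sketch of the classic American Soundex rules: keep the first letter, turn the following consonants into digits, drop vowels, collapse adjacent repeated digits, and pad or cut the result to four characters. It illustrates the textbook algorithm only; it is not the SQL Server implementation mentioned above, and the function name is chosen for illustration.

```python
# Letter-to-digit table of American Soundex; vowels, H, W and Y get no digit.
_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for `name`."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # The first letter's own code suppresses an immediate repeat of that digit.
    prev = _CODES.get(first, "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        # A vowel resets `prev`, so a repeated digit after it is coded again;
        # H and W do not, so same-coded letters around them count once.
        if c not in "HW":
            prev = code
    return (first + "".join(digits) + "000")[:4]


print(soundex("Steve"), soundex("Stove"))  # S310 S310
```

Because the vowels are discarded, any two names sharing the same first letter and consonant skeleton end up with the same code, which is exactly the collision described above.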
Soundex is of course not a replacement for either Redirects or
Aliases. Apart from that, Soundex, or rather its derivatives, keep getting
better and better.
Anyway, a Soundex-like tool might be useful to complement or improve
searching,
Correct. That is why I think it's a bit unfortunate that all of WP is
saturated with phonetic redirects (which seem to make up a big part of
the redirects). The phonetic part should have been taken care of "at the
root of the tree", that is, in the search mechanism.
but the situation I'm describing here is when you know exactly which
search terms should reach the article, but it's a lot of effort to create
all those redirects.
Aliases risk becoming just another YARR, since an Alias is exactly that:
a Redirect. Moreover, when you say that you "know exactly" which terms you
would like associated with the article, that alias cannot, in principle,
be created automagically; an alias will always require your explicit
definition. That IS a good idea, but technically it is already supported
through the existing redirects.
However, there is one difference: unlike the existing redirects, Aliases
would be defined inside the article instead of outside it, and that opens
up interesting perspectives, especially if the term is changed from
Aliases to *Synonyms*. I prefer the term "Synonyms" because it implies
that the information also serves human readers (more so than "aliases"
does).
Synonyms should (for the same reasons you have given for Aliases and
redirects) have their own unique markup. That would make them
machine-readable, which means that the HTML parser could autogenerate
keywords, and other text indexers could also base their search results
on these synonyms.
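A rough Python sketch of what "machine reading" of such markup could look like: the parser collects the terms from an inline Synonyms tag and emits a keywords meta tag for indexers. The `{{Synonyms|...}}` syntax and the function names are hypothetical, invented for this illustration; the proposal above does not specify any particular markup.

```python
import html
import re

# HYPOTHETICAL inline markup: {{Synonyms|term1|term2|...}}. The syntax is an
# assumption made for this sketch, not existing wiki markup.
SYNONYMS_RE = re.compile(r"\{\{\s*Synonyms\s*\|([^{}]*)\}\}", re.IGNORECASE)


def extract_synonyms(wikitext: str) -> list[str]:
    """Collect the terms from every Synonyms tag, in order, without duplicates."""
    seen: dict[str, None] = {}  # dict keeps insertion order
    for match in SYNONYMS_RE.finditer(wikitext):
        for term in match.group(1).split("|"):
            term = term.strip()
            if term:
                seen.setdefault(term, None)
    return list(seen)


def keywords_meta(wikitext: str) -> str:
    """Render the synonyms as an HTML keywords meta tag for external indexers."""
    content = html.escape(", ".join(extract_synonyms(wikitext)), quote=True)
    return f'<meta name="keywords" content="{content}">'


page = "{{Synonyms|Stockholm|Sthlm|Holmia}}\n'''Stockholm''' is the capital of Sweden."
print(keywords_meta(page))
# <meta name="keywords" content="Stockholm, Sthlm, Holmia">
```

An internal search indexer could consume `extract_synonyms` directly instead of the rendered meta tag.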
In summary, then, I suggest Soundex (or modern derivatives of it, perhaps
as part of the search mechanism, though entirely automated) together with
the concept of Synonyms, which supports a wider range of applications than
Aliases implies (the term "alias" is rather abstract and not very
meaningful to most people). With an appropriate implementation* of a
Synonyms concept, parsers and both internal and external indexers could
benefit from this information, while it could at the same time add
informational value for human readers, especially if displayed** near the
top of the article.
Finally, Synonyms and Soundex-like solutions for the search mechanism
are different enough from Redirects not to amount to just another YARR,
as I pointed out in my previous post.
Regards,
// Rolf Lampa
* Synonyms could still be stored as Redirects, in the same table,
perhaps with an extra state field identifying them as "InlineSynonyms".
** Perhaps special rendering for Synonyms, kind of like the Category
rendering at the bottom of the pages, but near the top instead.
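The storage idea in the first footnote can be sketched with Python's built-in sqlite3 module. The table and column names (`redirect`, `source`, `target`, `kind`) and the helper functions are hypothetical and do not follow any real wiki schema; the point is only that one extra state column lets a single table serve both plain redirects and inline synonyms.

```python
import sqlite3
from typing import Optional

# HYPOTHETICAL schema: plain redirects and inline synonyms share one table;
# the `kind` column is the extra state field suggested in the footnote.
SCHEMA = """
CREATE TABLE redirect (
    source TEXT PRIMARY KEY,   -- alternative title or synonym
    target TEXT NOT NULL,      -- article it leads to
    kind   TEXT NOT NULL DEFAULT 'redirect'
           CHECK (kind IN ('redirect', 'InlineSynonym'))
)
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def save_inline_synonyms(conn: sqlite3.Connection, target: str, synonyms: list[str]) -> None:
    """Replace the inline synonyms of `target` with those parsed from its text."""
    with conn:  # both statements are committed together
        conn.execute(
            "DELETE FROM redirect WHERE target = ? AND kind = 'InlineSynonym'",
            (target,),
        )
        # A source that already exists (e.g. a plain redirect) is left untouched.
        conn.executemany(
            "INSERT OR IGNORE INTO redirect (source, target, kind) "
            "VALUES (?, ?, 'InlineSynonym')",
            [(s, target) for s in synonyms],
        )


def resolve(conn: sqlite3.Connection, term: str) -> Optional[str]:
    """Search treats both kinds alike: a matching source leads to its target."""
    row = conn.execute("SELECT target FROM redirect WHERE source = ?", (term,)).fetchone()
    return row[0] if row else None


def synonyms_for(conn: sqlite3.Connection, target: str) -> list[str]:
    """Only inline synonyms, e.g. for rendering near the top of the article."""
    rows = conn.execute(
        "SELECT source FROM redirect WHERE target = ? AND kind = 'InlineSynonym' "
        "ORDER BY source",
        (target,),
    )
    return [r[0] for r in rows]


conn = open_db()
with conn:
    conn.execute(
        "INSERT INTO redirect (source, target) VALUES (?, ?)", ("Stokholm", "Stockholm")
    )
save_inline_synonyms(conn, "Stockholm", ["Sthlm", "Holmia"])
print(resolve(conn, "Holmia"))          # Stockholm
print(synonyms_for(conn, "Stockholm"))  # ['Holmia', 'Sthlm']
```

Keeping both kinds in one table means the existing redirect lookup needs no change, while the `kind` filter is what separates the synonyms for display.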