Steve Bennett wrote:
On 10/26/07, Rolf Lampa <rolf.lampa@rilnet.com> wrote:
Soundex.
The problem of title variants, very often caused by differences in spelling, is an old one that was solved a long time ago, long before computers came about. The (old) solution was based on the fact that one pronunciation covers several spellings, hence "Soundex":
Heh. No. Soundex is awful. There might be something better by now,
Probably.
but not Soundex. Anything but that. In a previous job I briefly flirted with it for name matching, but it (or at least the SQL Server implementation) is useless: it collapses any name down to a four-character code (the first letter plus at most three consonant digits), making Steve and Stove identical, for instance.
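For reference, here is a minimal Python sketch of the classic American Soundex rules. It is the textbook algorithm, not necessarily identical to SQL Server's SOUNDEX(), and it only handles ASCII letters. It keeps the first letter and encodes up to three following consonant groups as digits, which is exactly why Steve and Stove end up with the same code:

```python
# Digit codes for consonants in American Soundex; vowels (and y) get no code,
# and h/w are skipped entirely.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for an ASCII name."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    # Start from the first letter's own digit so an immediate repeat is dropped
    # (e.g. the "f" in "Pfister").
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two consonants with equal codes
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated code after it counts again
    return code.ljust(4, "0")


for n in ("Steve", "Stove", "Robert", "Rupert"):
    print(n, soundex(n))
```

Steve and Stove both come out as S310, and Robert and Rupert both as R163. The collapsing that Steve complains about is the same property that makes the algorithm match intended spelling variants.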
Soundex is of course not a replacement for either Redirects or Aliases. Apart from that, Soundex and its derivatives keep getting better.
Anyway, a Soundex-like tool might be useful to complement or improve searching,
Correct. And this is why I think it's a bit unfortunate that all of WP is saturated with phonetic redirects (which seem to make up a large share of all redirects). The phonetic part should have been taken care of "at the root of the tree", that is, in the search mechanism.
but the situation I'm describing here is one where you know exactly which search terms should lead to the article, but creating all those redirects is a lot of effort.
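A rough sketch of handling the phonetic part "at the root of the tree" instead of with one redirect per spelling: the search side keeps an index from a phonetic key of each title to the titles themselves, and a query is looked up by its own key. This is an assumption about how it could work, not a description of how MediaWiki's search works; PhoneticTitleIndex and its methods are hypothetical names, and the Soundex function is repeated so the snippet runs on its own.

```python
from collections import defaultdict

# Compact American Soundex (classic rules, ASCII letters only).
_CODES = {ch: d for letters, d in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                   ("l", "4"), ("mn", "5"), ("r", "6")) for ch in letters}


def soundex(word: str) -> str:
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code, prev = letters[0].upper(), _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h/w never separate equal codes
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # vowels reset prev
    return code.ljust(4, "0")


class PhoneticTitleIndex:
    """Hypothetical search-side index: phonetic key -> titles sharing it."""

    def __init__(self) -> None:
        self._titles_by_key = defaultdict(set)

    @staticmethod
    def key(text: str) -> str:
        # One Soundex code per word; words without ASCII letters are dropped.
        return " ".join(k for k in map(soundex, text.split()) if k)

    def add_title(self, title: str) -> None:
        self._titles_by_key[self.key(title)].add(title)

    def lookup(self, query: str) -> list:
        """Return candidate titles that sound like the query, sorted."""
        return sorted(self._titles_by_key.get(self.key(query), set()))


index = PhoneticTitleIndex()
for title in ("John Smith", "Jane Smith", "Stephen King"):
    index.add_title(title)

print(index.lookup("Steven King"))
print(index.lookup("Jon Smyth"))
```

"Steven King" finds "Stephen King" with no redirect at all, but "Jon Smyth" returns both "Jane Smith" and "John Smith", because Jane, John and Jon all encode to J500. So such an index can only offer candidates, which supports Steve's point that it complements deliberately chosen search terms rather than replacing them.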
Aliases risk becoming just another YARR, since an Alias is just that, a Redirect. Moreover, when you say that you "know exactly" which terms you would like to associate with the article, then such an alias cannot, in principle, be created automagically; an alias will always require your explicit definition. Which IS a good idea, but technically that is already supported by the existing redirects.
However, there is one difference: unlike the existing redirects, Aliases would be defined inside the article instead of outside it, and that opens up interesting possibilities, especially if the term is changed from Aliases to *Synonyms*. I prefer the term "Synonyms" because it suggests that the information also serves human readers, more so than "aliases" does.
Synonyms should (for the same reasons you have given for Aliases, and for redirects) have their own unique markup. That would make them machine-readable, which means that the HTML parser could autogenerate keywords, and other text indexers could also base their search results on these synonyms.
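To make the machine-reading idea concrete, here is a small sketch. The [[Synonym:...]] markup is purely hypothetical (modelled on [[Category:...]], not existing MediaWiki syntax), and so are the function names. The parser pulls the synonyms out of the wikitext, removes the tags from the body, and emits an HTML keywords meta tag that external indexers could read.

```python
import html
import re

# Hypothetical inline markup, modelled on [[Category:...]]: [[Synonym:Some title]]
SYNONYM_TAG = re.compile(r"\[\[\s*Synonym\s*:\s*([^\]]+?)\s*\]\]", re.IGNORECASE)


def extract_synonyms(wikitext: str) -> tuple[list[str], str]:
    """Return (unique synonyms in order of appearance, wikitext without the tags)."""
    synonyms: list[str] = []
    for match in SYNONYM_TAG.finditer(wikitext):
        name = match.group(1)
        if name not in synonyms:
            synonyms.append(name)
    return synonyms, SYNONYM_TAG.sub("", wikitext)


def keywords_meta(title: str, synonyms: list[str]) -> str:
    """Build an HTML <meta name="keywords"> tag from the title and its synonyms."""
    terms = [title] + [s for s in synonyms if s != title]
    return '<meta name="keywords" content="%s">' % html.escape(", ".join(terms), quote=True)


page = "Göteborg is a city in Sweden.\n[[Synonym:Gothenburg]]\n[[synonym: Goteborg ]]"
synonyms, body = extract_synonyms(page)
print(synonyms)
print(keywords_meta("Göteborg", synonyms))
```

Because the synonyms live in the article itself, the same extracted list could feed both the keywords tag and an internal search index, without anyone maintaining a separate set of redirect pages.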
So, in summary, I suggest Soundex (or modern derivatives thereof, perhaps as part of the search mechanism, though entirely automated), plus the concept of Synonyms to support a wider range of applications than Aliases implies (the term "alias" is rather abstract and not very meaningful to most people). With an appropriate implementation* of a Synonyms concept, parsers and both internal and external indexers could benefit from this information, while it could also add informational value for human readers, especially if displayed** near the top of the article.
Finally, Synonyms and Soundex-like solutions for the search mechanism are different enough from Redirects not to amount to just another YARR, as I pointed out in my previous post.
Regards,
// Rolf Lampa
* Synonyms could still be stored as Redirects, in the same table, perhaps with an extra state field identifying them as "InlineSynonyms".
** Perhaps with special rendering for Synonyms, similar to how Categories are rendered at the bottom of pages, but near the top instead.
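A minimal sketch of the first footnote, using SQLite only for illustration. The table is a simplified, hypothetical stand-in and not MediaWiki's actual redirect schema. Its kind column plays the role of the extra state field, so inline synonyms sit next to ordinary redirects and can be regenerated each time the article is saved.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory, simplified (hypothetical) redirect table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE redirect (
               source_title TEXT PRIMARY KEY,
               target_title TEXT NOT NULL,
               kind TEXT NOT NULL DEFAULT 'redirect'
                    CHECK (kind IN ('redirect', 'InlineSynonym'))
           )"""
    )
    return db


def sync_inline_synonyms(db: sqlite3.Connection, article: str, synonyms: list[str]) -> None:
    """Replace the InlineSynonym rows pointing at `article` with the current list.

    Ordinary redirects are never deleted here; if a synonym's title is already
    used by another row, INSERT OR IGNORE keeps that existing row.
    """
    with db:  # commit on success, roll back on error
        db.execute(
            "DELETE FROM redirect WHERE target_title = ? AND kind = 'InlineSynonym'",
            (article,),
        )
        db.executemany(
            "INSERT OR IGNORE INTO redirect (source_title, target_title, kind) "
            "VALUES (?, ?, 'InlineSynonym')",
            [(s, article) for s in synonyms if s != article],
        )


def resolve(db: sqlite3.Connection, title: str) -> str:
    """Follow one redirect or synonym hop; unknown titles resolve to themselves."""
    row = db.execute(
        "SELECT target_title FROM redirect WHERE source_title = ?", (title,)
    ).fetchone()
    return row[0] if row else title


db = make_db()
with db:
    db.execute("INSERT INTO redirect VALUES ('Gothenburg', 'Göteborg', 'redirect')")
sync_inline_synonyms(db, "Göteborg", ["Gothenburg", "Goteborg"])
print(resolve(db, "Goteborg"))
```

Lookups treat both kinds the same way, while the kind column lets the software tell hand-made redirects apart from synonyms that are owned by, and regenerated from, the article text.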