On 10/29/07, GerardM <gerard.meijssen(a)gmail.com> wrote:
"Sounds alike" is a feature that will prove exceedingly problematic. Have
an
Irishman, a Brit, an Australian, someone from Louisiana and a Canadian
pronounce the same words and then determine if the words still sound
alike.
The notion that the written word in a language like English defines the
pronunciation is wrong, at best it gives an approximation.
I think you're exaggerating. Certainly, there's a difference between "r-ful" and "r-less" speech, the character of vowels changes, and there are even slight differences in which vowels are distinguished (most Americans pronounce "pa" and "paw" identically, while most Australians pronounce "poor" and "pour" identically), but these aren't major issues. Still, sound-matching just isn't the solution here. We're not primarily concerned with helping people find an article when they don't know how to spell its title; we're more concerned with getting people to the right article when they can't type the name on their keyboard, when there are many ways it could be spelt, or even when different words correspond to the same article.
Does sound matching help at all in the "Nice" case? No. Not unless we really
think someone is going to type "Neece" when looking for the French city.
Does it help in the mulled wine/vin chaud/gluehwein case? Again no, except in rare cases such as someone desperately typing "van show" or "glue vine". It might mildly improve an already good search algorithm, but it's certainly not the essence of a solution here.
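To make this concrete, here is a minimal sketch that uses classic American Soundex as a stand-in for "sound matching". That choice is an assumption: nobody in this thread has proposed a specific algorithm, and the function name soundex is purely illustrative. The sketch also simplifies by dropping spaces, so a phrase is coded as if it were one word. Under these assumptions, "Neece" and "Nice" get the same code, but "mulled wine" and "vin chaud" do not, so sound matching cannot connect translations of the same thing.

```python
# Classic American Soundex digit groups; vowels and h, w, y get no digit.
_GROUPS = {
    "bfpv": "1",
    "cgjkqsxz": "2",
    "dt": "3",
    "l": "4",
    "mn": "5",
    "r": "6",
}
CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def soundex(text: str) -> str:
    """Return the 4-character Soundex code; non-letters (e.g. spaces) are ignored."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Append a digit only if it differs from the previous coded letter.
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        # Vowels reset the "previous code"; h and w do not separate duplicates.
        if ch not in "hw":
            prev = code
    # Pad short codes with zeros, e.g. "N2" -> "N200".
    return result.ljust(4, "0")


if __name__ == "__main__":
    print(soundex("Nice"), soundex("Neece"))             # N200 N200
    print(soundex("mulled wine"), soundex("vin chaud"))  # M435 V523
```

The matching only works at the level of individual spellings of one word. Relating words in different languages would need an alias or redirect mechanism, not a phonetic one.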
That said, I confess I'm a bit stuck in my brainstorming. To summarise the chain of reasoning so far:
* I started this thread by suggesting that manual redirects could be augmented with lightweight pattern-based aliases.
* Then we realised that redirects are required to keep existing articles working, not just for searching.
* Having both redirects and another system would be kludgy and complex.
* So I propose trying to do away with almost all redirects by resolving disambiguation at save time, so that only real links to real, unambiguous pages are ever saved.
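Here is a rough sketch of what save-time disambiguation might look like, under heavy assumptions. The names titles (the set of real article titles), aliases (alias to candidate titles) and resolve_links_on_save are all hypothetical, not existing MediaWiki code. The Mercury entries are invented for illustration, and only plain [[...]] links are handled. A link naming a real page is left alone. An alias with exactly one target is rewritten to point at that page while keeping the displayed text. An alias with several targets is returned so the editor could be asked to choose before saving.

```python
import re

# Matches plain links like [[Target]]; piped links [[A|B]] are left alone.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)\]\]")


def resolve_links_on_save(wikitext, titles, aliases):
    """Rewrite alias links to point at real pages; collect ambiguous ones.

    titles  -- set of real article titles (hypothetical)
    aliases -- dict mapping an alias to the set of titles it could mean
    """
    ambiguous = []

    def rewrite(match):
        name = match.group(1).strip()
        if name in titles:
            return match.group(0)  # already a real page: save as-is
        targets = aliases.get(name, set())
        if len(targets) == 1:
            target = next(iter(targets))
            return f"[[{target}|{name}]]"  # real link, original display text
        if len(targets) > 1:
            ambiguous.append(name)  # editor must pick one before saving
        return match.group(0)

    return LINK_RE.sub(rewrite, wikitext), ambiguous


if __name__ == "__main__":
    titles = {"Mulled wine", "Mercury (planet)", "Mercury (element)"}
    aliases = {
        "vin chaud": {"Mulled wine"},
        "Glühwein": {"Mulled wine"},
        "Mercury": {"Mercury (planet)", "Mercury (element)"},
    }
    text, ambiguous = resolve_links_on_save(
        "Try [[vin chaud]] or [[Mulled wine]]; see [[Mercury]].", titles, aliases
    )
    print(text)       # Try [[Mulled wine|vin chaud]] or [[Mulled wine]]; see [[Mercury]].
    print(ambiguous)  # ['Mercury']
```

In this sketch the alias table only matters at edit time. Once a page is saved, its links already point at real pages, which is what would let most redirects go away.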
However, this major paradigm shift would cause a lot of upheaval and require significant development effort. What are the benefits? Is it worth it? What problem, exactly, are we trying to solve?
Steve