On 10/29/07, GerardM <gerard.meijssen@gmail.com> wrote:
> "Sounds alike" is a feature that will prove exceedingly problematic. Have an Irishman, a Brit, an Australian, someone from Louisiana and a Canadian pronounce the same words and then determine if the words still sound alike. The notion that the written word in a language like English defines the pronunciation is wrong; at best it gives an approximation.
I think you're exaggerating. Certainly, there's a difference between "r-ful" and "r-less" speech, the character of vowels changes, and there are even slight differences in which vowels are distinguished (most Americans pronounce "pa" and "paw" identically, while most Australians pronounce "poor" and "pour" identically), but these aren't major issues. However, sound-matching just isn't the solution here. We're not primarily concerned with helping people find an article when they don't know how to spell it. We're more concerned with getting people to the right article when they can't type the title on their keyboard, when there are many ways it could be spelt, or when entirely different words correspond to the same article.
Does sound matching help at all in the "Nice" case? No. Not unless we really think someone is going to type "Neece" when looking for the French city. Does it help in the mulled wine/vin chaud/gluehwein case? No, again, except in rare instances like someone desperately typing "van show" or "glue vine". It may be of some mild benefit in making an already good search algorithm even better, but it's certainly not the essence of a solution here.
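To make that concrete, here is a rough sketch using classic Soundex as a stand-in for "sound matching". That choice is an assumption for illustration only: nobody in this thread has named a particular algorithm, and the function name soundex below is just a label. It shows that a phonetic code does pair "Nice" with "Neece", but gives three unrelated codes for the three names of the same drink.

```python
# Sketch only: classic Soundex, chosen here as an example phonetic matcher.
CODES = {}
for digit, group in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in group:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character Soundex code for word ('' if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    out = []
    prev = CODES.get(first, "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(c, "")  # vowels (and anything unmapped) get no code
        if code and code != prev:
            out.append(code)
        prev = code
    return (first.upper() + "".join(out) + "000")[:4]


print(soundex("Nice"), soundex("Neece"))   # N200 N200
print(soundex("mulled wine"))              # M435
print(soundex("vin chaud"))                # V523
print(soundex("gluehwein"))                # G450
```

The misspelling is caught, but that is the case we care least about. The three drink names are no closer to each other after encoding than before.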
However, I confess to being a bit stuck in my brainstorming here. To summarise the chain of reasoning so far:
* I started this thread with a suggestion for a way to augment manual redirects with lightweight pattern-based aliases.
* Then we realised that redirects are required to make existing articles work, not just for searching.
* Having both redirects and another system would be kludgy and complex.
* So I propose attempting to do away with almost all redirects, by making disambiguation happen at save time, and thus only saving real links to real, unambiguous pages.
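A minimal sketch of what save-time disambiguation might look like, purely to pin down the idea. Everything in it is hypothetical: the function resolve_links, the candidates table and the page titles are made up for illustration and do not correspond to any existing MediaWiki code. A link with exactly one candidate is rewritten to the real page, and an ambiguous one is handed back to the editor to choose before the save goes through.

```python
import re

# Matches plain [[label]] links only; piped [[target|label]] links are
# treated as already resolved and left alone.
LINK = re.compile(r"\[\[([^\]|]+)\]\]")


def resolve_links(text, candidates, choices=None):
    """Rewrite unambiguous links; return (new_text, pending).

    candidates: hypothetical index, lower-cased label -> list of page titles.
    choices: the editor's picks for ambiguous labels, label -> page title.
    pending: list of (label, options) still needing a decision.
    """
    choices = choices or {}
    pending = []

    def replace(match):
        label = match.group(1).strip()
        options = candidates.get(label.lower(), [])
        if len(options) == 1:
            target = options[0]
        elif choices.get(label) in options:
            target = choices[label]
        else:
            pending.append((label, options))
            return match.group(0)  # leave untouched until the editor decides
        if target == label:
            return match.group(0)
        return "[[%s|%s]]" % (target, label)

    return LINK.sub(replace, text), pending


# Made-up index and titles, for illustration only.
candidates = {
    "gluehwein": ["Mulled wine"],
    "nice": ["Nice, France", "Nice (Unix)"],
}
draft = "Drink [[gluehwein]] in [[Nice]]."

text, pending = resolve_links(draft, candidates)
print(text)     # Drink [[Mulled wine|gluehwein]] in [[Nice]].
print(pending)  # [('Nice', ['Nice, France', 'Nice (Unix)'])]

text, pending = resolve_links(draft, candidates, {"Nice": "Nice, France"})
print(text)     # Drink [[Mulled wine|gluehwein]] in [[Nice, France|Nice]].
```

In this sketch the save would be blocked while pending is non-empty, so only real links to real pages ever reach the database.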
However, this major paradigm shift will cause a lot of upheaval, development effort, etc. What are the benefits? Is it worth it? What problem are we trying to solve exactly?
Steve