On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:
> Kang+Hsu+Krajbich+2009+the+wick+in
This seems best to me of what's proposed so far.
Both seem good, though I would suggest adopting a
convention of ignoring any
leading "the" and "a", to get a more distinctive 3-word suffix.
While that's a good idea, we'd then have to know all "indistinctive"
words in all languages. (Die, Der, La, L', ...)
Stopword lists for major languages exist, and where they don't, they are easily
created, even automatically. Word frequency analysis on a few megabytes of text
is cheap these days :)
-- daniel
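
To make the two ideas in this thread concrete, here is a rough sketch, not an agreed implementation. It derives a stopword list from word frequencies and then builds a key that skips leading stopwords in the title. The function names (derive_stopwords, title_suffix, make_key), the top_n cutoff, and the example title are assumptions for illustration only. The thread itself only fixes the "Author+Author+Year+three+title+words" shape.

```python
import re
from collections import Counter

# Matches runs of letters in any script (no digits, no underscore).
WORD_RE = re.compile(r"[^\W\d_]+")


def derive_stopwords(corpus: str, top_n: int = 50) -> set[str]:
    """Treat the top_n most frequent words of a corpus as stopwords.

    The cutoff top_n is an arbitrary assumption; a real list would
    likely need per-language tuning.
    """
    words = WORD_RE.findall(corpus.lower())
    return {word for word, _ in Counter(words).most_common(top_n)}


def title_suffix(title: str, stopwords: set[str], n_words: int = 3) -> str:
    """Return the first n_words of the title after skipping leading stopwords.

    Only leading stopwords are skipped; stopwords later in the title are kept.
    """
    words = WORD_RE.findall(title.lower())
    start = 0
    while start < len(words) and words[start] in stopwords:
        start += 1
    return "+".join(words[start:start + n_words])


def make_key(authors: list[str], year: int, title: str,
             stopwords: set[str]) -> str:
    """Join author names, year and title suffix with '+' (hypothetical helper)."""
    return "+".join(authors + [str(year), title_suffix(title, stopwords)])


if __name__ == "__main__":
    # Assumed title, used only to illustrate the key from the thread.
    key = make_key(["Kang", "Hsu", "Krajbich"], 2009,
                   "The Wick in the Candle of Learning", {"the", "a"})
    print(key)
```

With a hand-written stopword set of just "the" and "a", the suffix "the+wick+in" becomes "wick+in+the". For languages without a ready-made list, derive_stopwords could supply the set instead, given a few megabytes of text in that language.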