Jodi Schneider wrote:
On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:
Kang+Hsu+Krajbich+2009+the+wick+in
This seems best to me of what's proposed so far.
Both seem good, though I would suggest adopting a convention of ignoring any leading "the" and "a", to get a more distinctive 3-word suffix.
That's a good idea, but then we'd have to know all the "indistinctive" words in all languages (Die, Der, La, L', ...).
Stopword lists for major languages exist, and where they don't, they are easily created, even automatically. Word frequency analysis on a few megabytes of text is cheap these days :)
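
A rough sketch of how this could work, in Python. All function names here (tokenize, guess_stopwords, make_key) are made up for illustration, and the "top N most frequent words" cutoff is only an assumption about what a reasonable automatic stopword list might look like, not an agreed convention:

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercased runs of letters; Unicode-aware, so "für" or "été" stay whole.
    return re.findall(r"[^\W\d_]+", text.lower())

def guess_stopwords(corpus, top_n=50):
    # Assumption: the top_n most frequent words in a sample text
    # are a usable stand-in for a hand-made stopword list.
    counts = Counter(tokenize(corpus))
    return {word for word, _ in counts.most_common(top_n)}

def make_key(authors, year, title, stopwords, n_words=3):
    # Skip leading stopwords only, then keep the next n_words title words.
    words = tokenize(title)
    while words and words[0] in stopwords:
        words.pop(0)
    return "+".join(list(authors) + [str(year)] + words[:n_words])
```

With only "the" and "a" as stopwords, and assuming the title behind the key above starts "The wick in the candle ...", make_key(["Kang", "Hsu", "Krajbich"], 2009, "The wick in the candle", {"the", "a"}) gives Kang+Hsu+Krajbich+2009+wick+in+the. For a language without a ready-made list, guess_stopwords(sample_text) could supply the set instead.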
-- daniel