On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:
Kang+Hsu+Krajbich+2009+the+wick+in
Of what's been proposed so far, this seems best to me.
Both seem good, though I would suggest adopting a convention of ignoring any leading "the" and "a", to get a more distinctive three-word suffix.
While that's a good idea, we'd then have to know all the "indistinctive" words in all languages. (Die, Der, La, L', ...)
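To make the convention concrete, here is a rough sketch of how such a key could be built. It is only an illustration: the function name make_key, the stop list, and the example title (guessed from the key quoted above) are assumptions, and the stop list covers only the articles mentioned in this thread.

```python
import re

# Hypothetical stop list: only the articles mentioned in this thread.
# A real list would need entries for every language we care about.
LEADING_ARTICLES = {"the", "a", "die", "der", "la", "l'"}


def make_key(authors, year, title, n_authors=3, n_words=3):
    """Build an author+year+title key, skipping leading articles in the title."""
    # Split the lowercased title into runs of letters; a trailing
    # apostrophe is kept so that an elided article such as "l'" is
    # its own token.
    words = re.findall(r"[^\W\d_]+'?", title.lower())
    # Drop articles only at the very start of the title.
    while words and words[0] in LEADING_ARTICLES:
        words.pop(0)
    parts = list(authors[:n_authors]) + [str(year)] + words[:n_words]
    return "+".join(parts)


if __name__ == "__main__":
    print(make_key(["Kang", "Hsu", "Krajbich"], 2009,
                   "The Wick in the Candle of Learning"))
```

With the assumed title, the suffix becomes "wick+in+the" instead of "the+wick+in". Articles inside the title are left alone, and two works with the same authors, year, and opening words would still collide.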
There are still going to be duplicates, alas...
Of course, it does not have to be _exactly_ three authors, nor three words from the title, and it does not solve the John Smith (or Zheng Wang) problem.
It also doesn't solve issues with transliteration: Merik Möller may become "Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz" or even "VoB", etc. In the case of Chinese names, it's often not easy to decide which part is the last name.
To avoid this kind of ambiguity, I suggest automatically applying some type of normalization and/or hashing. There is quite a bit of research on this kind of normalization out there, generally with the aim of detecting duplicates. Perhaps we can learn from bibsonomy.org; have a look at how they do it: http://www.bibsonomy.org/help/doc/inside.html.
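As a rough illustration of what normalization plus hashing might look like (this is not how bibsonomy does it; the function names, the folding rules, and the 12-character hash length are all assumptions made for the sketch):

```python
import hashlib
import unicodedata


def normalize_name(name):
    """Fold a surname to a crude ASCII-like form so spelling variants meet."""
    # casefold() lowercases and also turns the German sharp s into "ss".
    folded = name.casefold()
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", folded)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Assumed rule: treat "ae"/"oe"/"ue" as transliterated umlauts.
    # This is lossy and will also shorten names where the digraph is genuine.
    for digraph, vowel in (("ae", "a"), ("oe", "o"), ("ue", "u")):
        stripped = stripped.replace(digraph, vowel)
    return stripped


def key_hash(names, length=12):
    """Hash the normalized names into a short hexadecimal identifier."""
    joined = "+".join(normalize_name(n) for n in names)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()[:length]


if __name__ == "__main__":
    for variant in ("Möller", "Moeller", "Moller"):
        print(variant, "->", normalize_name(variant))
    print(key_hash(["Möller", "Voß"]) == key_hash(["Moeller", "Voss"]))
```

Under these rules "Möller", "Moeller" and "Moller" all collapse to the same form, and "Voß" matches "Voss". "Vosz" and "VoB" would still come out differently, and nothing here helps with deciding which part of a name is the family name.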
Good idea!
-Jodi