On 07/21/2010 03:36 PM, Jakob wrote:
Hi,
Talking about identifiers for bibliographic records I just want to stress one crucial point:
This gives us the following key, guaranteed to be unique: KangHsuKrajbich20091011b
There is absolutely no such thing as a "guaranteed unique identifier" that can be derived from existing metadata. You will *always* have false positives (different publications get the same identifier [1]) and false negatives (same publication has different identifiers [2]). Fuzzy identifiers even occur if they are created by the publisher or author himself (for instance duplicate ISBNs for definitely different editions or even totally different books). If you argue about identifiers please keep in mind that you *always* talk about heuristics but not about something "unique per se". Existing identifiers only differ in the ratio of false positives and false negatives.
The only way you may get unique identifiers is to assign your own identifiers that are *not* derived from the content - such as auto-incremented record ids in a database. Even then they are not unique if you change the content because the identity of the object may change.
I haven't been following this thread, but the way I addressed this in my own bibliography manager (http://yabman.sourceforge.net/) is: the BibTeX key is the first author's name (lowercased) plus an auto-incremented ID. So for example, one of my papers is "priedhorsky229". 229 is arbitrary, but there are only a few 3-digit numbers per author, so I don't get confused.
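To make the scheme concrete, here is a rough Python sketch of "lowercased first author plus one global auto-incremented ID". It is only an illustration, not the actual yabman code; the class name GlobalKeyGenerator, the method new_key, and the rule of dropping non-letter characters from the name are all assumptions of mine.

```python
import itertools


class GlobalKeyGenerator:
    """Hypothetical sketch: key = lowercased surname + one global counter."""

    def __init__(self, start=1):
        # A single counter shared by all authors, so no number is reused.
        self._counter = itertools.count(start)

    def new_key(self, first_author_surname):
        # Assumption: keep letters only, so "O'Brien" becomes "obrien".
        name = "".join(ch for ch in first_author_surname.lower() if ch.isalpha())
        return f"{name}{next(self._counter)}"
```

The number carries no meaning; uniqueness comes entirely from the counter, and the name is only there to help a human reader.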
Now in a large system, that would obviously break down into the long, incomprehensible CiteULike-type IDs.
A compromise could be that the ID is the first author's name plus an auto-incremented ID per author. So for example, the first paper of mine the system learns is priedhorsky1, the second priedhorsky2, etc. So you get a system-generated ID for uniqueness but also something comprehensible for people.
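A minimal sketch of the per-author variant, again in Python and again with made-up names (PerAuthorKeyGenerator, new_key). It assumes the counters live in memory and that the name is reduced to lowercase letters; a real system would have to persist the counters alongside the records.

```python
from collections import defaultdict


class PerAuthorKeyGenerator:
    """Hypothetical sketch: key = lowercased surname + per-surname counter."""

    def __init__(self):
        # One counter per normalized surname; a missing entry starts at 0.
        self._last_id = defaultdict(int)

    def new_key(self, first_author_surname):
        # Assumption: keep letters only, so the name never ends in a digit
        # and cannot be confused with the numeric suffix.
        name = "".join(ch for ch in first_author_surname.lower() if ch.isalpha())
        self._last_id[name] += 1
        return f"{name}{self._last_id[name]}"
```

Two different authors with the same surname simply share a counter, so the keys stay unique even though the number no longer means "this person's Nth paper".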
HTH,
Reid