On Wed, Jul 21, 2010 at 2:36 PM, Jakob
<jakob.voss@s1999.tu-chemnitz.de> wrote:
Hi,
Talking about identifiers for bibliographic records I just want to
stress one crucial point:
> This gives us the following key, guaranteed to be unique:
> KangHsuKrajbich20091011b
There is absolutely no such thing as a "guaranteed unique identifier"
that can be derived from existing metadata. You will *always* have
false positives (different publications get the same identifier [1])
and false negatives (same publication has different identifiers [2]).
Identifiers are fuzzy even when the publisher or the author creates
them (for instance duplicate ISBNs for definitely different
editions or even totally different books). If you argue about
identifiers, please keep in mind that you are *always* talking about
heuristics, not about something "unique per se". Existing
identifiers only differ in the ratio of false positives and false
negatives.
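A minimal sketch of both failure modes, assuming a hypothetical key rule
(up to three surnames, "EtAl" for further authors, then the year). The
function name, the rule details and the sample names are illustrative
assumptions, not the scheme proposed in this thread:

```python
import re
import unicodedata


def derived_key(surnames, year):
    """Build a key from author surnames and a year (hypothetical rule)."""
    parts = []
    for name in surnames[:3]:
        # Drop accents and anything that is not an ASCII letter.
        ascii_name = (
            unicodedata.normalize("NFKD", name)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        parts.append(re.sub(r"[^A-Za-z]", "", ascii_name))
    if len(surnames) > 3:
        parts.append("EtAl")
    return "".join(parts) + str(year)


# False positive: a paper and its presentation slides share authors and
# year, so two different objects receive the same key.
paper_key = derived_key(["Smith", "Jones"], 2010)
slides_key = derived_key(["Smith", "Jones"], 2010)
print(paper_key == slides_key)

# False negative: the same publication, once with the preprint year and
# once with the print year, or with a differently transliterated name.
print(derived_key(["Smith", "Jones"], 2009) == paper_key)
print(derived_key(["Müller"], 2010) == derived_key(["Mueller"], 2010))
```

Tightening the rule (more fields, stricter normalisation) trades one
kind of error for the other; it does not remove both.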
The only way you may get unique identifiers is to assign your own
identifiers that are *not* derived from the content - such as
auto-incremented record ids in a database. Even then they are not
unique if you change the content because the identity of the object
may change. An MD5 or SHA sum of the full content [3] or the version id
in a versioning database (like MediaWiki) is unique but not practical
if you want to change content. A solution to this problem is to let
people decide in every single case what an identifier looks like
and when it should change (example: Wikipedia article titles). But
then the identifiers are not permanent (records may split and join and
be renamed).
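The trade-off between an assigned id and a content hash can be sketched
as follows. The record layout and the helper names are assumptions made
for illustration only:

```python
import hashlib
import itertools

# Surrogate ids: assigned by us, independent of the content.
_next_id = itertools.count(1)


def new_record(text):
    """Create a record with an auto-incremented id (hypothetical layout)."""
    return {"id": next(_next_id), "text": text}


def content_id(text):
    """Identifier derived from the full content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


record = new_record("Brooks, No Silver Bullet, 1986")
before = content_id(record["text"])

# Correcting the record keeps the surrogate id but changes the hash.
record["text"] = "Brooks, No Silver Bullet, 1987"
after = content_id(record["text"])

print(record["id"])     # same id as before the edit
print(before == after)  # the content-derived id no longer matches
```

The surrogate id survives the edit but says nothing about whether the
edited record still describes the same object; the hash tracks the
content exactly and therefore breaks on every change.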
That's the way it is. You have to decide which problem to solve with
an identifier and then be aware of its limitations. As Brooks [3]
wrote, there is no silver bullet - so there is no silver identifier.
Cheers
Jakob
[1] For instance if you have a common name and a general title or if
you want to distinguish the printed version and the presentation
slides of the same publication etc.
[2] For instance different ways to abbreviate and/or write the name of
an author and/or title, different years (year of preprint vs year of
printed version) etc.
[3] See http://en.wikipedia.org/wiki/No_Silver_Bullet which cites an
article that has been published in 1986 and 1987, and probably
reprinted in another year - so what's the identifier? ;-)
Hi Jakob,
I would like to counter this point with the following rule: There is always a way to adjudicate ambiguity. It is easy to create a rule that works in 90% of cases:
Author1Author2Author3EtAl10
It is easy to modify this rule to work in 99% of cases:
Author1Author2Author3EtAl20101011b
Modifying the rule to work in 100% of cases requires a community of users to adjudicate the relatively small number of special cases.
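One way this could look in practice, as a rough sketch: keys that the rule produces without a clash are accepted automatically, clashes are queued for people to decide, and their decisions are stored in an override table. The function, the record ids and the sample keys are hypothetical, and the sketch only covers clashes (two records, one key), not the case of one publication ending up with two keys:

```python
def assign_keys(records, overrides=None):
    """Assign keys; return (assigned, pending).

    records:   list of (record_id, base_key) pairs from the automatic rule
    overrides: record_id -> key chosen by hand for a special case
    """
    overrides = overrides or {}
    assigned = {}
    taken = set(overrides.values())
    pending = []
    for record_id, base_key in records:
        if record_id in overrides:
            # A human decision always wins over the automatic rule.
            assigned[record_id] = overrides[record_id]
        elif base_key in taken:
            # Clash: leave it for the community to adjudicate.
            pending.append(record_id)
        else:
            assigned[record_id] = base_key
            taken.add(base_key)
    return assigned, pending


records = [
    ("r1", "SmithJones2010"),
    ("r2", "SmithJones2010"),
    ("r3", "Lee2009"),
]

assigned, pending = assign_keys(records)
print(pending)  # records waiting for a manual decision

assigned, pending = assign_keys(records, {"r2": "SmithJones2010b"})
print(assigned["r2"], pending)
```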
Brian