Names can have up to two of these three properties:
- Secure (unique)
- Decentralized (global)
- Human-meaningful
Decentralized and human-meaningful: this is true of nicknames people choose for themselves
Secure and human-meaningful: this is the property that domain names and URLs aim for
Secure and decentralized: this is a property of OpenPGP key fingerprints
Terrell
/bcc Zooko
On 7/21/10 4:36 PM, Jakob wrote:
Hi,
Talking about identifiers for bibliographic records, I want to stress one crucial point about the following claim:
> This gives us the following key, guaranteed to be unique: KangHsuKrajbich20091011b
There is absolutely no such thing as a "guaranteed unique identifier" that can be derived from existing metadata. You will *always* have false positives (different publications get the same identifier [1]) and false negatives (the same publication gets different identifiers [2]). Fuzzy identifiers occur even when they are created by the publisher or the author (for instance, duplicate ISBNs for clearly different editions or even entirely different books). When you argue about identifiers, keep in mind that you are *always* talking about heuristics, not about something "unique per se". Existing identifiers differ only in their ratio of false positives to false negatives.
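To make the false-positive/false-negative point concrete, here is a minimal Python sketch. The key scheme (author surnames followed by the year), the `Record` fields, and the sample records are all hypothetical and chosen only for illustration; this is not how KangHsuKrajbich20091011b was actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A hypothetical bibliographic record (fields chosen for illustration)."""
    title: str
    authors: tuple[str, ...]
    year: int
    kind: str  # e.g. "article" or "slides"


def derived_key(record: Record) -> str:
    """Build a key from author surnames plus year (an assumed scheme).

    Accepts names written as 'Surname, Given' or 'Given Surname'.
    """
    surnames = []
    for name in record.authors:
        if "," in name:
            surname = name.split(",", 1)[0]
        else:
            surname = name.split()[-1]
        surnames.append(surname.strip())
    return "".join(surnames) + str(record.year)


# False positive: two different publications collapse onto one key.
paper = Record("Example title", ("Kang, M.", "Hsu, M."), 2009, "article")
slides = Record("Example title", ("Kang, M.", "Hsu, M."), 2009, "slides")
print(derived_key(paper) == derived_key(slides))  # True, although they differ

# False negatives: one publication (preprint vs. printed version,
# transliterated vs. original spelling) gets two different keys.
preprint = Record("Example title", ("Mueller, J.",), 2009, "article")
printed = Record("Example title", ("Müller, J.",), 2010, "article")
print(derived_key(preprint), derived_key(printed))  # Mueller2009 Müller2010
```

Normalizing names or years would shift errors from one kind to the other, but it cannot remove both.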
The only way to get unique identifiers is to assign your own identifiers that are *not* derived from the content, such as auto-incremented record ids in a database. Even then they are not unique if you change the content, because the identity of the object may change. An MD5 or SHA checksum of the full content [3], or the version id in a versioning database (like MediaWiki), is unique but not practical if you want to change the content. One solution to this problem is to let people decide in each case what an identifier looks like and when it should change (example: Wikipedia article titles). But then the identifiers are not permanent (records may be split, merged, and renamed).
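A minimal Python sketch of the contrast drawn above, assuming a toy in-memory catalog (`Catalog` and `content_hash` are hypothetical names). SHA-256 from the standard `hashlib` module stands in for "an MD5 or SHA checksum": the assigned id survives an edit, while the content hash changes with it.

```python
import hashlib
import itertools


class Catalog:
    """Hypothetical catalog that assigns ids NOT derived from the content."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)  # auto-increment, like a database
        self.records: dict[int, str] = {}

    def add(self, content: str) -> int:
        record_id = next(self._ids)
        self.records[record_id] = content
        return record_id

    def edit(self, record_id: int, content: str) -> None:
        # The assigned id is kept, even though the edited object
        # may arguably no longer be "the same" object.
        self.records[record_id] = content


def content_hash(content: str) -> str:
    """SHA-256 of the full content: any change yields a new value."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


catalog = Catalog()
rid = catalog.add("No Silver Bullet (1986)")
before = content_hash(catalog.records[rid])
catalog.edit(rid, "No Silver Bullet (1987)")
after = content_hash(catalog.records[rid])
print(rid, before == after)  # 1 False: stable id, changed hash

# Assigned ids do not detect duplicates; the content hash does.
dup = catalog.add("No Silver Bullet (1987)")
print(dup, content_hash(catalog.records[dup]) == after)  # 2 True
```

Neither choice is "the" unique identifier: the assigned id is stable but blind to content, and the hash tracks content exactly but breaks on every edit.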
That's the way it is. You have to decide which problem to solve with an identifier and then be aware of its limitations. As Brooks [3] wrote, there is no silver bullet - so there is no silver identifier.
Cheers Jakob
[1] For instance, if you have a common name and a general title, or if you want to distinguish the printed version from the presentation slides of the same publication, etc.
[2] For instance, different ways to abbreviate and/or write the name of an author and/or the title, different years (year of preprint vs. year of printed version), etc.
[3] See http://en.wikipedia.org/wiki/No_Silver_Bullet, which cites an article that was published in 1986 and 1987, and was probably reprinted in another year - so what's the identifier? ;-)
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l