On Wed, Jul 21, 2010 at 2:42 AM, Daniel Kinzler <daniel(a)brightbyte.de> wrote:
1) The first three author names separated by slashes
Why not separate by pluses? They don't form part of names either, and
don't cause problems with wiki page titles.
I like this... however, how would you represent this in a URL? Also note
that using pluses in page names doesn't work with all server configurations,
because plus has a special meaning in URLs.
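To illustrate the point about pluses in URLs, here is a small sketch using Python's standard urllib.parse. The key "Smith+Jones+Lee" is a made-up example, not one from this thread. In form-encoded query strings a "+" is decoded as a space, so a plus-separated key only survives if it is percent-encoded.

```python
from urllib.parse import quote, unquote_plus

# Hypothetical key that uses plus signs as separators.
key = "Smith+Jones+Lee"

# Form-style decoding (as used for query strings) turns "+" into a space,
# so the separators are silently lost.
decoded_as_form = unquote_plus(key)
print(decoded_as_form)

# Percent-encoding the plus as %2B keeps it unambiguous.
encoded = quote(key, safe="")
print(encoded)

# Decoding the percent-encoded form gives the original key back.
round_trip = unquote_plus(encoded)
print(round_trip)

# Slashes have the same issue as path separators; with safe="" they
# are percent-encoded too.
slash_encoded = quote("Smith/Jones", safe="")
print(slash_encoded)
```

Whether a given server decodes "+" this way in the path part of a URL depends on its configuration, which is the portability concern raised above.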
> 3) Some or all of the date. For instance, if there is only one source by
> this set of authors that year, we can just use YYYY. However, once
> source by those set of authors is added, the key should change to
I don't think it is a good idea to change one key as a function of
updates on another, except for a generic disambiguation tag.
I agree. And if you *have* to use the full date, use YYYYMMDD, not the
> Since the slashes are somewhat cumbersome, perhaps we can not make them
> mandatory, but similarly use them only when they are necessary in order
> "escape" a name. In the case that one of the authors does not have a
> in their name - the dominant case - we can stick to the easily legible
> compact CamelCase format.
Example keys generated by this algorithm:
Both seem good, though I would suggest adopting a convention of ignoring any
leading "the" and "a", to get a more distinctive 3-word suffix.
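A rough sketch of that convention in Python, in case it helps the discussion. The function name title_suffix and the choice to CamelCase the words are my assumptions, and the sample titles are only illustrations.

```python
import re

# The two leading words named in the suggestion above.
LEADING_ARTICLES = {"the", "a"}


def title_suffix(title: str, n_words: int = 3) -> str:
    """Return the first n_words of the title in CamelCase, skipping
    any leading "the" or "a" (hypothetical helper)."""
    words = re.findall(r"\w+", title)
    # Drop articles only at the start of the title, not in the middle.
    while words and words[0].lower() in LEADING_ARTICLES:
        words.pop(0)
    # Upper-case the first letter of each word, keep the rest as written.
    return "".join(w[0].upper() + w[1:] for w in words[:n_words])


suffix_one = title_suffix("The Origin of Species")
suffix_two = title_suffix("A Brief History of Time")
print(suffix_one)
print(suffix_two)
```

An article in the middle of a title (the "of" here) is left alone, since the suggestion only covers leading words.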
Of course, it does not have to be _exactly_ three authors, nor three
words from the title, and it does not solve the John Smith (or Zheng
It also doesn't solve issues with transliteration: Merik Möller may become
"Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz" or even
etc. In the case of Chinese names, it's often not easy to decide which part is
To avoid this kind of ambiguity, I suggest automatically applying some type
of normalization and/or hashing. There is quite a bit of research about this
kind of normalisation out there, generally with the aim of detecting duplicates.
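As a very crude sketch of what such a normalization might look like, here is a Python version built on the standard unicodedata module. The function name normalize_name and the digraph rules ("oe" to "o", "sz" to "ss", and so on) are my own lossy assumptions for matching purposes only; they are not taken from any of the research mentioned and would mangle some names.

```python
import unicodedata

# Assumed, lossy digraph rules so that common transliterations collapse
# to one form. For duplicate matching only, never for display.
DIGRAPHS = (("ae", "a"), ("oe", "o"), ("ue", "u"), ("sz", "ss"))


def normalize_name(name: str) -> str:
    """Reduce a surname to a comparison form (hypothetical helper)."""
    # casefold() lowercases and maps the German sharp s to "ss".
    folded = name.casefold()
    # NFKD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", folded)
    # Drop the combining marks, leaving the base letters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    for source, target in DIGRAPHS:
        stripped = stripped.replace(source, target)
    return stripped


moller_forms = {normalize_name(n) for n in ("Möller", "Moeller", "Moller")}
voss_forms = {normalize_name(n) for n in ("Voß", "Voss", "Vosz")}
print(moller_forms)
print(voss_forms)
```

This only addresses the Latin-script variants from the examples above. It does nothing for the question of which part of a Chinese name is the family name.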
Perhaps we can learn from bibsonomy.org; have a look at how they do it:
Gotta love open source university research projects :)
Bibsonomy seems to suffer from the same problem as CiteULike - URLs that
convey no meaning. An example URL ID from CiteULike is 2434335, and one from
Bibsonomy is 29be860f0bdea4a29fba38ef9e6dd6a09. I hope to continue to steer
the conversation away from that direction. These IDs guarantee uniqueness,
but I believe that we can create keys that both guarantee uniqueness and
convey some meaning to humans. Consider that this key will be embedded in
wiki articles any time a source is cited. It's important that it make some
Plus signs and slashes in the key appear to be cumbersome. Perhaps we can
avoid this by truncating last names that involve a slash to either the
portion before or after the slash.
Changing the key seems to be a bad idea, so we want a key system that is
unique from the start. That means we should use the full date, YYYYMMDD as
suggested by Daniel.
In the event that multiple sources are published by the same set of authors
on the same day, we can use a, b, c disambiguation.
This gives us the following key, guaranteed to be unique:
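The exact key format is not reproduced here. As a rough Python sketch of the rules described in this message, under several assumptions of mine: the function names surname_part and make_key are hypothetical, up to three last names are joined in CamelCase, a last name containing a slash keeps only the portion before it, and the first key for a given author set and date carries no letter, so that later additions never force an existing key to change.

```python
import re
import string
from datetime import date


def surname_part(last_name: str) -> str:
    """CamelCase one last name, keeping only the text before any slash
    (the message allows either side; "before" is an assumption)."""
    head = last_name.split("/")[0]
    pieces = [p for p in re.split(r"[\s\-]+", head) if p]
    return "".join(p[0].upper() + p[1:] for p in pieces)


def make_key(last_names, published: date, existing_keys) -> str:
    """Build a key from up to three last names plus YYYYMMDD, adding
    a, b, c, ... only when the key is already taken."""
    authors = "".join(surname_part(n) for n in last_names[:3])
    base = authors + published.strftime("%Y%m%d")
    if base not in existing_keys:
        return base
    for letter in string.ascii_lowercase:
        candidate = base + letter
        if candidate not in existing_keys:
            return candidate
    raise ValueError("more than 27 sources for one author set and date")


keys = set()
names = ["Kinzler", "Smith/Jones", "Lee", "Ignored"]
for _ in range(3):
    keys.add(make_key(names, date(2010, 7, 21), keys))
print(sorted(keys))
```

Uniqueness here depends on checking against the set of keys already issued, so some central lookup is still needed when a new source is added.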