1) The first three author names separated by slashes
Why not separate them with pluses? They don't form part of names either,
and they don't cause problems with wiki page titles.
I like this... however, how would you represent this in a URL? Also note that
using pluses in page names doesn't work with all server configurations, since
plus has a special meaning in URLs.
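To make the URL concern concrete, here is a minimal Python sketch using the standard library's urllib.parse. The two keys are hypothetical examples, not keys anyone has proposed here; the point is only to show how each separator is escaped and why a bare plus is risky.

```python
from urllib.parse import quote, unquote, unquote_plus

# Hypothetical keys, for illustration only.
key_slashes = "Smith/Jones/Lee"
key_pluses = "Smith+Jones+Lee"

# quote() treats "/" as safe by default, so pass safe="" to escape everything.
escaped_slashes = quote(key_slashes, safe="")  # "Smith%2FJones%2FLee"
escaped_pluses = quote(key_pluses, safe="")    # "Smith%2BJones%2BLee"

# Form-style decoding turns an unescaped "+" into a space,
# while plain percent-decoding leaves it alone.
form_decoded = unquote_plus(key_pluses)  # "Smith Jones Lee"
plain_decoded = unquote(key_pluses)      # "Smith+Jones+Lee"

print(escaped_slashes, escaped_pluses, form_decoded, plain_decoded, sep="\n")
```

Either separator can be carried in a URL if it is percent-escaped, but an unescaped plus can silently become a space depending on how the server decodes the request.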
3) Some or all of the date. For instance, if there is only one source by
this set of authors that year, we can just use YYYY. However, once another
source by that set of authors is added, the key should change to MMDDYYYY.
I don't think it is a good idea to change one key as a function of
updates to another, except for a generic disambiguation tag.
I agree. And if you *have* to use the full date, use YYYYMMDD, not the other way around.
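One practical argument for YYYYMMDD (my assumption about the motivation; the comment above does not spell it out) is that plain string sorting then matches chronological order, which MMDDYYYY does not give you. A small Python sketch with made-up dates:

```python
from datetime import date

def date_key(d: date) -> str:
    """Format a date as YYYYMMDD (hypothetical helper name)."""
    return d.strftime("%Y%m%d")

# Made-up publication dates, deliberately out of order.
dates = [date(2007, 12, 1), date(2008, 1, 15), date(2007, 3, 9)]

# Sorted as strings, YYYYMMDD keys come out in chronological order.
iso_keys = sorted(date_key(d) for d in dates)

# Sorted as strings, MMDDYYYY keys group by month first, mixing up the years.
us_keys = sorted(d.strftime("%m%d%Y") for d in dates)

print(iso_keys)
print(us_keys)
```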
Slashes are somewhat cumbersome; perhaps we need not make them
mandatory, but similarly use them only when they are necessary in order to
"escape" a name. In the case that one of the authors does not have a slash
in their name - the dominant case - we can stick to the easily legible and
nicely compact CamelCase format.
Example keys generated by this algorithm:
Both seem good, though I would suggest adopting a convention of ignoring any
leading "the" and "a", to get a more distinctive 3-word suffix.
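A minimal Python sketch of that convention, assuming the title part of the key is the first three words in CamelCase. The function name, the word-splitting rule, and the sample titles are all hypothetical.

```python
import re

# Only the two articles named above; others could be added the same way.
LEADING_ARTICLES = {"the", "a"}

def title_part(title: str, max_words: int = 3) -> str:
    """Build the title portion of a key, skipping leading articles."""
    words = re.findall(r"\w+", title)
    # Drop articles only at the start of the title, not in the middle.
    while words and words[0].lower() in LEADING_ARTICLES:
        words = words[1:]
    # CamelCase the first few remaining words.
    return "".join(w[:1].upper() + w[1:] for w in words[:max_words])

print(title_part("The Quick Brown Fox Jumps"))  # QuickBrownFox
print(title_part("A Theory of Keys"))           # TheoryOfKeys
```

Without the article rule the first title would give "TheQuickBrown", which spends one of the three words on "The".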
Of course, it does not have to be _exactly_ three authors, nor three
words from the title, and it does not solve the John Smith (or Zheng
It also doesn't solve issues with transliteration: Merik Möller may become
"Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz" or even "VoB",
etc. In the case of Chinese names, it's often not easy to decide which part is the
To avoid this kind of ambiguity, I suggest automatically applying some type of
normalization and/or hashing. There is quite a bit of research about this kind
of normalization out there, generally with the aim of detecting duplicates.
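As a rough illustration of what such normalization could look like (a sketch only, not the approach of any particular project), the following Python uses case folding plus Unicode decomposition to strip diacritics. The function name is hypothetical.

```python
import unicodedata

def normalize_name(name: str) -> str:
    """Fold case and strip diacritics so spelling variants compare equal."""
    folded = name.casefold()  # also maps "ß" to "ss"
    decomposed = unicodedata.normalize("NFKD", folded)
    # Drop combining marks such as the diaeresis split off from "ö".
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(normalize_name("Möller"))   # moller
print(normalize_name("Voß"))      # voss
print(normalize_name("Moeller"))  # moeller
```

This collapses "Möller" with "Moller" and "Voß" with "Voss", but "Moeller" still comes out different, so language-specific transliteration rules or a fuzzier matching step would be needed on top of it.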
Perhaps we can learn from bibsonomy.org; have a look at how they do it:
Gotta love open source university research projects :)